CN110610709A - Identity distinguishing method based on voiceprint recognition

Identity distinguishing method based on voiceprint recognition

Info

Publication number
CN110610709A
CN110610709A (application CN201910916553.4A)
Authority
CN
China
Prior art keywords
voiceprint
sample
user
voice
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910916553.4A
Other languages
Chinese (zh)
Inventor
王磊 (Wang Lei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Baiying Technology Co Ltd
Original Assignee
Zhejiang Baiying Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Baiying Technology Co Ltd filed Critical Zhejiang Baiying Technology Co Ltd
Priority to CN201910916553.4A priority Critical patent/CN110610709A/en
Publication of CN110610709A publication Critical patent/CN110610709A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • G10L17/18 Artificial neural networks; Connectionist approaches

Abstract

The invention relates to the field of voice recognition, and in particular to an identity recognition method based on voiceprint recognition, comprising the following steps: training a voiceprint model through a deep learning algorithm based on a corpus; inputting the user's sample voice into the trained voiceprint model to obtain a sample voiceprint feature vector, and registering the sample voiceprint feature vector in a voiceprint database; collecting the outbound voice when the user answers, and segmenting the outbound voice in real time through a VAD algorithm to obtain multiple voice segments; inputting each voice segment into the trained voiceprint model to obtain the corresponding test voiceprint feature vectors; calculating the similarity between each test voiceprint feature vector and the user's sample voiceprint feature vector in the voiceprint database through a cosine similarity function; and judging from the calculated similarity whether the outbound voice was uttered by the same user. The invention thereby determines, during an outbound call, whether the call is answered by the owner in person.

Description

Identity distinguishing method based on voiceprint recognition
Technical Field
The invention relates to the field of voice recognition, and in particular to an identity recognition method based on voiceprint recognition.
Background
Voiceprint recognition is an artificial intelligence technique that identifies a speaker from hundreds of characteristic dimensions of the voice, such as wavelength, frequency and intensity. Because voiceprint recognition is safe and reliable, it has many application scenarios in fields such as public security, finance, social security and intelligent hardware.
With the development of artificial intelligence, expectations for voiceprint recognition systems have gradually risen: faster recognition, lower cost of use and higher accuracy. Different application scenarios, however, raise their own problems, and a general-purpose voiceprint recognition system often yields unsatisfactory results.
In the scenario of intelligent robot outbound calling, the robot cannot tell whether the owner has handed the call to someone else during the conversation.
Disclosure of Invention
To solve these problems, the invention provides an identity recognition method based on voiceprint recognition that judges whether the user has handed the call to someone else during the conversation.
The identity recognition method based on voiceprint recognition comprises the following steps:
training a voiceprint model through a deep learning algorithm based on a corpus;
inputting the user's sample voice into the trained voiceprint model to obtain a sample voiceprint feature vector, and registering the sample voiceprint feature vector in a voiceprint database;
collecting the outbound voice when the user answers, and segmenting the outbound voice in real time through a VAD (voice activity detection) algorithm to obtain multiple voice segments;
inputting each voice segment into the trained voiceprint model to obtain the corresponding test voiceprint feature vectors;
calculating the similarity between each test voiceprint feature vector and the user's sample voiceprint feature vector in the voiceprint database through a cosine similarity function;
and judging from the calculated similarity whether the outbound voice was uttered by the same user.
Preferably, the training of the voiceprint model based on the corpus comprises:
selecting a sentence from a specific speaker and marking it as the anchor sample;
selecting another sentence from the same speaker and marking it as a positive sample;
selecting a sentence from a different speaker and marking it as a negative sample;
training so that the score for the anchor sample and the positive sample is as close to 1 as possible;
and training so that the score for the anchor sample and the negative sample is as close to 0 as possible.
The training of the voiceprint model through the deep learning algorithm based on the corpus further comprises:
feeding the labelled samples into the input layer of a neural network for training;
adding a softmax function to the output layer and normalizing the result, so that the model does not fall into a local optimum too early;
feeding the normalized result into a cross-entropy loss function to obtain the loss value of the model;
and iterating the parameters through back-propagation to minimize the loss of the model, finally obtaining the voiceprint model.
Preferably, inputting the user's sample voice into the trained voiceprint model to obtain a sample voiceprint feature vector, and registering the sample voiceprint feature vector in the voiceprint database, comprises:
acquiring the user's sample voice and extracting the corresponding feature vectors (i-vectors or d-vectors) through the trained voiceprint model;
averaging the i-vectors or d-vectors to obtain the user's sample voiceprint feature vector;
and registering the user id together with the corresponding sample voiceprint feature vector in the voiceprint database.
Preferably, calculating the similarity through the cosine similarity function based on the test voiceprint feature vectors and the user's sample voiceprint feature vector in the voiceprint database comprises:
calculating the similarity between each test voiceprint feature vector and the user's sample voiceprint feature vector in the voiceprint database with a cosine similarity function to obtain an evaluation score;
assembling the evaluation scores of the voice segments into an evaluation score vector;
and normalizing the scores through a softmax layer to convert them into similar probabilities.
Preferably, judging from the calculated similarity whether the outbound voice was uttered by the same user comprises:
judging that the outbound voice was uttered by the same user when the similar probability is greater than or equal to a set threshold;
and judging that the outbound voice was uttered by a different user when the similar probability is below the set threshold.
The invention has the following beneficial effects:
1. The user's sample voice is registered in a voiceprint database as a sample voiceprint feature vector; the outbound voice collected while the user answers is segmented in real time through a VAD algorithm into multiple voice segments; each segment is converted by the trained voiceprint model into a test voiceprint feature vector and scored against the user's sample voiceprint feature vector with a cosine similarity function; whether the outbound voice was uttered by the same user is then judged from the calculated similarity, so the method can determine during an outbound call whether the call is answered by the owner in person;
2. The voiceprint model is trained through a deep learning algorithm, so interference from factors such as channels and devices need not be considered, which guarantees high recognition accuracy.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a voiceprint recognition based identity recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating step S2 of a voiceprint recognition-based identity recognition method according to an embodiment of the present invention;
FIG. 3 is a flowchart of step S5 of the identity recognition method based on voiceprint recognition according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be further described below with reference to the accompanying drawings, but the present invention is not limited to these embodiments.
The basic idea of the invention is: train a voiceprint model through a deep learning algorithm based on a corpus; input the user's sample voice into the trained voiceprint model to obtain a sample voiceprint feature vector and register it in a voiceprint database; collect the outbound voice when the user answers and segment it in real time through a VAD algorithm into multiple voice segments; input each segment into the trained voiceprint model to obtain the corresponding test voiceprint feature vectors; calculate the similarity between each test voiceprint feature vector and the user's sample voiceprint feature vector in the voiceprint database through a cosine similarity function; and judge from the calculated similarity whether the outbound voice was uttered by the same user, thereby determining during the outbound call whether the call is answered by the owner in person.
Based on this conception, the embodiment of the present invention provides an identity recognition method based on voiceprint recognition, as shown in FIG. 1, comprising the following steps:
S1: training a voiceprint model through a deep learning algorithm based on a corpus;
S2: inputting the user's sample voice into the trained voiceprint model to obtain a sample voiceprint feature vector, and registering the sample voiceprint feature vector in a voiceprint database;
S3: collecting the outbound voice when the user answers, and segmenting the outbound voice in real time through a VAD algorithm to obtain multiple voice segments;
S4: inputting each voice segment into the trained voiceprint model to obtain the corresponding test voiceprint feature vectors;
S5: calculating the similarity between each test voiceprint feature vector and the user's sample voiceprint feature vector in the voiceprint database through a cosine similarity function;
S6: judging from the calculated similarity whether the outbound voice was uttered by the same user.
The concept of deep learning stems from the study of artificial neural networks: a multi-layer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representations (attribute classes or features), thereby discovering a distributed feature representation of the data. It is a machine-learning method based on representation learning: an observation can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a series of edges or regions of a particular shape, and tasks such as face recognition or speech recognition are easier to learn from examples under certain specific representations.
In this embodiment, the voiceprint model is trained through a deep learning algorithm, so interference from factors such as channels and devices need not be considered, which guarantees high recognition accuracy.
In this embodiment, the method for training the voiceprint model through the deep learning algorithm based on the corpus includes:
selecting a sentence from a specific speaker and marking it as the anchor sample; selecting another sentence from the same speaker and marking it as a positive sample; and selecting a sentence from a different speaker and marking it as a negative sample. When training on the anchor sample and the positive sample, we want their results to be as close as possible, i.e. a score approaching 1; when training on the anchor sample and the negative sample, we want their results to be as far apart as possible, i.e. a score approaching 0.
After the training data are prepared, a softmax + cross-entropy pre-training method is used to avoid falling into a local optimum early in training, finally yielding the voiceprint model. Specifically, the labelled samples are first fed into the input layer of a neural network; to keep the model from falling into a local optimum too early, a softmax function is added to the output layer and the result is normalized; the normalized result is fed into a cross-entropy loss function to obtain the loss value of the model; and the parameters are iterated through back-propagation to minimize the loss, finally obtaining the optimal pre-trained model, i.e. the voiceprint model.
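The softmax + cross-entropy pre-training described above can be sketched as follows. This is an illustrative NumPy sketch, not part of the patent disclosure: the one-layer toy network, the learning rate and the epoch count are all assumptions standing in for the unspecified deep network.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: normalizes output-layer logits,
    # which helps keep training from settling into a poor optimum early.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true speaker labels.
    n = len(labels)
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

def pretrain(features, labels, n_speakers, lr=0.5, epochs=200, seed=0):
    """Softmax + cross-entropy pre-training on labelled samples:
    parameters are iterated by back-propagated gradients to minimize
    the loss (a single linear layer stands in for the deep network)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(features.shape[1], n_speakers))
    for _ in range(epochs):
        probs = softmax(features @ w)
        onehot = np.eye(n_speakers)[labels]
        grad = features.T @ (probs - onehot) / len(labels)  # back-propagation
        w -= lr * grad  # iterate parameters to reduce the loss
    return w, cross_entropy(softmax(features @ w), labels)
```

On a separable toy corpus of two speakers, the loss drops well below its initial value, illustrating the minimize-by-iteration step of the method.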
In this embodiment, as shown in FIG. 2, the method of inputting the user's sample voice into the trained voiceprint model to obtain a sample voiceprint feature vector and registering it in the voiceprint database includes:
S21: acquiring the user's sample voice and extracting the corresponding feature vectors (i-vectors or d-vectors) through the trained voiceprint model;
S22: averaging the i-vectors or d-vectors to obtain the user's sample voiceprint feature vector;
S23: registering the user id together with the corresponding sample voiceprint feature vector in the voiceprint database.
Each user id corresponds one-to-one to a sample voiceprint feature vector, and both are registered in the voiceprint database. When a user's identity needs to be checked, the voiceprint database is searched for that user id: if the same user id exists in the database, the user's voiceprint feature vector is already registered; if not, it has not yet been registered. If the user's sample voiceprint feature vector is already registered in the voiceprint database, it does not need to be extracted again.
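The enrollment steps S21 through S23 can be sketched as follows. This is an illustrative sketch, not part of the patent disclosure: the in-memory dict standing in for the voiceprint database, the `enroll` function name and the unit-length normalization (convenient for later cosine scoring) are assumptions.

```python
import numpy as np

voiceprint_db = {}  # user id -> registered sample voiceprint feature vector

def enroll(user_id, dvectors, db=voiceprint_db):
    """Average per-utterance i-vectors/d-vectors into one sample
    voiceprint feature vector and register it under the user id.
    If the id is already registered, re-extraction is skipped."""
    if user_id in db:
        return db[user_id]                       # already enrolled: reuse
    vec = np.mean(np.asarray(dvectors, dtype=float), axis=0)
    vec /= np.linalg.norm(vec) + 1e-12           # unit length for cosine scoring
    db[user_id] = vec
    return vec
```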
To improve the accuracy of identity recognition, after the outbound voice is collected while the user answers, it is segmented in real time through a VAD algorithm into multiple voice segments. In this embodiment, judging each short segment of voice distinguishes identity more effectively than judging one long recording.
In this embodiment, real-time collection and real-time segmentation of the outbound voice enable the user's identity to be recognized in real time.
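The real-time fragmentation step can be illustrated with a simple frame-energy voice activity detector. The patent does not specify which VAD algorithm is used, so this sketch, its frame length and its energy threshold are assumptions, not the disclosed implementation.

```python
import numpy as np

def vad_segments(samples, rate=8000, frame_ms=30, threshold=0.01):
    """Split a call waveform into voiced segments by per-frame energy
    (a stand-in for the patent's unspecified VAD algorithm)."""
    frame = int(rate * frame_ms / 1000)
    n = len(samples) // frame
    energy = np.array([np.mean(samples[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n)])
    voiced = energy > threshold
    segments, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i * frame                        # a segment opens
        elif not v and start is not None:
            segments.append(samples[start:i * frame])  # segment closes
            start = None
    if start is not None:
        segments.append(samples[start:])             # trailing segment
    return segments
```

A waveform of two tone bursts separated by silence splits into two segments, matching the "multiple voice segments" the method expects.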
In this embodiment, as shown in FIG. 3, the method of calculating the similarity through a cosine similarity function based on the test voiceprint feature vectors and the user's sample voiceprint feature vector in the voiceprint database includes:
S51: calculating the similarity between each test voiceprint feature vector and the user's sample voiceprint feature vector in the voiceprint database with a cosine similarity function to obtain an evaluation score;
S52: assembling the evaluation scores of the voice segments into an evaluation score vector;
S53: normalizing the scores through a softmax layer to convert them into similar probabilities.
In this embodiment, the method of judging from the calculated similarity whether the outbound voice was uttered by the same user includes:
judging that the outbound voice was uttered by the same user when the similar probability is greater than or equal to a set threshold;
and judging that the outbound voice was uttered by a different user when the similar probability is below the set threshold.
Each voice segment yields a similar probability through the above steps; comparing it with the set threshold judges whether the segment was uttered by the same user, and the segment is labelled with the result to facilitate subsequent tracking.
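The threshold comparison and per-segment labelling can be sketched as follows. The patent leaves the threshold unspecified, so the value 0.5 here is an assumption, as are the label strings.

```python
def label_segments(probabilities, threshold=0.5):
    """Mark each segment 'same user' when its similar probability meets
    the set threshold, else 'different user', so segments can be
    tracked afterwards (threshold value is an assumed example)."""
    return ["same user" if p >= threshold else "different user"
            for p in probabilities]
```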
Various modifications, additions or substitutions may be made to the described embodiments by those skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.

Claims (6)

1. An identity recognition method based on voiceprint recognition, characterized by comprising the following steps:
training a voiceprint model through a deep learning algorithm based on a corpus;
inputting the user's sample voice into the trained voiceprint model to obtain a sample voiceprint feature vector, and registering the sample voiceprint feature vector in a voiceprint database;
collecting the outbound voice when the user answers, and segmenting the outbound voice in real time through a VAD algorithm to obtain multiple voice segments;
inputting each voice segment into the trained voiceprint model to obtain the corresponding test voiceprint feature vectors;
calculating the similarity between each test voiceprint feature vector and the user's sample voiceprint feature vector in the voiceprint database through a cosine similarity function;
and judging from the calculated similarity whether the outbound voice was uttered by the same user.
2. The identity recognition method based on voiceprint recognition according to claim 1, wherein the training of the voiceprint model through the deep learning algorithm based on the corpus comprises:
selecting a sentence from a specific speaker and marking it as the anchor sample;
selecting another sentence from the same speaker and marking it as a positive sample;
selecting a sentence from a different speaker and marking it as a negative sample;
training so that the score for the anchor sample and the positive sample is as close to 1 as possible;
and training so that the score for the anchor sample and the negative sample is as close to 0 as possible.
3. The identity recognition method based on voiceprint recognition according to claim 2, wherein the training of the voiceprint model through the deep learning algorithm based on the corpus further comprises:
feeding the labelled samples into the input layer of a neural network for training;
adding a softmax function to the output layer and normalizing the result, so that the model does not fall into a local optimum too early;
feeding the normalized result into a cross-entropy loss function to obtain the loss value of the model;
and iterating the parameters through back-propagation to minimize the loss of the model, finally obtaining the voiceprint model.
4. The identity recognition method based on voiceprint recognition according to claim 1, wherein inputting the user's sample voice into the trained voiceprint model to obtain a sample voiceprint feature vector, and registering the sample voiceprint feature vector in the voiceprint database, comprises:
acquiring the user's sample voice and extracting the corresponding feature vectors (i-vectors or d-vectors) through the trained voiceprint model;
averaging the i-vectors or d-vectors to obtain the user's sample voiceprint feature vector;
and registering the user id together with the corresponding sample voiceprint feature vector in the voiceprint database.
5. The identity recognition method based on voiceprint recognition according to claim 1, wherein calculating the similarity through a cosine similarity function based on the test voiceprint feature vectors and the user's sample voiceprint feature vector in the voiceprint database comprises:
calculating the similarity between each test voiceprint feature vector and the user's sample voiceprint feature vector in the voiceprint database with a cosine similarity function to obtain an evaluation score;
assembling the evaluation scores of the voice segments into an evaluation score vector;
and normalizing the scores through a softmax layer to convert them into similar probabilities.
6. The identity recognition method based on voiceprint recognition according to claim 5, wherein judging from the calculated similarity whether the outbound voice was uttered by the same user comprises:
judging that the outbound voice was uttered by the same user when the similar probability is greater than or equal to a set threshold;
and judging that the outbound voice was uttered by a different user when the similar probability is below the set threshold.
CN201910916553.4A 2019-09-26 2019-09-26 Identity distinguishing method based on voiceprint recognition Pending CN110610709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910916553.4A CN110610709A (en) 2019-09-26 2019-09-26 Identity distinguishing method based on voiceprint recognition


Publications (1)

Publication Number Publication Date
CN110610709A true CN110610709A (en) 2019-12-24

Family

ID=68893430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910916553.4A Pending CN110610709A (en) 2019-09-26 2019-09-26 Identity distinguishing method based on voiceprint recognition

Country Status (1)

Country Link
CN (1) CN110610709A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108091326A (en) * 2018-02-11 2018-05-29 张晓雷 A kind of method for recognizing sound-groove and system based on linear regression
CN108766445A (en) * 2018-05-30 2018-11-06 苏州思必驰信息科技有限公司 Method for recognizing sound-groove and system
CN108899032A (en) * 2018-06-06 2018-11-27 平安科技(深圳)有限公司 Method for recognizing sound-groove, device, computer equipment and storage medium
CN109166586A (en) * 2018-08-02 2019-01-08 平安科技(深圳)有限公司 A kind of method and terminal identifying speaker
CN109903774A (en) * 2019-04-12 2019-06-18 南京大学 A kind of method for recognizing sound-groove based on angle separation loss function
US20190272829A1 (en) * 2017-02-16 2019-09-05 Ping An Technology (Shenzhen) Co., Ltd. Voiceprint recognition method, device, storage medium and background server


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111416911A (en) * 2019-12-31 2020-07-14 国网浙江省电力有限公司金华供电公司 Calling platform of outbound system based on artificial intelligence
CN111599345A (en) * 2020-04-03 2020-08-28 厦门快商通科技股份有限公司 Speech recognition algorithm evaluation method, system, mobile terminal and storage medium
CN111524521A (en) * 2020-04-22 2020-08-11 北京小米松果电子有限公司 Voiceprint extraction model training method, voiceprint recognition method, voiceprint extraction model training device, voiceprint recognition device and voiceprint recognition medium
CN111524521B (en) * 2020-04-22 2023-08-08 北京小米松果电子有限公司 Voiceprint extraction model training method, voiceprint recognition method, voiceprint extraction model training device and voiceprint recognition device
CN112565242A (en) * 2020-12-02 2021-03-26 携程计算机技术(上海)有限公司 Remote authorization method, system, equipment and storage medium based on voiceprint recognition
CN112435673A (en) * 2020-12-15 2021-03-02 北京声智科技有限公司 Model training method and electronic terminal
CN112738344B (en) * 2020-12-28 2022-12-09 北京三快在线科技有限公司 Method and device for identifying user identity, storage medium and electronic equipment
CN112738344A (en) * 2020-12-28 2021-04-30 北京三快在线科技有限公司 Method and device for identifying user identity, storage medium and electronic equipment
CN112418191A (en) * 2021-01-21 2021-02-26 深圳阜时科技有限公司 Fingerprint identification model construction method, storage medium and computer equipment
CN112992154A (en) * 2021-05-08 2021-06-18 北京远鉴信息技术有限公司 Voice identity determination method and system based on enhanced voiceprint library
CN113327617A (en) * 2021-05-17 2021-08-31 西安讯飞超脑信息科技有限公司 Voiceprint distinguishing method and device, computer equipment and storage medium
CN113327618A (en) * 2021-05-17 2021-08-31 西安讯飞超脑信息科技有限公司 Voiceprint distinguishing method and device, computer equipment and storage medium
CN113327618B (en) * 2021-05-17 2024-04-19 西安讯飞超脑信息科技有限公司 Voiceprint discrimination method, voiceprint discrimination device, computer device and storage medium
CN113327617B (en) * 2021-05-17 2024-04-19 西安讯飞超脑信息科技有限公司 Voiceprint discrimination method, voiceprint discrimination device, computer device and storage medium
CN113555022A (en) * 2021-07-23 2021-10-26 平安科技(深圳)有限公司 Voice-based same-person identification method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110610709A (en) Identity distinguishing method based on voiceprint recognition
CN110491416B (en) Telephone voice emotion analysis and identification method based on LSTM and SAE
CN106251874B (en) A kind of voice gate inhibition and quiet environment monitoring method and system
US9401148B2 (en) Speaker verification using neural networks
CN109559736B (en) Automatic dubbing method for movie actors based on confrontation network
CN107222865A (en) The communication swindle real-time detection method and system recognized based on suspicious actions
CN106683661A (en) Role separation method and device based on voice
CN110211594B (en) Speaker identification method based on twin network model and KNN algorithm
CN113571067B (en) Voiceprint recognition countermeasure sample generation method based on boundary attack
CN110120230B (en) Acoustic event detection method and device
CN107993664B (en) Robust speaker recognition method based on competitive neural network
CN108876951A (en) A kind of teaching Work attendance method based on voice recognition
CN109104534A (en) A kind of system for improving outgoing call robot and being intended to Detection accuracy, recall rate
Alamsyah et al. Speech gender classification using bidirectional long short term memory
Kaur et al. An efficient speaker recognition using quantum neural network
KR20220047080A (en) A speaker embedding extraction method and system for automatic speech recognition based pooling method for speaker recognition, and recording medium therefor
CN110085236B (en) Speaker recognition method based on self-adaptive voice frame weighting
CN115862634A (en) Voiceprint recognition method and embedded device
CN112669836B (en) Command recognition method and device and computer readable storage medium
CN108629024A (en) A kind of teaching Work attendance method based on voice recognition
Reshma et al. A survey on speech emotion recognition
CN114067803A (en) Speaker confirmation method based on distance correlation metric learning
CN113947140A (en) Training method of face feature extraction model and face feature extraction method
Wu et al. Audio-based expansion learning for aerial target recognition
Gade et al. Hybrid Deep Convolutional Neural Network based Speaker Recognition for Noisy Speech Environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191224)