CN108447489B - Continuous voiceprint authentication method and system with feedback - Google Patents


Info

Publication number
CN108447489B
Authority
CN
China
Prior art keywords
authenticated
user
feature vector
voiceprint
voice frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810343274.9A
Other languages
Chinese (zh)
Other versions
CN108447489A (en)
Inventor
Wang Dong (王东)
Li Lantian (李蓝天)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Furui Xingchen Intelligent Technology Co ltd
Tsinghua University
Original Assignee
Beijing Furui Xingchen Intelligent Technology Co ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Furui Xingchen Intelligent Technology Co ltd, Tsinghua University filed Critical Beijing Furui Xingchen Intelligent Technology Co ltd
Priority claimed from application CN201810343274.9A
Publication of CN108447489A
Application granted
Publication of CN108447489B
Legal status: Expired - Fee Related

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention provides a continuous voiceprint authentication method and system with feedback. A voice frame to be authenticated is acquired in real time, and the voiceprint feature vector to be authenticated corresponding to that frame is extracted. The distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector is calculated, and the similarity between the two vectors is determined from the distance. A matching score for the voice frame is then determined from the similarity and fed back in real time to the user who produced the frame, so that the user can adjust pronunciation according to the score. Because the method and system feed back a matching score for every voice frame the user utters during authentication, they effectively guide the user to adjust pronunciation, avoid false rejection of legitimate users, raise the probability that a legitimate user authenticates successfully, and help improve the authentication experience.

Description

Continuous voiceprint authentication method and system with feedback
Technical Field
The invention relates to the technical field of voice recognition, in particular to a continuous voiceprint authentication method and system with feedback.
Background
Biometric identity authentication, which verifies or identifies a person through a unique biometric feature, is important in modern society, and voiceprint authentication is an important member of the biometric family. Voiceprint authentication determines the identity of a speaker from voice. Compared with other authentication modes it has characteristics such as convenience, is particularly suitable for remote authentication, and is therefore significant in the mobile payment era.
Current voiceprint authentication techniques can all be described as "feedback-free segment authentication". In this mode, the user produces a stretch of speech as required, and the system gives no feedback while the user is speaking; after the utterance ends, the system evaluates the whole segment and returns a result. One disadvantage of this approach is that the authentication process is opaque to the user, who receives no feedback during it. For an illegitimate user this reduces the information the system exposes and thus the risk of a break-in; for a legitimate user, however, it greatly increases the likelihood of false rejection, because without a feedback mechanism the user cannot improve the authentication score by adjusting pronunciation to cooperate with the system.
In view of the above, it is desirable to provide a voiceprint authentication method and system that give feedback in real time during authentication, so as to guide the user to adjust pronunciation, reduce false rejection of legitimate users, and improve the user experience.
Disclosure of Invention
The invention provides a continuous voiceprint authentication method and system with feedback, aiming to solve the problem that prior-art voiceprint authentication lacks a feedback mechanism during the authentication process, which causes legitimate users to fail authentication.
In one aspect, the present invention provides a continuous voiceprint authentication method with feedback, including:
S1, acquiring a voice frame to be authenticated in real time, and extracting a voiceprint feature vector to be authenticated corresponding to the voice frame to be authenticated;
S2, calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, and determining the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector according to the distance calculation result;
S3, determining the matching score corresponding to the voice frame to be authenticated according to the similarity, and feeding the matching score back to the user corresponding to the voice frame to be authenticated in real time, so that the user can adjust pronunciation according to the matching score.
Preferably, the step S3 further includes:
and when the matching score reaches a first preset score, determining that the user authentication is successful.
Preferably, the step S3 is followed by:
when the matching score does not reach a first preset score, repeating the steps S1-S3 to obtain the matching score;
within a first preset time, when the matching score reaches the first preset score, determining that the user authentication is successful;
and when the first preset time is reached, if the matching score does not reach a second preset score, determining that the user authentication fails.
Preferably, the step S3 is followed by:
when the first preset time is reached, if the matching score does not reach the first preset score but is higher than the second preset score, prolonging the authentication time of the user to a second preset time;
and in the second preset time, if the matching score does not reach the first preset score, determining that the user authentication fails.
Preferably, the step S1 further includes:
acquiring a voice frame to be authenticated in real time, and acquiring a frequency spectrum corresponding to the voice frame to be authenticated;
and extracting the voiceprint feature vector to be authenticated corresponding to the voice frame to be authenticated from the frequency spectrum by using a preset feature extraction model.
Preferably, the step S2 is preceded by:
acquiring a voice segment of a registered user, and extracting the voiceprint feature vector corresponding to each voice frame in the voice segment;
and carrying out a weighted average operation on the voiceprint feature vectors corresponding to all the voice frames in the voice segment to obtain the registered voiceprint feature vector.
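As a minimal sketch of the enrollment step above (the function and variable names are hypothetical, and uniform weights are assumed where the patent leaves the weighting unspecified):

```python
# Sketch of enrollment: combine per-frame voiceprint feature vectors from the
# registered user's voice segment into one registered voiceprint feature
# vector via a weighted average. Uniform weights are an assumption.

def enroll_voiceprint(frame_vectors, weights=None):
    """Weighted average of per-frame feature vectors (lists of floats)."""
    n = len(frame_vectors)
    if weights is None:
        weights = [1.0 / n] * n          # default: uniform weighting
    dim = len(frame_vectors[0])
    registered = [0.0] * dim
    for w, vec in zip(weights, frame_vectors):
        for i in range(dim):
            registered[i] += w * vec[i]  # accumulate the weighted sum
    return registered

# Example: two 2-dimensional frame vectors, uniform weights.
print(enroll_voiceprint([[1.0, 2.0], [3.0, 4.0]]))                 # [2.0, 3.0]
# Explicit weights are also supported:
print(enroll_voiceprint([[1.0], [3.0]], weights=[0.25, 0.75]))     # [2.5]
```

The registered vector is computed once at enrollment and then reused for every per-frame comparison during authentication.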
Preferably, the step S2 of calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector further includes:
calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector by using a distance function;
the distance function includes a cosine distance function and a Euclidean distance function.
In one aspect, the present invention provides a continuous voiceprint authentication system with feedback, including:
the feature extraction module is used for acquiring a voice frame to be authenticated in real time and extracting a voiceprint feature vector to be authenticated corresponding to the voice frame to be authenticated;
the similarity calculation module is used for calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector and determining the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector according to the distance calculation result;
and the real-time feedback module is used for determining a matching score corresponding to the voice frame to be authenticated according to the similarity and feeding the matching score back to a user corresponding to the voice frame to be authenticated in real time so that the user can adjust pronunciation according to the matching score.
In one aspect, the present invention provides a device for performing the continuous voiceprint authentication method with feedback, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor, when invoking the program instructions, is capable of performing any of the methods described above.
In one aspect, the present invention provides a non-transitory computer readable storage medium storing computer instructions that cause a computer to perform any of the methods described above.
The invention provides a continuous voiceprint authentication method and system with feedback: a voice frame to be authenticated is acquired in real time, and the voiceprint feature vector to be authenticated corresponding to that frame is extracted; the distance between that vector and the registered voiceprint feature vector is calculated, and the similarity between the two vectors is determined from the distance; a matching score for the voice frame is then determined from the similarity and fed back in real time to the user who produced the frame, so that the user can adjust pronunciation according to the score. Because the method and system feed back a matching score for every voice frame the user utters during authentication, they effectively guide the user to adjust pronunciation, avoid false rejection of legitimate users, raise the probability that a legitimate user authenticates successfully, and help improve the authentication experience.
Drawings
Fig. 1 is a schematic overall flow chart of a continuous voiceprint authentication method with feedback according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a voiceprint feature vector extraction method to be authenticated according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for extracting a registered voiceprint feature vector according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an overall structure of a continuous voiceprint authentication system with feedback according to an embodiment of the present invention;
fig. 5 is a schematic structural framework diagram of an apparatus of a continuous voiceprint authentication method with feedback according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Fig. 1 is a schematic overall flow chart of a continuous voiceprint authentication method with feedback according to an embodiment of the present invention, and as shown in fig. 1, the present invention provides a continuous voiceprint authentication method with feedback, including:
S1, acquiring a voice frame to be authenticated in real time, and extracting a voiceprint feature vector to be authenticated corresponding to the voice frame to be authenticated;
It should be noted that the continuous voiceprint authentication method with feedback provided by the present invention can be applied to various terminal devices. Taking a user terminal as an example, when a user's identity is authenticated by voiceprint, the user continuously produces a segment of speech. The speech content may be predefined, for example content displayed on the screen of the user terminal, or it may be arbitrary content read by the user; it can be set according to actual requirements and is not specifically limited here.
During the user's continuous speech, the voice frame produced at each moment is acquired in real time; this frame is the voice frame to be authenticated. For each acquired frame, the corresponding voiceprint feature vector, namely the voiceprint feature vector to be authenticated, is extracted. Specifically, a preset feature extraction model may be used for the extraction. Taking a neural network model as an example, the feature vector may be extracted by a convolutional neural network. The convolutional neural network may include a plurality of convolutional layers, and the number and size of the convolution kernels in each layer may be adjusted according to actual requirements and are not specifically limited here. In addition, a pooling layer may follow each convolutional layer; the pooling layer may be a max-pooling layer or an average-pooling layer, its windows may or may not overlap, and the window size may be adjusted according to actual needs and is not specifically limited here. When the convolutional neural network extracts the voiceprint feature vector of a voice frame, the frame is first converted into spectral features; each convolution kernel then convolves the spectral features, producing one feature plane, and the feature planes output by the last pooling layer form the voiceprint feature vector to be authenticated.
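The frame-level extraction pipeline described above (spectrum, convolution kernels, pooling, feature vector) can be illustrated with a deliberately tiny pure-Python sketch. The kernel values, the single convolutional layer, and the non-overlapping max pooling are illustrative assumptions; a real system would use a trained multi-layer network:

```python
# Toy version of the pipeline: spectrum -> convolve with each kernel
# -> max pooling -> concatenated pooled planes form the feature vector.

def conv1d(signal, kernel):
    # Valid (no-padding) 1-D convolution of a spectrum with one kernel.
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(signal, window):
    # Non-overlapping max pooling (the patent allows overlapping windows too).
    return [max(signal[i:i + window])
            for i in range(0, len(signal) - window + 1, window)]

def extract_feature_vector(spectrum, kernels, pool_window=2):
    feature = []
    for kernel in kernels:                    # one feature plane per kernel
        plane = max_pool(conv1d(spectrum, kernel), pool_window)
        feature.extend(plane)                 # pooled planes -> feature vector
    return feature

spectrum = [0.1, 0.4, 0.3, 0.8, 0.2, 0.6]    # stand-in spectral features
kernels = [[1.0, -1.0], [0.5, 0.5]]          # two illustrative kernels
print(extract_feature_vector(spectrum, kernels))
```

In the real method this vector is what step S2 compares against the registered voiceprint feature vector.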
S2, calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, and determining the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector according to the distance calculation result;
Specifically, for the obtained voiceprint feature vector to be authenticated, its distance to the registered voiceprint feature vector is calculated. The distance may be the cosine distance or the Euclidean distance between the two vectors, chosen according to actual requirements; no specific limitation is made here. The registered voiceprint feature vector is extracted in advance from a voice segment of a registered user and represents that user's voiceprint; a registered user is also a legitimate user. There may be a single registered user, with correspondingly a single registered voiceprint feature vector; there may also be multiple registered users, each corresponding to one registered voiceprint feature vector. When there are multiple registered voiceprint feature vectors, the distance between the voiceprint feature vector to be authenticated and each registered voiceprint feature vector is calculated separately.
Furthermore, the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector is determined from the distance calculation result. Taking the cosine distance as an example: the larger the cosine distance between the two vectors, the higher the similarity; the smaller the cosine distance, the lower the similarity.
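A minimal sketch of the two distance measures mentioned in step S2, assuming the standard cosine and Euclidean formulas (the patent names the measures but does not spell the formulas out):

```python
# Cosine similarity and Euclidean distance between the feature vector to be
# authenticated and the registered feature vector. Per the passage above, a
# larger cosine value corresponds to higher similarity.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

probe = [1.0, 0.0]          # hypothetical vector to be authenticated
registered = [1.0, 1.0]     # hypothetical registered vector
print(round(cosine_similarity(probe, registered), 4))   # 0.7071
print(round(euclidean_distance(probe, registered), 4))  # 1.0
```

With multiple registered users, the same functions would simply be applied once per registered vector.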
And S3, determining the matching score corresponding to the voice frame to be authenticated according to the similarity, and feeding the matching score back to the user corresponding to the voice frame to be authenticated in real time, so that the user can adjust pronunciation according to the matching score.
Specifically, after the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector is calculated, the matching score of the voice frame to be authenticated is determined from it: the higher the similarity, the higher the matching score; the lower the similarity, the lower the matching score.
Meanwhile, the matching score of the voice frame to be authenticated is fed back in real time to the user who produced the frame. That is, during authentication, as the user utters each voice frame, its matching score can be fed back to the user immediately; on a user terminal, for example, it can be displayed on screen in real time. The user can then adjust pronunciation according to the score. For example, if a registered user's unadjusted pronunciation yields generally low matching scores over a short period, the fed-back scores tell the user that the pronunciation manner may be problematic; the user can then immediately adjust it to raise the matching scores of subsequent frames until authentication succeeds.
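The per-frame feedback of step S3 can be sketched as follows. The linear mapping from similarity to a 0-100 matching score and the `report` callback are assumptions for illustration, not details given by the patent:

```python
# Per-frame feedback loop: each frame's similarity becomes a matching score
# that is reported to the user immediately, so pronunciation can be adjusted
# mid-utterance. The linear [-1, 1] -> [0, 100] mapping is an assumption.

def similarity_to_score(similarity):
    # Map cosine similarity in [-1, 1] to a matching score in [0, 100].
    return round((similarity + 1.0) / 2.0 * 100.0, 1)

def feedback_loop(frame_similarities, report=print):
    scores = []
    for t, sim in enumerate(frame_similarities):
        score = similarity_to_score(sim)
        scores.append(score)
        report(f"frame {t}: matching score {score}")  # real-time feedback
    return scores

# A user whose early frames score low and who then adjusts pronunciation:
print(feedback_loop([0.1, 0.2, 0.8, 0.95], report=lambda msg: None))
# [55.0, 60.0, 90.0, 97.5]
```

In a deployed system `report` would drive the terminal's on-screen display rather than printing.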
The invention provides a continuous voiceprint authentication method with feedback: a voice frame to be authenticated is acquired in real time, and the voiceprint feature vector to be authenticated corresponding to that frame is extracted; the distance between that vector and the registered voiceprint feature vector is calculated, and the similarity between the two vectors is determined from the distance; a matching score for the voice frame is then determined from the similarity and fed back in real time to the user who produced the frame, so that the user can adjust pronunciation according to the score. Because the method feeds back a matching score for every voice frame the user utters during authentication, it effectively guides the user to adjust pronunciation, avoids false rejection of legitimate users, raises the probability that a legitimate user authenticates successfully, and helps improve the authentication experience.
Based on any of the above embodiments, there is provided a continuous voiceprint authentication method with feedback, where step S3 further includes:
and when the matching score reaches a first preset score, determining that the user authentication is successful.
Specifically, during voiceprint authentication, the matching score of each voice frame to be authenticated in the user's speech is fed back to the user in real time. When the matching score of some voice frame reaches the first preset score, the voiceprint feature vector of that frame is deemed to match the registered voiceprint feature vector, and the user is determined to be authenticated successfully. The first preset score is a preset high-score threshold; for example, if it is 99, the user is authenticated successfully as soon as the matching score of some uttered voice frame is greater than or equal to 99.
In addition, successful authentication can be fed back to the user promptly. Taking the user terminal as an example, text or a graphic indicating success can be displayed when authentication succeeds. The feedback mode can be set according to actual requirements and is not specifically limited here.
With the continuous voiceprint authentication method with feedback provided by the invention, the user is determined to be authenticated successfully once the matching score reaches the first preset score. This authenticates the user's identity accurately, raises the probability that a legitimate user authenticates successfully, and helps improve the authentication experience.
Based on any of the above embodiments, there is provided a continuous voiceprint authentication method with feedback, where after step S3, the method further includes:
when the matching score does not reach the first preset score, repeating the steps S1 to S3 to obtain the matching score;
specifically, in the process of voiceprint authentication performed by the user, a voice frame to be authenticated is obtained in real time, a matching score corresponding to the voice frame to be authenticated is calculated, and when the matching score does not reach a first preset score, steps S1 to S3 in any one of the above method embodiments are repeatedly executed to obtain a matching score corresponding to each subsequent voice frame. Wherein the first predetermined score is a predetermined high score threshold, for example, if the first predetermined score is 99, the matching score corresponding to the current voice frame to be authenticated is lower than 99, and the steps S1 to S3 in any of the above method embodiments are repeatedly executed.
For example, when the matching score corresponding to the speech frame to be authenticated obtained at the time t does not reach the first preset score, the speech frame to be authenticated corresponding to the time t +1 is obtained again, and the matching score corresponding to the speech frame to be authenticated is obtained by calculation according to steps S1 to S3 in any one of the above method embodiments.
In a first preset time, when the matching score reaches a first preset score, determining that the user authentication is successful;
specifically, in the process of repeatedly performing steps S1 to S3 in any of the above method embodiments, if the matching score corresponding to the to-be-authenticated voice frame obtained at a certain time within the first preset time reaches the first preset score, it is determined that the user authentication is successful. The first preset time is preset valid authentication time, and may be set to 10 seconds, 15 seconds, and the like, and may be set according to actual requirements, which is not specifically limited herein.
And when the first preset time is reached, if the matching score does not reach the second preset score, determining that the user authentication fails.
Specifically, steps S1 to S3 of any of the above method embodiments are repeated until the first preset time is reached; if the matching score of the voice frame acquired at the last moment does not reach the second preset score, the user's authentication is determined to have failed. The second preset score is a preset low-score threshold and may be set according to actual requirements; no specific limitation is made here. For example, with a first preset time of 10 seconds and a second preset score of 20 points, if steps S1 to S3 have been repeated for 10 seconds and the matching score of the voice frame acquired in the 10th second is below 20 points, the user's authentication has failed.
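The timeout logic above can be sketched as follows, using the example thresholds from the text (99 and 20 points); the one-frame-per-tick pacing and the function names are illustrative assumptions:

```python
# Timed authentication: within the first preset time, any frame reaching the
# first preset score succeeds; at timeout, a last score below the second
# preset score fails, and a score between the thresholds stays undecided
# (a candidate for the time extension described next in the text).

FIRST_PRESET_SCORE = 99    # high-score threshold (example from the text)
SECOND_PRESET_SCORE = 20   # low-score threshold (example from the text)

def authenticate(score_stream, first_preset_time):
    """score_stream yields one matching score per frame; one frame per tick."""
    last_score = 0
    for tick, score in enumerate(score_stream, start=1):
        last_score = score
        if score >= FIRST_PRESET_SCORE:
            return "success"
        if tick >= first_preset_time:
            break                       # first preset time reached
    if last_score < SECOND_PRESET_SCORE:
        return "failure"
    return "undecided"                  # between thresholds at timeout

print(authenticate([50, 70, 99.5], first_preset_time=10))  # success
print(authenticate([10, 15, 12], first_preset_time=3))     # failure
print(authenticate([10, 40, 60], first_preset_time=3))     # undecided
```

The "undecided" outcome corresponds to the case the next embodiment handles by extending the authentication time.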
With the continuous voiceprint authentication method with feedback provided by the invention, when the matching score of the voice frame acquired at some moment does not reach the first preset score, the matching score of the frame acquired at the next moment is determined in the same way. Within the first preset time, the user is authenticated successfully as soon as some frame's matching score reaches the first preset score; when the first preset time is reached, if the last frame's matching score does not reach the second preset score, the user's authentication fails. The method thus authenticates the user's identity within a preset valid authentication time, and success or failure is declared if and only if the matching scores within that time satisfy the high-score or low-score condition respectively. This improves the accuracy of identity authentication and helps improve the authentication experience.
Based on any of the above embodiments, there is provided a continuous voiceprint authentication method with feedback, where after step S3, the method further includes:
when the first preset time is reached, if the matching score does not reach the first preset score but is higher than the second preset score, the authentication time of the user is prolonged to the second preset time;
specifically, when the time for repeatedly executing steps S1 to S3 in any of the above method embodiments reaches a first preset time, if the matching score corresponding to the voice frame to be authenticated, which is obtained at the last moment, does not reach the first preset score but is higher than a second preset score, that is, is between a high score threshold and a low score threshold, the authentication time of the user is extended from the first preset time to the second preset time. The second preset time may be set according to actual requirements, and is not specifically limited herein.
For example, assume that the first preset time is 10 seconds, the second preset time is 15 seconds, the first preset score is 99 minutes, and the second preset score is 20 minutes. When the time for repeatedly executing steps S1 to S3 in any of the above method embodiments reaches 10 seconds, if the matching score corresponding to the voice frame to be authenticated acquired in the 10 th second is lower than 99 minutes but higher than 20 minutes, the effective authentication time of the user is extended from 10 seconds to 15 seconds.
And in a second preset time, if the matching score does not reach the first preset score, determining that the user authentication fails.
Specifically, after the authentication time of the user is extended from the first preset time to the second preset time, steps S1 to S3 in any of the above method embodiments are repeatedly executed within the second preset time; that is, the voice frame to be authenticated is obtained in real time and the matching score corresponding to it is calculated. If none of the matching scores corresponding to the voice frames to be authenticated obtained within the second preset time reaches the first preset score, it is determined that the user authentication has failed. On the basis of the above example, when the time spent repeatedly performing steps S1 to S3 reaches 15 seconds, if none of the matching scores corresponding to the voice frames to be authenticated obtained in real time within the 15 seconds (including the 15th second) reaches 99 points, it may be determined that the user authentication has failed.
In the continuous voiceprint authentication method with feedback provided by the invention, when the first preset time is reached, if the matching score does not reach the first preset score but is higher than the second preset score, the authentication time of the user is extended to the second preset time; within the second preset time, if the matching score does not reach the first preset score, it is determined that the user authentication has failed. The method avoids the defect of traditional authentication systems in which all users must complete authentication within the same fixed time, which causes authentication errors; it further improves the accuracy of the authentication result while preserving the convenience of authentication, thereby further improving the user experience.
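The timing logic of this embodiment, with its high and low score thresholds and one-time deadline extension, can be sketched as follows. This is an illustrative sketch, not the patented implementation: `next_score` stands for whatever supplies the matching score of the most recent voice frame, and the default thresholds and times mirror the 99-point/20-point, 10-second/15-second example above.

```python
import time


def authenticate(next_score, high=99.0, low=20.0, t1=10.0, t2=15.0,
                 clock=time.monotonic):
    """Feedback loop: succeed as soon as a frame's matching score reaches
    `high`; at deadline t1, fail if the latest score is not above `low`,
    otherwise extend the deadline once to t2; fail if no score has
    reached `high` when the extended deadline passes."""
    start = clock()
    deadline, extended = t1, False
    while True:
        score = next_score()          # matching score of the newest frame
        if score >= high:
            return True               # high-score condition met
        if clock() - start >= deadline:
            if not extended and score > low:
                deadline, extended = t2, True   # grant the single extension
            else:
                return False          # low score at t1, or timeout at t2
```

With a real microphone, `next_score` would block until the score of the next frame is available; injecting `clock` keeps the timing logic deterministic and testable.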
Based on any of the above embodiments, there is provided a continuous voiceprint authentication method with feedback, as shown in fig. 2, step S1 further includes:
s11, acquiring a voice frame to be authenticated in real time, and acquiring a frequency spectrum corresponding to the voice frame to be authenticated;
specifically, when the user undergoes voiceprint authentication, the user continuously utters a segment of speech, and the voice frame produced by the user at each moment is obtained in real time; this voice frame is the voice frame to be authenticated. Meanwhile, for each obtained voice frame to be authenticated, the frequency spectrum corresponding to that voice frame is acquired; each voice frame corresponds to one frequency spectrum. Since the spectral characteristics of the speech produced by different users differ, different users can be distinguished and identified through their frequency spectra.
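The per-frame spectrum of step S11 can be computed, for instance, as the magnitude of a windowed FFT. This is one common choice rather than something mandated by the method; the frame length and FFT size below are illustrative.

```python
import numpy as np


def frame_spectrum(frame, n_fft=512):
    # Hamming-window the frame, zero-pad to n_fft, and take the magnitude
    # of the one-sided FFT: one spectrum per voice frame.
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    return np.abs(np.fft.rfft(windowed, n=n_fft))   # shape (n_fft//2 + 1,)
```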
And S12, extracting the voiceprint feature vector to be authenticated corresponding to the voice frame to be authenticated according to the frequency spectrum by using a preset feature extraction model.
Specifically, a preset feature extraction model is used for extracting a voiceprint feature vector corresponding to the voice frame to be authenticated according to the obtained frequency spectrum, wherein the voiceprint feature vector is the voiceprint feature vector to be authenticated. The preset feature extraction model is pre-constructed and trained, can be a preset constructed and trained neural network model and the like, can be set according to actual requirements, and is not specifically limited here.
Taking the neural network model as an example, the voiceprint feature vector to be authenticated corresponding to the voice frame to be authenticated can be extracted through a convolutional neural network. The convolutional neural network may include a plurality of convolutional layers, and the number and size of the convolution kernels of each convolutional layer may be adjusted according to actual requirements, which is not specifically limited here. In addition, a pooling layer may be connected behind each convolutional layer; the pooling layer may be a maximum pooling layer or an average pooling layer, the windows of the pooling layers may or may not overlap, and the size of the pooling window may be adjusted according to actual needs, which is not specifically limited here. When extracting the voiceprint feature vector to be authenticated with the convolutional neural network, the frequency spectrum corresponding to the voice frame is input into the network, each convolution kernel is then used to convolve the spectrum, and a feature plane is generated correspondingly; the feature plane produced by the last pooling layer is the voiceprint feature vector to be authenticated.
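A minimal NumPy sketch of the convolution-and-pooling pipeline described above: kernels are convolved over the spectrum to produce feature planes, non-overlapping max pooling is applied, and the final pooled planes are flattened into the embedding. A real system would use a trained deep network; the kernel values and sizes here are placeholders.

```python
import numpy as np


def conv2d_valid(feature_map, kernel):
    """Naive 'valid'-mode 2-D convolution (cross-correlation) of one
    feature plane with one kernel."""
    H, W = feature_map.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    H, W = feature_map.shape
    H, W = H - H % size, W - W % size            # drop any ragged edge
    fm = feature_map[:H, :W].reshape(H // size, size, W // size, size)
    return fm.max(axis=(1, 3))


def extract_embedding(spectrogram, kernels):
    """Convolve and pool each kernel's feature plane, then flatten the
    last pooled planes into one voiceprint embedding vector."""
    planes = [max_pool(np.maximum(conv2d_valid(spectrogram, k), 0.0))
              for k in kernels]                  # ReLU between conv and pool
    return np.concatenate([p.ravel() for p in planes])
```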
The invention provides a continuous voiceprint authentication method with feedback, which is characterized in that a voice frame to be authenticated is obtained in real time, and a frequency spectrum corresponding to the voice frame to be authenticated is obtained; extracting a voiceprint characteristic vector to be authenticated corresponding to the voice frame to be authenticated according to the frequency spectrum by using a preset characteristic extraction model; the voiceprint characteristic vector corresponding to the voice frame to be authenticated can be accurately extracted, the user identity can be authenticated according to the extracted voiceprint characteristic vector, and therefore the accuracy of user identity authentication can be guaranteed.
Based on any of the above embodiments, there is provided a continuous voiceprint authentication method with feedback, as shown in fig. 3, before step S2, the method further includes:
s21, acquiring the voice segments of the registered user, and extracting the voiceprint characteristic vector corresponding to each voice frame in the voice segments;
specifically, the registered voiceprint feature vector should be extracted in advance before calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector. First, a voice segment of each registered user is obtained for each registered user, that is, a valid user, where the length of the voice segment may be set according to actual requirements, and is not specifically limited herein. For the voice segment of each registered user, the voice segment is divided into a plurality of voice frames, a voiceprint feature vector corresponding to each voice frame is extracted, and the method steps in any one of the above method embodiments can be referred to by the way of extracting the voiceprint feature vector through the voice frame, which is not described herein again.
S22, carrying out weighted average operation on the voiceprint characteristic vectors corresponding to all the voice frames in the voice fragment to obtain the registered voiceprint characteristic vectors.
Specifically, for the voice segment of each registered user, a weighted average operation is performed on the voiceprint feature vectors corresponding to all frames in the voice segment, and the resulting vector is the registered voiceprint feature vector. For example, if a voice segment of a registered user includes 100 voice frames and the voiceprint feature vectors corresponding to the frames are A_1, A_2, A_3, ..., A_100, then the registered voiceprint feature vector obtained after a uniformly weighted average operation is (A_1 + A_2 + A_3 + ... + A_100)/100.
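The enrollment averaging of step S22 can be sketched as below; with uniform weights it reduces to the 100-frame averaging example above. The function name and the option of non-uniform weights are illustrative.

```python
import numpy as np


def enroll(frame_vectors, weights=None):
    """Average per-frame voiceprint vectors into one registered vector.

    With uniform weights this is (A_1 + ... + A_N) / N; `weights` lets
    frames contribute unequally (e.g. down-weighting noisy frames)."""
    frames = np.asarray(frame_vectors, dtype=float)   # shape (N, dim)
    if weights is None:
        weights = np.ones(len(frames))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize the weights
    return frames.T @ w                               # weighted average, shape (dim,)
```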
The invention provides a continuous voiceprint authentication method with feedback, which comprises the steps of obtaining a voice segment of a registered user before calculating the distance between a voiceprint feature vector to be authenticated and a registered voiceprint feature vector, and extracting a voiceprint feature vector corresponding to each voice frame in the voice segment; carrying out weighted average operation on the voiceprint characteristic vectors corresponding to all the voice frames in the voice fragment to obtain registered voiceprint characteristic vectors; the voiceprint feature vector of the registered user can be effectively and accurately extracted, and the voiceprint feature vector of the user to be authenticated is favorably matched with the voiceprint feature vector of the registered user, so that the identity authentication of the user to be authenticated is effectively realized.
Based on any of the above embodiments, providing a continuous voiceprint authentication method with feedback, where the step of calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector in step S2 further includes:
calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector by using a distance function; the distance function includes a cosine distance function and a euclidean distance function.
Specifically, the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector is calculated using a distance function, where the distance function includes a cosine distance function and a Euclidean distance function; the choice may be made according to actual requirements and is not specifically limited here. Taking the Euclidean distance function as an example: if the voiceprint feature vector to be authenticated is x = (x_1, x_2, x_3, ..., x_n) and the registered voiceprint feature vector is y = (y_1, y_2, y_3, ..., y_n), then the distance between the two vectors is calculated as:

d(x, y) = sqrt( (x_1 - y_1)^2 + (x_2 - y_2)^2 + ... + (x_n - y_n)^2 )
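The two distance functions named above can be written directly in NumPy; the function names are illustrative. Note that the "cosine distance" follows this document's convention that a larger value means more similar, i.e. it is the cosine similarity.

```python
import numpy as np


def euclidean_distance(x, y):
    # d(x, y) = sqrt(sum_i (x_i - y_i)^2); smaller means more similar
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))


def cosine_distance(x, y):
    # cos(x, y) = x . y / (|x| |y|); larger means more similar
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
```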
the continuous voiceprint authentication method with feedback provided by the invention utilizes the distance function to calculate the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, wherein the distance function comprises a cosine distance function and an Euclidean distance function, which is favorable for determining the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, and is further favorable for authenticating the identity of the user.
Fig. 4 is a schematic diagram of an overall structure of a continuous voiceprint authentication system with feedback according to an embodiment of the present invention, and as shown in fig. 4, the present invention provides a continuous voiceprint authentication system with feedback, including:
the characteristic extraction module 1 is used for acquiring a voice frame to be authenticated in real time and extracting a voiceprint characteristic vector to be authenticated corresponding to the voice frame to be authenticated;
in the process in which the user continuously utters a segment of speech, the feature extraction module 1 is used to acquire, in real time, the voice frame produced by the user at each moment; this voice frame is the voice frame to be authenticated. Meanwhile, for the obtained voice frame to be authenticated, the feature extraction module 1 extracts the corresponding voiceprint feature vector, which is the voiceprint feature vector to be authenticated. Specifically, a preset feature extraction model may be used for the extraction; taking a neural network model as an example, the voiceprint feature vector to be authenticated may be extracted through a convolutional neural network. The convolutional neural network may include a plurality of convolutional layers, and the number and size of the convolution kernels of each convolutional layer may be adjusted according to actual requirements, which is not specifically limited here. In addition, a pooling layer may be connected behind each convolutional layer; the pooling layer may be a maximum pooling layer or an average pooling layer, the windows of the pooling layers may or may not overlap, and the size of the pooling window may be adjusted according to actual needs, which is not specifically limited here.
When extracting the voiceprint feature vector to be authenticated with the convolutional neural network, the voice frame to be authenticated is first converted into spectral features, each convolution kernel is then used to convolve those features, and a feature plane is generated correspondingly; the feature plane produced by the last pooling layer is the voiceprint feature vector to be authenticated.
The similarity calculation module 2 is used for calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector and determining the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector according to the distance calculation result;
specifically, for the obtained voiceprint feature vector to be authenticated, the similarity calculation module 2 is used to calculate a distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, where the distance may be a cosine distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, or an euclidean distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, and the distance may be set according to an actual requirement, and is not specifically limited herein. The registered voiceprint feature vector is extracted from the voice fragment of the registered user in advance and is used for representing the voiceprint feature of the registered user, and the registered user is also a legal user. The number of registered users can be one, and correspondingly, the number of registered voiceprint feature vectors is also one; in addition, the number of registered users may also be multiple, and each registered user corresponds to one registered voiceprint feature vector, and correspondingly, the number of registered voiceprint feature vectors is also multiple. When a plurality of registered voiceprint feature vectors exist, the distance between the voiceprint feature vector to be authenticated and each registered voiceprint feature vector is calculated respectively.
Further, the similarity calculation module 2 determines the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector according to the distance calculation result. Taking the cosine distance as an example: the larger the cosine distance between the two vectors, the higher their similarity can be determined to be; the smaller the cosine distance, the lower the similarity.
And the real-time feedback module 3 is used for determining a matching score corresponding to the voice frame to be authenticated according to the similarity, and feeding the matching score back to the user corresponding to the voice frame to be authenticated in real time so that the user can adjust pronunciation according to the matching score.
Specifically, after the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector is obtained through calculation, the matching score corresponding to the voice frame to be authenticated is determined according to the similarity by using the real-time feedback module 3. The higher the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector is, the higher the matching score corresponding to the voice frame to be authenticated is; the lower the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, the lower the matching score corresponding to the voice frame to be authenticated.
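The mapping from similarity to matching score is monotone but otherwise unspecified by the method. One illustrative choice, consistent with the 99-point and 20-point thresholds used in the examples, maps cosine similarity in [-1, 1] linearly onto a 0–100 score.

```python
def match_score(cosine_sim):
    # Clamp to [-1, 1], then map linearly onto a 0-100 score:
    # higher similarity -> higher matching score.
    s = max(-1.0, min(1.0, float(cosine_sim)))
    return 50.0 * (s + 1.0)
```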
Meanwhile, the real-time feedback module 3 feeds the matching score corresponding to the voice frame to be authenticated back, in real time, to the user who produced that voice frame. That is, during the authentication process, as the user produces each voice frame, the matching score corresponding to that frame can be fed back to the user in real time; taking a user terminal as an example, the matching score corresponding to each voice frame can be displayed on the terminal in real time. The user can thus adjust their pronunciation according to the matching score. For example, if, during authentication, a registered user's matching scores remain generally low for a short period because of unadjusted pronunciation, the user can learn from the fed-back scores that the pronunciation may be problematic, immediately adjust it, and effectively raise the matching scores of the subsequent voice frames until the authentication succeeds.
The continuous voiceprint authentication system with feedback provided by the invention can acquire a voice frame to be authenticated in real time and extract the voiceprint feature vector to be authenticated corresponding to the voice frame; calculate the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, and determine the similarity between the two according to the distance calculation result; and determine a matching score corresponding to the voice frame to be authenticated according to the similarity, feeding the matching score back in real time to the user who produced the voice frame so that the user can adjust pronunciation according to it. Because the system feeds back the matching score of each voice frame in real time during voiceprint authentication, it effectively guides the user to adjust their pronunciation, can effectively avoid false rejection of a legitimate user, improves the probability that a legitimate user authenticates successfully, and is beneficial to the user's authentication experience.
Fig. 5 is a block diagram illustrating an apparatus for the continuous voiceprint authentication method with feedback according to an embodiment of the present invention. Referring to fig. 5, the apparatus includes: a processor (processor) 51, a memory (memory) 52, and a bus 53, wherein the processor 51 and the memory 52 communicate with each other through the bus 53. The processor 51 is configured to call program instructions in the memory 52 to perform the methods provided by the above method embodiments, including: acquiring a voice frame to be authenticated in real time, and extracting a voiceprint feature vector to be authenticated corresponding to the voice frame to be authenticated; calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, and determining the similarity between the two according to the distance calculation result; and determining a matching score corresponding to the voice frame to be authenticated according to the similarity, and feeding the matching score back in real time to the user corresponding to the voice frame so that the user can adjust pronunciation according to the matching score.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above-mentioned method embodiments, for example, comprising: acquiring a voice frame to be authenticated in real time, and extracting a voiceprint characteristic vector to be authenticated corresponding to the voice frame to be authenticated; calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, and determining the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector according to the distance calculation result; and determining a matching score corresponding to the voice frame to be authenticated according to the similarity, and feeding back the matching score to a user corresponding to the voice frame to be authenticated in real time so that the user can adjust pronunciation according to the matching score.
The present embodiments provide a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the methods provided by the above method embodiments, for example, including: acquiring a voice frame to be authenticated in real time, and extracting a voiceprint characteristic vector to be authenticated corresponding to the voice frame to be authenticated; calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, and determining the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector according to the distance calculation result; and determining a matching score corresponding to the voice frame to be authenticated according to the similarity, and feeding back the matching score to a user corresponding to the voice frame to be authenticated in real time so that the user can adjust pronunciation according to the matching score.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The above-described embodiments of the apparatus for the continuous voiceprint authentication method with feedback, and the like, are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, the above is only a preferred embodiment of the present application and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A continuous voiceprint authentication method with feedback is characterized by comprising the following steps:
s1, acquiring a voice frame to be authenticated in real time, and extracting a voiceprint characteristic vector to be authenticated corresponding to the voice frame to be authenticated;
s2, calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector, and determining the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector according to the distance calculation result;
s3, determining the matching score corresponding to the voice frame to be authenticated according to the similarity, and feeding the matching score back to the user corresponding to the voice frame to be authenticated in real time, so that the user can adjust pronunciation according to the matching score.
2. The method according to claim 1, wherein the step S3 further comprises:
and when the matching score reaches a first preset score, determining that the user authentication is successful.
3. The method according to claim 1, wherein the step S3 is followed by further comprising:
when the matching score does not reach a first preset score, repeating the steps S1-S3 to obtain the matching score;
within a first preset time, when the matching score reaches the first preset score, determining that the user authentication is successful;
and when the first preset time is reached, if the matching score does not reach a second preset score, determining that the user authentication fails.
4. The method according to claim 3, wherein the step S3 is further followed by:
when the first preset time is reached, if the matching score does not reach the first preset score but is higher than the second preset score, prolonging the authentication time of the user to a second preset time;
and in the second preset time, if the matching score does not reach the first preset score, determining that the user authentication fails.
5. The method according to claim 1, wherein the step S1 further comprises:
acquiring a voice frame to be authenticated in real time, and acquiring a frequency spectrum corresponding to the voice frame to be authenticated;
and extracting the voiceprint characteristic vector to be authenticated corresponding to the voice frame to be authenticated according to the frequency spectrum by using a preset characteristic extraction model.
6. The method according to claim 1, wherein the step S2 is preceded by:
acquiring a voice segment of a registered user, and extracting a voiceprint characteristic vector corresponding to each voice frame in the voice segment;
and carrying out weighted average operation on the voiceprint characteristic vectors corresponding to all the voice frames in the voice segment to obtain the registration voiceprint characteristic vector.
7. The method according to claim 1, wherein the calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector in step S2 further comprises:
calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector by using a distance function;
the distance function includes a cosine distance function and a euclidean distance function.
8. A continuous voiceprint authentication system with feedback, comprising:
the characteristic extraction module is used for acquiring a voice frame to be authenticated in real time and extracting a voiceprint characteristic vector to be authenticated corresponding to the voice frame to be authenticated;
the similarity calculation module is used for calculating the distance between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector and determining the similarity between the voiceprint feature vector to be authenticated and the registered voiceprint feature vector according to the distance calculation result;
and the real-time feedback module is used for determining a matching score corresponding to the voice frame to be authenticated according to the similarity and feeding the matching score back to a user corresponding to the voice frame to be authenticated in real time so that the user can adjust pronunciation according to the matching score.
9. An apparatus of a continuous voiceprint authentication method with feedback, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 7.
CN201810343274.9A 2018-04-17 2018-04-17 Continuous voiceprint authentication method and system with feedback Expired - Fee Related CN108447489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810343274.9A CN108447489B (en) 2018-04-17 2018-04-17 Continuous voiceprint authentication method and system with feedback


Publications (2)

Publication Number Publication Date
CN108447489A CN108447489A (en) 2018-08-24
CN108447489B true CN108447489B (en) 2020-05-22

Family

ID=63200012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810343274.9A Expired - Fee Related CN108447489B (en) 2018-04-17 2018-04-17 Continuous voiceprint authentication method and system with feedback

Country Status (1)

Country Link
CN (1) CN108447489B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199742A (en) * 2018-11-20 2020-05-26 阿里巴巴集团控股有限公司 Identity verification method and device and computing equipment
CN110491389B (en) * 2019-08-19 2021-12-14 效生软件科技(上海)有限公司 Voiceprint recognition method of telephone traffic system
CN112351047B (en) * 2021-01-07 2021-08-24 北京远鉴信息技术有限公司 Double-engine based voiceprint identity authentication method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820033A (en) * 2012-08-17 2012-12-12 Nanjing University Voiceprint identification method
CN104732972A (en) * 2015-03-12 2015-06-24 Guangdong University of Foreign Studies HMM voiceprint recognition sign-in method and system based on grouping statistics
CN105681920A (en) * 2015-12-30 2016-06-15 Shenzhen Yingshuo Audio Technology Co., Ltd. Network teaching method and system with speech recognition function
CN105869644A (en) * 2016-05-25 2016-08-17 Baidu Online Network Technology (Beijing) Co., Ltd. Deep-learning-based voiceprint authentication method and device
CN106098068A (en) * 2016-06-12 2016-11-09 Tencent Technology (Shenzhen) Co., Ltd. Voiceprint recognition method and device
CN106209786A (en) * 2016-06-27 2016-12-07 Sichuan Ataiyin Robot Intelligent Equipment Co., Ltd. Big-data parallel voiceprint authentication method
CN106384285A (en) * 2016-09-14 2017-02-08 Zhejiang Weirong Electronic Technology Co., Ltd. Intelligent self-service bank system
CN107517207A (en) * 2017-03-13 2017-12-26 Ping An Technology (Shenzhen) Co., Ltd. Server, identity authentication method and computer-readable storage medium

Also Published As

Publication number Publication date
CN108447489A (en) 2018-08-24

Similar Documents

Publication Publication Date Title
CN106373575B (en) User voiceprint model construction method, device and system
US9979721B2 (en) Method, server, client and system for verifying verification codes
JP6621536B2 (en) Electronic device, identity authentication method, system, and computer-readable storage medium
US11348590B2 (en) Methods and devices for registering voiceprint and for authenticating voiceprint
US10276168B2 (en) Voiceprint verification method and device
JP6096333B2 (en) Method, apparatus and system for verifying payment
US9373330B2 (en) Fast speaker recognition scoring using I-vector posteriors and probabilistic linear discriminant analysis
CN108447489B (en) Continuous voiceprint authentication method and system with feedback
WO2020077885A1 (en) Identity authentication method and apparatus, computer device and storage medium
CN106062871B (en) Training a classifier using the selected subset of cohort samples
CN111199741A (en) Voiceprint identification method, voiceprint verification method, voiceprint identification device, computing device and medium
US20170294192A1 (en) Classifying Signals Using Mutual Information
US10909991B2 (en) System for text-dependent speaker recognition and method thereof
WO2019196305A1 (en) Electronic device, identity verification method, and storage medium
WO2019179033A1 (en) Speaker authentication method, server, and computer-readable storage medium
US20210166715A1 (en) Encoded features and rate-based augmentation based speech authentication
CN110379433A (en) Method, apparatus, computer equipment and the storage medium of authentication
CN117953900A (en) Database generation method, database generation device, and recording medium
CN108630208B (en) Server, voiceprint-based identity authentication method and storage medium
WO2021257000A1 (en) Cross-modal speaker verification
CN111199742A (en) Identity verification method and device and computing equipment
EP4184355A1 (en) Methods and systems for training a machine learning model and authenticating a user with the model
US20230153815A1 (en) Methods and systems for training a machine learning model and authenticating a user with the model
CN116631406B (en) Identity feature extraction method, equipment and storage medium based on acoustic feature generation
CN117892277A (en) Login authentication method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2020-05-22