WO2017114307A1

WO2017114307A1 - Voiceprint authentication method capable of preventing recording attack, server, terminal, and system

Info

Publication number: WO2017114307A1
Application number: PCT/CN2016/111714
Authority: WO
Inventors: 徐燕军; 何朔; 尹亚伟; 万四爽
Original assignee: 中国银联股份有限公司
Priority date: 2015-12-30
Filing date: 2016-12-23
Publication date: 2017-07-06
Also published as: CN105933272A

Abstract

This application provides a voiceprint authentication method capable of preventing a recording attack, a server, a terminal, and a system, the voiceprint authentication method comprising: generating a character combination and a character pronunciation rule on the basis of a voiceprint authentication request of a user; sending the character combination and the character pronunciation rule to a requesting terminal; receiving a user voice input by the requesting terminal on the basis of the character combination and the character pronunciation rule; performing voiceprint authentication on the basis of the user voice, the character combination, and the character pronunciation rule; and sending a voiceprint authentication result to the requesting terminal. The application can effectively prevent a recording attack.

Description

Voiceprint authentication method, server, terminal and system capable of preventing recording attacks

Technical field

The present application belongs to the field of voiceprint recognition, and particularly relates to a voiceprint authentication method, a server, a terminal, and a system capable of preventing a recording attack.

Background art

Like the fingerprint, the voiceprint is a very important biological feature that can characterize people. Compared with traditional password authentication and other means, the voiceprint has high security and convenience. The most commonly used attacks in voiceprint authentication are recording replay attacks, speaker spoofing attacks, and forged authentication voice attacks.

A recording replay attack means that the attacker uses a high-fidelity recording device to obtain samples of the user's voice by various means, and either uses the original recording or cuts and splices it to synthesize a "genuine speaker voice", which is then played back through a high-fidelity amplifier when the authentication system collects the user's voice. A speaker spoofing attack is carried out by an attacker who is good at imitating other people's voices and who mimics the speaker's manner of speaking and pronunciation. A forged authentication voice attack means that the attacker forges the user's voice through techniques such as synthesis, conversion, and splicing.

A speaker spoofing attack requires the attacker to be a skilled imitator, and forging an authentication voice requires a high level of professional skill. Both attacks are therefore inherently difficult to mount, and whether the sound is imitated or forged, it is not the genuine voice. Existing voiceprint recognition technology can basically cope with these two types of attack.

Recording replay attacks are a major problem in voiceprint recognition. The attacker acquires the user's voice and then replays it, possibly after processing it in software. There are two cases of recording attack. In one, the user's voice is stolen on some other occasion and later used for the attack. In the other, malware captures the user's voice during voiceprint recognition and the attacker uses it later.

For the recording attack, in the prior art, there are mainly two solutions as follows:

The first scheme distinguishes a recording from live speech by analyzing the difference in channel characteristics between the recording and the original speech. The second scheme verifies not only the speaker's voiceprint but also the content of the speech, because the recording attacker does not know in advance what content will be requested.

However, the first scheme places high demands on sound signal quality, signal-to-noise ratio, channel quality, and so on, and its effect in practical applications is not very good.

In the second scheme, if the user has to read a large amount of random text, the user experience is poor. The amount of voice input can be reduced, as in the patent with application number 201310123555.0 (title: identity recognition system and method based on dynamic password voice), which selects and combines characters from the 26 English letters and 10 digits. A dynamic password is produced by random combination each time, and the user inputs it by voice. Because the attacker does not know each dynamic password in advance, a simple recording attack can be resisted, which makes this a better solution. However, since that patent only combines 36 characters at random, an attacker who has recorded the user and separated the recording into the 36 individual characters can, on obtaining any random string, mount an attack simply by splicing the corresponding characters.

Summary of the invention

The present application provides a voiceprint authentication method, a server, a terminal, and a system capable of preventing a recording attack, to remedy the defect that the prior art cannot effectively prevent recording attacks.

In order to solve the above technical problem, the present application provides a voiceprint authentication method capable of preventing a recording attack, including:

Generating a character combination and a pronunciation rule of a character according to a user's voiceprint authentication request;

Transmitting the character combination and the pronunciation rule of the character to the requesting terminal;

Receiving, by the requesting terminal, a user voice input according to the character combination and a pronunciation rule of the character;

Performing voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the character;

Transmitting the voiceprint authentication result to the requesting terminal.

The present application further provides a voiceprint authentication method capable of preventing a recording attack, including:

Sending a user's voiceprint authentication request to the server;

Receiving and displaying a combination of characters sent by the server and a pronunciation rule of the character;

Receiving a user voice input by the user according to the character combination and the pronunciation rule of the character;

Transmitting the user voice to the server;

Receiving a voiceprint authentication result sent by the server.

The present application further provides a voiceprint authentication server capable of preventing a recording attack, including:

a generating unit, configured to generate a character combination and a pronunciation rule of the character according to a request of the user;

a sending unit, configured to send the character combination and the pronunciation rule of the character to the requesting terminal, and send the voiceprint authentication result to the requesting terminal;

a receiving unit, configured to receive a user voice input by the requesting terminal according to the character combination and a pronunciation rule of a character;

a sound detecting unit, configured to perform voiceprint authentication according to the user voice, the character combination, and a pronunciation rule of the character;

The present application further provides a voiceprint authentication terminal capable of preventing a recording attack, including:

a requesting unit, configured to send a user's voiceprint authentication request to the server;

a receiving unit, configured to receive and display a character combination sent by the server and a pronunciation rule of the character, and receive a voiceprint authentication result sent by the server;

an input unit, configured to receive a user voice input by a user according to the character combination and a pronunciation rule of the character;

and a sending unit, configured to send the user voice to the server.

The present application further provides a voiceprint authentication system capable of preventing a recording attack, the system comprising a server and a requesting terminal, wherein the server is configured to generate a character combination and a pronunciation rule of the characters according to a user's voiceprint authentication request; send the character combination and the pronunciation rule of the characters to the requesting terminal; receive the user voice input by the requesting terminal according to the character combination and the pronunciation rule of the characters; perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the characters; and send the voiceprint authentication result to the requesting terminal;

The requesting terminal is configured to send a user's voiceprint authentication request to the server; receive and display the character combination sent by the server and the pronunciation rule of the character; and receive the user voice input by the user according to the character combination and the pronunciation rule of the character; Transmitting the user voice to the server; receiving a voiceprint authentication result sent by the server.

The voiceprint authentication method, server, terminal, and system capable of preventing recording attacks proposed by the present application can effectively prevent recording attacks by verifying whether the characters and their pronunciation in the user voice are consistent with the character combination and the pronunciation rules generated by the server. Even if the attacker obtains, through other channels, user voice with the right content, that voice cannot satisfy the required pronunciation manner. Further, to prevent a recording attack that reuses voice previously input by the user, after it is determined that the characters and the pronunciation manner in the user voice are consistent with the character combination and the pronunciation rules generated by the server, it is also determined whether the current voice to be verified is consistent with the user's voice in the historical voice library. If they are consistent, a recording attack is present. This application can thus effectively prevent recording attacks in voiceprint authentication.

Brief description of the drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a voiceprint authentication method capable of preventing a recording attack according to an embodiment of the present application;

FIG. 2 is a flowchart of a voiceprint authentication process capable of preventing a recording attack according to an embodiment of the present application;

FIG. 3 is a flowchart of a voiceprint authentication process capable of preventing a recording attack according to an embodiment of the present application;

FIG. 4 is a waveform diagram corresponding to the pronunciation of the number “0” according to an embodiment of the present application;

FIG. 5 is a flowchart of a voiceprint authentication method capable of preventing a recording attack according to an embodiment of the present application;

FIG. 6 is a voiceprint authentication server capable of preventing a recording attack according to an embodiment of the present application;

FIG. 7 is a voiceprint authentication terminal capable of preventing a recording attack according to an embodiment of the present application;

FIG. 8 is a voiceprint authentication system capable of preventing a recording attack according to an embodiment of the present application;

FIG. 9 is a flowchart of a voiceprint authentication method with a function of preventing a recording attack according to an embodiment of the present application.

Detailed description

In order to make the technical features and effects of the present application more apparent, the technical solutions of the present application are further described below with reference to the accompanying drawings. The present application may also be described or implemented by other specific examples, and any equivalent transformation made by a person skilled in the art within the scope of the claims falls within the scope of protection of this application.

FIG. 1 is a flowchart of a voiceprint authentication method capable of preventing a recording attack according to an embodiment of the present application.

This embodiment is a voiceprint authentication method described on the server side. The voiceprint authentication is performed according to the user voice fed back by the terminal, the character combination generated by the server, and the pronunciation rule of the character. This embodiment can prevent recording attacks to a certain extent.

Specifically, the voiceprint authentication method capable of preventing a recording attack includes the following steps:

Step 101: Generate a character combination and a pronunciation rule of the character according to a user's voiceprint authentication request;

The character combination includes, but is not limited to, letters, digits, and Chinese characters, and the pronunciation rules of the characters include, but are not limited to, the pitch of the pronunciation and the length of the pronunciation. In one embodiment, each character in the character combination corresponds to one pronunciation rule; in another embodiment, every two characters in the character combination correspond to one pronunciation rule. The present application does not limit the specific form of the character combination or of the pronunciation rules of its characters.

In an embodiment of the present application, the character combination and the pronunciation rule of the character are randomly generated.
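Step 101 with random generation can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the digit alphabet, the pitch and duration labels, the `PronunciationRule` type, and the `generate_challenge` function are all hypothetical names chosen for the example, and one rule per character is assumed.

```python
import secrets
from dataclasses import dataclass

# Hypothetical alphabets; the application leaves the concrete choices open.
CHARACTERS = "0123456789"
PITCHES = ("low", "normal", "high")
DURATIONS = ("short", "long")


@dataclass(frozen=True)
class PronunciationRule:
    pitch: str     # how high the character should be spoken
    duration: str  # how long the character should be held


def generate_challenge(length=6):
    """Return a random character combination and one rule per character."""
    chars = "".join(secrets.choice(CHARACTERS) for _ in range(length))
    rules = [
        PronunciationRule(secrets.choice(PITCHES), secrets.choice(DURATIONS))
        for _ in chars
    ]
    return chars, rules


chars, rules = generate_challenge()
for c, rule in zip(chars, rules):
    print(c, rule.pitch, rule.duration)
```

The `secrets` module is used instead of `random` so that the challenge is hard to predict, which matters because the scheme relies on the attacker not knowing the combination in advance.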

Step 102: Send a character combination and a pronunciation rule of the character to the requesting terminal;

The terminals described in the present application include, but are not limited to, mobile phones, PADs, computers, and notebooks.

Step 103: Receive a user voice input by the requesting terminal according to the character combination and a pronunciation rule of a character;

Step 104: Perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the character;

Step 105: Send the voiceprint authentication result to the requesting terminal.

In this embodiment, even if the attacker can obtain the voice character information, the pronunciation rule of the character cannot be obtained, and by adding the authentication of the pronunciation rule, the recording attack can be effectively prevented.

In detail, step 104 further includes:

Determining whether the user voice and the user's historically input voice belong to the same person;

Determining whether a character in the user voice is the same as a character in the character combination;

Determining whether a pronunciation manner of a character in the user voice matches a pronunciation rule of the character;

The voiceprint authentication passes only when all three conditions are satisfied at the same time: the user voice and the user's historically input voice belong to the same person, the characters in the user voice are the same as the characters in the character combination, and the pronunciation manner of the characters in the user voice matches the pronunciation rules of the characters. In all other cases the voiceprint authentication fails; that is, it fails if the user voice and the user's historically input voice do not belong to the same person, and/or the characters in the user voice differ from the characters in the character combination, and/or the pronunciation manner of the characters in the user voice does not match the pronunciation rules of the characters.

The present application does not limit the order of the above-mentioned judging process, and any combination of sequences can realize the judgment of voiceprint authentication.

Optionally, as shown in FIG. 2, step 104 further includes:

Step 201: First, determine whether the user voice and the user's historically input voice belong to the same person. If not, the voiceprint authentication fails; if so, proceed to step 202;

In a specific implementation, before performing step 202, the user voice sent by the client is separated according to characters, and then the characters in the user voice are extracted.

Step 202: Determine whether the characters in the user voice are the same as the characters in the character combination.

If the characters in the user voice are different from the characters in the character combination, the voiceprint authentication fails;

If the characters in the user voice are the same as the characters in the character combination, proceed to step 203;

Step 203: Determine whether a pronunciation manner of a character in the user voice matches a pronunciation rule of the character;

If the pronunciation mode of the character in the user voice does not match the pronunciation rule of the character, the voiceprint authentication does not pass;

If the pronunciation of the character in the user's voice matches the pronunciation rule of the character, the voiceprint authentication is passed.

The voiceprint authentication according to the sequence described in this embodiment can speed up the authentication, prevent the recording attack and improve the user experience. In the following embodiments, voiceprint authentication is performed in the order described in this embodiment unless otherwise specified.

Referring to FIG. 2, after it is determined that the user voice and the user's historically input voice belong to the same person, that the characters in the user voice are the same as the characters in the character combination, and that the pronunciation manner of the characters in the user voice matches the pronunciation rules of the characters, the method further includes storing the user voice in the historical voice library, to facilitate subsequent retrieval of the voice information input by the user.

As shown in FIG. 3, in an embodiment of the present application, after the same three determinations have been made, the method further includes:

Step 204: Determine whether the user voice is consistent with the voice of the user in the historical voice library.

If the user voice is consistent with the voice of the user in the historical voice library, the voiceprint authentication does not pass;

If the user voice is inconsistent with the voice of the user in the historical voice library, the voiceprint authentication is passed, and the user voice is stored in the historical voice library.

By verifying whether the user voice is consistent with the voice of the user in the historical voice library, it is possible to prevent a recording attack of the same user voice input in different voice authentications of the same user.

In an embodiment of the present application, step 204 of the previous embodiment further includes:

Extracting characteristic parameters of the user voice;

Calculating the Euclidean distance between the feature parameters of the user voice and the feature parameters of the user's voice in the historical voice library. When the Euclidean distance is less than a predetermined threshold, the user voice is consistent with the user's voice in the historical voice library; when the Euclidean distance is greater than the predetermined threshold, the user voice is inconsistent with the user's voice in the historical voice library.

The predetermined threshold value described in this embodiment can be determined based on the difference in the same sound that a person makes.
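The threshold comparison of step 204 can be illustrated with a short sketch. It assumes each voice has already been reduced to a fixed-length feature vector (plain lists of floats here); the function names and the threshold value are hypothetical and only serve the example.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_consistent_with_history(features, history, threshold):
    """True if any stored voice is closer than the threshold.

    A match this close suggests the new voice is a replay of an
    earlier input rather than a fresh utterance.
    """
    return any(euclidean_distance(features, h) < threshold for h in history)


history = [[1.0, 1.05], [9.0, 9.0]]
print(is_consistent_with_history([1.0, 1.0], history, threshold=0.1))
```

Note the inverted sense compared with ordinary speaker verification: here a distance that is too small is the suspicious outcome.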

In a specific implementation, the detailed process of determining whether the user voice is consistent with the voice of the user in the historical voice library is:

1) The user voice is divided into multiple segments of speech according to characters, and each segment of speech is preprocessed, including framing, pre-emphasis, windowing, etc., to obtain a segment of sound that can be further calculated.
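The preprocessing named in step 1) might look like the following sketch. It applies pre-emphasis before framing, which is a common ordering but an assumption here, and it uses a Hamming window and a pre-emphasis coefficient of 0.97 as typical choices that the application does not specify. All function names are hypothetical, and frames are assumed to hold at least two samples.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts the high-frequency content."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[i] - alpha * signal[i - 1] for i in range(1, len(signal))
    ]


def frame_signal(signal, frame_size, hop):
    """Split into overlapping frames, dropping a trailing partial frame."""
    return [
        signal[start:start + frame_size]
        for start in range(0, len(signal) - frame_size + 1, hop)
    ]


def hamming_window(frame):
    """Taper a frame (length >= 2) with a Hamming window."""
    n = len(frame)
    return [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]


def preprocess(signal, frame_size=4, hop=2):
    emphasized = pre_emphasize(signal)
    return [hamming_window(f) for f in frame_signal(emphasized, frame_size, hop)]


frames = preprocess([0.0, 0.2, 0.5, 0.1, -0.3, -0.6, -0.2, 0.1, 0.4, 0.0])
print(len(frames), "frames of", len(frames[0]), "samples")
```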

2) Find the start and end points of the active speech portion of each speech.

FIG. 4 shows the waveform corresponding to the pronunciation of the numeral “0”. As FIG. 4 shows, there are many silent segments or low-level noise segments before and after the sound. If these invalid portions of the signal are not removed, an attacker can manipulate them in the recording and thereby degrade recording detection.

In a specific implementation, the start point and the end point of the effective part of the voice can be judged by the short-time energy and the short-time zero-crossing rate.

The short-time energy is the sum of the intensities of one frame of the speech signal. The short-time energy $E_n$ of the n-th frame is:

$$E_n = \sum_{m=1}^{N} x_n^2(m)$$

where m indexes the m-th sample point of the n-th frame, N is the frame size, and $x_n(m)$ is the normalized amplitude of the m-th sample point of the n-th frame.

The short-time zero-crossing rate is the number of times one frame of the speech signal crosses the horizontal axis, denoted $Z_n$.

When the short-time energy $E_n$ exceeds the threshold E or the short-time zero-crossing rate $Z_n$ exceeds the threshold Z, the frame marks the beginning of the effective speech; when $E_n$ falls below the threshold E or $Z_n$ falls below the threshold Z, the frame marks the end of the effective speech.
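The endpoint detection described above can be sketched as follows. This is a simplification under stated assumptions: a frame counts as active when its energy or its zero-crossing count exceeds the corresponding threshold, and the effective speech is taken to run from the first active frame to the last one. The function names and threshold values are hypothetical.

```python
def short_time_energy(frame):
    """Sum of squared sample amplitudes in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def find_endpoints(frames, energy_threshold, zcr_threshold):
    """Return (start, end) indices of the first and last active frame.

    Returns None when no frame exceeds either threshold.
    """
    active = [
        i for i, frame in enumerate(frames)
        if short_time_energy(frame) > energy_threshold
        or zero_crossing_count(frame) > zcr_threshold
    ]
    return (active[0], active[-1]) if active else None


frames = [
    [0.0, 0.0, 0.0],      # silence
    [0.01, -0.01, 0.01],  # low-level noise
    [0.9, -0.8, 0.7],     # speech
    [0.5, 0.6, -0.4],     # speech
    [0.0, 0.0, 0.0],      # silence
]
print(find_endpoints(frames, energy_threshold=0.1, zcr_threshold=5))
```

Frames outside the returned range would be discarded before feature extraction, so that silence or noise padding cannot be used to disguise a replayed recording.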

3) Extract feature parameters from the effective speech using Mel-frequency cepstral coefficients (MFCC). This is a common feature extraction method in current sound processing and is not described further here.

After the three steps above (preprocessing, removing the invalid part of the voice, and extracting the feature parameters), the voice of a given character spoken by the user is denoted T:

T has N frame vectors, T = {T(1), T(2), ..., T(n), ..., T(N)}, where T(n) is the speech feature vector of the n-th frame.

The same preprocessing, removal of the invalid part of the voice, and feature extraction are applied to the user's pronunciation of that character in the historical voice library, and the result is denoted R:

R has M frame vectors, R = {R(1), R(2), ..., R(m), ..., R(M)}, where R(m) is the speech feature vector of the m-th frame.

4) Calculate the similarity between the user's voice and the sound stored in the historical speech library, that is, to calculate the similarity between T and R, which can be calculated by calculating the Euclidean distances of T and R.

$d(T(i_n), R(i_m))$ denotes the Euclidean distance between the feature of frame $i_n$ in T and the feature of frame $i_m$ in R. If the two waveforms coincide completely in a given frame, the distance d is 0. To compare the similarity of T and R, the distance D[T, R] between them can be calculated; the smaller the distance, the higher the similarity.

If N = M, that is, the two speech segments have the same length, the Euclidean distance between the user voice and the voice stored in the historical voice library can be calculated directly as D[T, R] = d(1,1) + d(2,2) + ... + d(N,N). If the two voices are exactly the same, then D[T, R] = 0. This approach can only determine whether T and R are exactly the same, whereas in an actual attack the recording attacker often stretches, shortens, or deletes parts of the original recording, so simply summing the frame-by-frame distances cannot defend against such attacks.

When N and M differ, T(n) and R(m) must be aligned. Alignment can be performed by linear expansion: if N < M, T can be linearly mapped to a sequence of M frames, and its distance to {R(1), R(2), ..., R(M)} is then calculated. However, the attacker often processes only part of the sound rather than all of it, in which case this method reports a very low similarity between the two.
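The frame-by-frame sum and the linear expansion can be sketched as below, mainly to show what the simple approach computes before the nonlinear alignment is introduced. The nearest-index resampling in `linear_stretch` is one possible reading of "linear expansion" and is an assumption, as are all function names.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linear_stretch(frames, target_len):
    """Map a frame sequence onto target_len frames by index scaling."""
    n = len(frames)
    return [frames[i * n // target_len] for i in range(target_len)]


def linear_distance(t, r):
    """Sum of frame-wise distances after stretching the shorter sequence."""
    if len(t) < len(r):
        t = linear_stretch(t, len(r))
    elif len(r) < len(t):
        r = linear_stretch(r, len(t))
    return sum(frame_distance(a, b) for a, b in zip(t, r))


t = [[0.0], [2.0]]
r = [[0.0], [1.0], [2.0]]
print(linear_distance(t, r))
```

Because the stretch is uniform, a recording that was altered only locally ends up misaligned over the rest of its length, which is the weakness the text points out.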

Therefore, comparing the similarity of the speech T and R requires combining time warping with distance measurement: a function $i_m = \Phi(i_n)$ is sought that nonlinearly maps the time axis n of T onto the time axis m of R, such that the distance D[T, R] between T and R satisfies:

$$D[T, R] = \min_{\Phi} \sum_{i_n=1}^{N} d\big(T(i_n), R(\Phi(i_n))\big)$$

where:

$$\Phi(i_n + 1) \ge \Phi(i_n)$$

$$\Phi(i_n + 1) - \Phi(i_n) \le 1$$

This problem satisfies the conditions for dynamic programming and can be solved with a dynamic programming algorithm, whose recurrence is:

$$D(i_n, i_m) = d\big(T(i_n), R(i_m)\big) + \min\{D(i_n - 1, i_m),\ D(i_n - 1, i_m - 1),\ D(i_n - 1, i_m - 2)\}$$

The search therefore starts from the point (1, 1), with D(1, 1) = 0, and recurses repeatedly until (N, M) is reached, which yields the optimal path; D(N, M) is the matching distance corresponding to the best matching path.

Since each person's speech is influenced by many factors, no two utterances of the same character by the same person can be completely identical in waveform; some difference must exist. That natural difference defines the predetermined threshold for the judgment. If D(N, M) = 0, the two voices T and R are exactly the same, which shows that T and R are one and the same sound, and a recording attack may be present. If D(N, M) < threshold, T and R are so similar that a recording attack may be present. If D(N, M) >= threshold, T and R are not the same sound and there is no recording attack.
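The dynamic programming search and the threshold decision can be sketched as follows. The code follows the recurrence as written in the description, with predecessors $(i_n - 1, i_m)$, $(i_n - 1, i_m - 1)$, and $(i_n - 1, i_m - 2)$ and with D(1, 1) = 0; many dynamic time warping formulations instead start from the local distance of the first frame pair. The function names and the threshold are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(t, r):
    """Accumulated distance D(N, M) between frame sequences t and r.

    Returns math.inf when (N, M) cannot be reached, which happens
    when r is more than about twice as long as t.
    """
    n, m = len(t), len(r)
    inf = math.inf
    # D is 1-indexed; row 0 and column 0 are padding that stays infinite.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[1][1] = 0.0  # starting value as given in the description
    for i in range(2, n + 1):
        for j in range(1, m + 1):
            best = min(
                D[i - 1][j],
                D[i - 1][j - 1],
                D[i - 1][j - 2] if j >= 2 else inf,
            )
            if best < inf:
                D[i][j] = frame_distance(t[i - 1], r[j - 1]) + best
    return D[n][m]


def is_replay_suspected(t, r, threshold):
    """A distance below the threshold means the voices are suspiciously alike."""
    return dtw_distance(t, r) < threshold


original = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]  # same sound, locally lengthened
print(is_replay_suspected(original, stretched, threshold=0.5))
```

Unlike the linear stretch, the warping path absorbs a local lengthening, so a replayed recording that was edited in one place still yields a small distance.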

The voiceprint authentication method capable of preventing recording attacks proposed by the present application can effectively prevent recording attacks by verifying whether the characters and their pronunciation in the user voice are consistent with the character combination and the pronunciation rules generated by the server. Even if user voice obtained by the attacker through other channels has the right content, it cannot satisfy the required pronunciation manner. Further, to prevent a recording attack that reuses voice previously input by the user, after it is determined that the characters and the pronunciation manner in the user voice are consistent with the character combination and the pronunciation rules generated by the server, it is also determined whether the current voice to be verified is consistent with the user's voice in the historical voice library. If they are consistent, a recording attack is present. This application can thus effectively prevent recording attacks in voiceprint authentication.

FIG. 5 is a flowchart of a voiceprint authentication method capable of preventing a recording attack according to an embodiment of the present application. The method is described from the requesting terminal side. Specifically, the voiceprint authentication method includes:

Step 501: Send a user's voiceprint authentication request to the server;

Step 502: Receive and display a character combination sent by the server and a pronunciation rule of the character.

Step 503: Receive a user voice input by the user according to the character combination and the pronunciation rule of the character;

Step 504: Send the user voice to the server.

Step 505: Receive a voiceprint authentication result sent by the server.
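Steps 501 to 505 can be illustrated from the terminal side with a small sketch. The `StubServer` class, its method names, the dictionary layout of the challenge, and the byte-string "voice" are all hypothetical placeholders; a real terminal would talk to the server over a network and record audio from a microphone.

```python
class StubServer:
    """Hypothetical in-process stand-in for the authentication server."""

    def request_authentication(self, user_id):
        return {"characters": "4071", "rules": ["high", "low", "long", "short"]}

    def submit_voice(self, user_id, voice):
        # Placeholder for the real voiceprint authentication on the server.
        return voice == b"4071"


def run_terminal(server, user_id, record_voice, display=print):
    challenge = server.request_authentication(user_id)    # step 501
    display(challenge["characters"], challenge["rules"])  # step 502
    voice = record_voice(challenge)                       # step 503
    return server.submit_voice(user_id, voice)            # steps 504 and 505


result = run_terminal(
    StubServer(),
    "alice",
    record_voice=lambda challenge: challenge["characters"].encode(),
)
print("authenticated:", result)
```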

FIG. 6 shows a voiceprint authentication server capable of preventing a recording attack according to an embodiment of the present application. The server 600 includes a generating unit 601, configured to generate a character combination and a pronunciation rule of the characters according to a request of a user;

The sending unit 602 is configured to send the character combination and the pronunciation rule of the character to the requesting terminal, and send the voiceprint authentication result to the requesting terminal;

The receiving unit 603 is configured to receive a user voice input by the requesting terminal according to the character combination and a pronunciation rule of the character;

The sound detecting unit 604 is configured to perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the character.

As shown in FIG. 7, FIG. 7 is a voiceprint authentication terminal capable of preventing a recording attack according to an embodiment of the present application. Specifically, the authentication terminal 700 includes: a requesting unit 701, configured to send a voiceprint authentication request of a user to a server;

The receiving unit 702 is configured to receive and display a character combination sent by the server and a pronunciation rule of the character, and receive a voiceprint authentication result sent by the server;

The input unit 703 is configured to receive a user voice input by the user according to the character combination and the pronunciation rule of the character;

The sending unit 704 is configured to send the user voice to the server.

As shown in FIG. 8, FIG. 8 is a voiceprint authentication system capable of preventing a recording attack according to an embodiment of the present application.

The voiceprint authentication system includes a server 600 and a requesting terminal 700, wherein the server 600 is configured to generate a character combination and a pronunciation rule of the characters according to a user's voiceprint authentication request; send the character combination and the pronunciation rule of the characters to the requesting terminal; receive the user voice input by the requesting terminal according to the character combination and the pronunciation rule of the characters; perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the characters; and send the voiceprint authentication result to the requesting terminal;

The requesting terminal 700 is configured to send a user's voiceprint authentication request to the server; receive and display the character combination sent by the server and the pronunciation rule of the character; and receive the user voice input by the user according to the character combination and the pronunciation rule of the character. Transmitting the user voice to the server; receiving a voiceprint authentication result sent by the server.

In order to explain the technical solution of the present application more clearly, the following describes a specific embodiment. As shown in FIG. 9, the system workflow for preventing a recording attack is:

Step 901: The client sends an identity authentication request to the server.

Step 902: The server receives an identity authentication request.

Step 903: The server randomly generates a character combination to be verified and the pronunciation manner of the characters according to the identity authentication request, and sends them to the client.

Step 904: After receiving the character combination to be verified and the pronunciation rule of the character sent by the server, the client prompts the user to read the characters aloud as required.

Step 905: The client receives the user voice read by the user and sends it to the server.

Step 906: The server performs voiceprint verification to determine whether the received user voice and the pre-stored voice of the user belong to the same person. A conventional voiceprint verification algorithm may be used in a specific implementation.

If the voiceprint verification determines that the voices do not belong to the same person, a user authentication failure is returned directly to the client.

If the voiceprint verification determines that the voices belong to the same person, recording detection continues.

Step 907: Verify whether the characters in the user voice are the same as the characters in the character combination generated by the server. If they are different, the character verification of the user voice fails, and a user authentication failure is returned to the client. If they are the same, the character verification of the user voice passes, and the process proceeds to step 908.

Step 908: Verify whether the pronunciation manner of the characters in the user voice matches the pronunciation rule of the character generated by the server. If it does not match, the pronunciation verification of the user voice fails, and a user authentication failure is returned to the client. If it matches, the pronunciation verification of the user voice passes, and the process proceeds to step 909.

Step 909: Verify whether the user voice exists in the historical voice library. If it does, a recording attack is indicated, the authentication fails, and the authentication failure result is sent to the client. If it does not, the voiceprint authentication passes, the user voice is stored in the historical voice library, and the authentication pass result is sent to the client.
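The decision order of steps 906 to 909 can be sketched as a short cascade. The sketch assumes that speaker verification, character recognition, pronunciation analysis, and the historical-library lookup have already been performed elsewhere and are supplied as plain values; the `Observation` type, the `authenticate` function, and the reason strings are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Observation:
    """Results of upstream analysis of the received user voice (hypothetical inputs)."""
    same_speaker: bool      # step 906: outcome of conventional voiceprint verification
    characters: str         # characters recognised in the user voice
    rules: Sequence[str]    # pronunciation manner recognised for each character
    seen_in_history: bool   # step 909: voice matches an entry in the historical voice library


def authenticate(obs: Observation,
                 expected_characters: str,
                 expected_rules: Sequence[str]) -> Tuple[bool, str]:
    """Apply the checks in the order of steps 906 to 909 and stop at the first failure."""
    if not obs.same_speaker:                        # step 906
        return False, "speaker mismatch"
    if obs.characters != expected_characters:       # step 907
        return False, "character mismatch"
    if list(obs.rules) != list(expected_rules):     # step 908
        return False, "pronunciation mismatch"
    if obs.seen_in_history:                         # step 909
        return False, "recording attack suspected"
    return True, "passed"
```

In this sketch a failure at any step is returned immediately, mirroring the workflow in which the later checks run only after the earlier ones pass.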

The process of verifying whether the user voice exists in the historical voice library has been described in detail in the above embodiment and is not repeated here. After the voiceprint authentication passes, the client continues with the corresponding operation, which is not limited by this application.
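As a rough illustration of the historical voice library check, the claims describe comparing feature parameters by Euclidean distance against a predetermined threshold. The sketch below assumes that feature extraction has already produced fixed-length numeric vectors and that the library is a simple in-memory list; the function names and the threshold value used in the usage note are hypothetical.

```python
import math
from typing import List, Sequence


def is_replay(features: Sequence[float],
              history: List[List[float]],
              threshold: float) -> bool:
    """Return True if the new voice is within `threshold` of any stored voice."""
    # A distance below the threshold means the voice is "consistent" with a
    # historical entry, which is treated as evidence of a replayed recording.
    return any(math.dist(features, past) < threshold for past in history)


def check_and_store(features: Sequence[float],
                    history: List[List[float]],
                    threshold: float) -> bool:
    """Return True if this check passes; store the features only when it passes."""
    if is_replay(features, history, threshold):
        return False
    history.append(list(features))
    return True
```

For example, with a library containing `[1.0, 2.0, 3.0]` and a threshold of `0.1`, the vector `[1.0, 2.0, 3.05]` would be rejected as a likely replay, while `[4.0, 6.0, 3.0]` would pass and be added to the library. The intuition is that a live utterance is never an exact acoustic copy of an earlier one, whereas a replayed recording is.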

In an embodiment of the present application, an electronic device is further provided. The electronic device includes: a processor; and a memory including computer readable instructions that, when executed, cause the processor to perform the following operations:

Performing voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the character; and transmitting the voiceprint authentication result to the requesting terminal.

In another embodiment of the present application, there is also provided an electronic device, comprising: a processor; and a memory including computer readable instructions that, when executed, cause the processor to perform the following operations:

Sending a user's voiceprint authentication request to the server;

Transmitting the user voice to the server;

Receiving a voiceprint authentication result sent by the server.

Those skilled in the art will appreciate that embodiments of the present application can be provided as a method, a system, or a computer program product. Thus, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device, the instruction device implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The above description is only for explaining the technical solutions of the present application, and those skilled in the art can modify and change the above embodiments without departing from the spirit and scope of the present application. Therefore, the scope of protection of the application should be determined by the scope of the claims.

Claims

A voiceprint authentication method capable of preventing a recording attack, comprising:

Generating a character combination and a pronunciation rule of a character according to a user's voiceprint authentication request;

Transmitting the character combination and the pronunciation rule of the character to the requesting terminal;

Receiving, by the requesting terminal, a user voice input according to the character combination and a pronunciation rule of the character;

Performing voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the character; and transmitting the voiceprint authentication result to the requesting terminal.
The voiceprint authentication method capable of preventing a recording attack according to claim 1, wherein performing voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the character further includes:

Determining whether the user voice and the voice historically input by the user belong to the same person;

Determining whether a character in the user voice is the same as a character in the character combination;

Determining whether a pronunciation manner of a character in the user voice matches a pronunciation rule of the character;

The voiceprint authentication is passed only when the user voice and the voice historically input by the user belong to the same person, the characters in the user voice are the same as the characters in the character combination, and the pronunciation manner of the characters in the user voice matches the pronunciation rule of the character; in all other cases, the voiceprint authentication does not pass.
The voiceprint authentication method capable of preventing a recording attack according to claim 2, wherein, when it is determined that the user voice and the voice historically input by the user belong to the same person, that the characters in the user voice are the same as the characters in the character combination, and that the pronunciation manner of the characters in the user voice matches the pronunciation rule of the character, the method further includes:

Storing the user voice in a historical voice library.
The voiceprint authentication method capable of preventing a recording attack according to claim 2, wherein, when it is determined that the user voice and the voice historically input by the user belong to the same person, that the characters in the user voice are the same as the characters in the character combination, and that the pronunciation manner of the characters in the user voice matches the pronunciation rule of the character, the method further includes:

Determining whether the user voice is consistent with the voice of the user in the historical voice library;

If the user voice is consistent with the voice of the user in the historical voice library, the voiceprint authentication does not pass;

If the user voice is inconsistent with the voice of the user in the historical voice library, the voiceprint authentication is passed, and the user voice is stored in the historical voice library.
The voiceprint authentication method capable of preventing a recording attack according to claim 4, wherein determining whether the user voice is consistent with the voice of the user in the historical voice library further includes:

Extracting feature parameters of the user voice;

Calculating a Euclidean distance between the feature parameters of the user voice and the feature parameters of the voice of the user in the historical voice library, wherein when the Euclidean distance is less than a predetermined threshold, the user voice is consistent with the voice of the user in the historical voice library, and when the Euclidean distance is greater than the predetermined threshold, the user voice is inconsistent with the voice of the user in the historical voice library.
The voiceprint authentication method capable of preventing a recording attack according to claim 5, wherein extracting the feature parameters of the user voice further comprises:

Pre-processing the user voice, and dividing the user voice into multiple speech segments according to characters;

Finding the start and end points of the active speech portion of each speech segment;

Extracting the feature parameters of the active speech portion.
The voiceprint authentication method capable of preventing a recording attack according to claim 1, wherein the character combination and the pronunciation rule of the character are randomly generated.
A voiceprint authentication method capable of preventing a recording attack, comprising:

Sending a user's voiceprint authentication request to the server;

Receiving and displaying a combination of characters sent by the server and a pronunciation rule of the character;

Receiving a user voice input by the user according to the character combination and the pronunciation rule of the character;

Transmitting the user voice to the server;

Receiving a voiceprint authentication result sent by the server.
A voiceprint authentication server capable of preventing a recording attack, comprising:

a generating unit, configured to generate a character combination and a pronunciation rule of the character according to a request of the user;

a sending unit, configured to send the character combination and the pronunciation rule of the character to the requesting terminal, and send the voiceprint authentication result to the requesting terminal;

a receiving unit, configured to receive a user voice input by the requesting terminal according to the character combination and a pronunciation rule of a character;

a sound detecting unit, configured to perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the character.
A voiceprint authentication terminal capable of preventing a recording attack, comprising:

a requesting unit, configured to send a user's voiceprint authentication request to the server;

a receiving unit, configured to receive and display a character combination sent by the server and a pronunciation rule of the character, and receive a voiceprint authentication result sent by the server;

an input unit, configured to receive a user voice input by a user according to the character combination and a pronunciation rule of the character;

and a sending unit, configured to send the user voice to the server.
A voiceprint authentication system capable of preventing a recording attack, comprising a server and a requesting terminal, wherein the server is configured to: generate a character combination and a pronunciation rule of the character according to a user's voiceprint authentication request; send the character combination and the pronunciation rule of the character to the requesting terminal; receive a user voice input at the requesting terminal according to the character combination and the pronunciation rule of the character; perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the character; and send the voiceprint authentication result to the requesting terminal;

The requesting terminal is configured to: send a user's voiceprint authentication request to the server; receive and display the character combination and the pronunciation rule of the character sent by the server; receive the user voice input by the user according to the character combination and the pronunciation rule of the character; transmit the user voice to the server; and receive the voiceprint authentication result sent by the server.