CN113949559A

CN113949559A - Voiceprint recognition attack defense method, device and system

Info

Publication number: CN113949559A
Application number: CN202111200817.XA
Authority: CN
Inventors: 谢晓昕; 曾炜; 丁育祯; 王洁如
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-10-14
Filing date: 2021-10-14
Publication date: 2022-01-18

Abstract

A voiceprint recognition attack defense method, device and system, which can be used in biometric recognition, finance and other fields. The method comprises the following steps: determining, according to a verification request, gesture registration information and voiceprint registration information corresponding to the verification request; generating a plurality of dynamic numbers corresponding to the gesture registration information according to the gesture registration information, and sending the gesture registration information and the dynamic numbers to a display terminal; performing voice recognition on user voice information sent by the display terminal, and comparing the obtained voice recognition result with the dynamic numbers to obtain a verification result; and if the verification is passed, extracting voiceprint features from the user voice information, and comparing the obtained voiceprint feature information with the voiceprint registration information to obtain a voiceprint recognition result. The invention combines gesture verification with voiceprint recognition and uses temporarily generated dynamic numbers for voice information acquisition, so that it is difficult for an attacker to guess the voice content and perform voice splicing, which improves the security of voiceprint recognition.

Description

Voiceprint recognition attack defense method, device and system

Technical Field

The invention relates to the technical field of voiceprint recognition, in particular to a voiceprint recognition attack defense method, device and system.

Background

At present, voiceprint recognition technology has weak liveness detection capability, has difficulty defending against attacks such as voice replay and voice splicing, and therefore offers low security. Voiceprint recognition generally needs to be combined with other means to detect whether the user is a real person. A common means is to have the user read an 8-digit random number aloud, but this method is easily bypassed by a voice splicing attack, so its security is not high.

Disclosure of Invention

In view of the problems in the prior art, embodiments of the present invention mainly aim to provide a method, an apparatus, and a system for defending against voiceprint recognition attacks, which can improve the security of voiceprint recognition without reducing its usability.

In order to achieve the above object, an embodiment of the present invention provides a method for defending against voiceprint recognition attacks, where the method includes:

according to a verification request sent by a display terminal, determining gesture registration information and voiceprint registration information corresponding to the verification request;

generating a plurality of dynamic numbers corresponding to the gesture registration information according to the gesture registration information, and sending the gesture registration information and the dynamic numbers to the display terminal;

carrying out voice recognition on user voice information sent by the display terminal to obtain a voice recognition result, and comparing the voice recognition result with the dynamic number to obtain a verification result;

and if the verification result is that the verification is passed, extracting voiceprint characteristics of the user voice information to obtain voiceprint characteristic information, and comparing the voiceprint characteristic information with the voiceprint registration information to obtain a voiceprint recognition result.

Optionally, in an embodiment of the present invention, the method further includes: sending the voiceprint recognition result to the display terminal.

Optionally, in an embodiment of the present invention, the method further includes:

receiving gesture registration information, voice registration information and a registration verification number corresponding to the gesture registration information, which are sent by a display terminal;

performing voice recognition on the voice registration information to obtain a voice registration result, and comparing a registration verification number corresponding to the gesture registration information with the voice registration result to obtain a comparison result;

and if the comparison result is that the comparison is passed, extracting voiceprint features of the voice registration information to obtain voiceprint registration information, and storing the gesture registration information and the voiceprint registration information.

The embodiment of the invention also provides a voiceprint recognition attack defense method, which comprises the following steps:

generating a verification request according to received user input information, and sending the verification request to a server;

receiving gesture registration information and dynamic numbers sent by the server, and determining a graph node corresponding to the gesture registration information in a preset display image according to the gesture registration information;

sequentially filling the dynamic numbers into graph nodes corresponding to the gesture registration information, generating a plurality of random numbers, and filling the random numbers into unfilled graph nodes;

and displaying the preset display image filled with the graph nodes, receiving user voice information, and sending the user voice information to the server.

Optionally, in an embodiment of the present invention, the method further includes: receiving a voiceprint recognition result sent by the server, and displaying the voiceprint recognition result.

Optionally, in an embodiment of the present invention, the method further includes:

displaying a preset display image and receiving gesture registration information input by a user;

randomly generating a plurality of registration verification numbers, filling the registration verification numbers into the graph nodes of the preset display image, and determining the registration verification numbers corresponding to the gesture registration information;

displaying the preset display image filled with the registration verification digits, receiving voice registration information input by a user, and sending the registration verification digits, the gesture registration information and the voice registration information corresponding to the gesture registration information to a server.

The embodiment of the invention also provides a voiceprint recognition attack defense device, which comprises:

the registration information determining module is used for determining gesture registration information and voiceprint registration information corresponding to a verification request according to the verification request sent by the display terminal;

the dynamic number generation module is used for generating a plurality of dynamic numbers corresponding to the gesture registration information according to the gesture registration information and sending the gesture registration information and the dynamic numbers to the display terminal;

the verification result module is used for carrying out voice recognition on the user voice information sent by the display terminal to obtain a voice recognition result, and comparing the voice recognition result with the dynamic number to obtain a verification result;

and the voiceprint recognition result module is used for extracting voiceprint features of the user voice information to obtain voiceprint feature information if the verification result is that the verification is passed, and comparing the voiceprint feature information with the voiceprint registration information to obtain a voiceprint recognition result.

Optionally, in an embodiment of the present invention, the apparatus further includes: a result sending module, used for sending the voiceprint recognition result to the display terminal.

Optionally, in an embodiment of the present invention, the apparatus further includes:

the registration information receiving module is used for receiving gesture registration information, voice registration information and registration verification numbers corresponding to the gesture registration information, which are sent by the display terminal;

the comparison result module is used for carrying out voice recognition on the voice registration information to obtain a voice registration result, and comparing a registration verification number corresponding to the gesture registration information with the voice registration result to obtain a comparison result;

and the registration information storage module is used for extracting voiceprint features of the voice registration information to obtain voiceprint registration information and storing the gesture registration information and the voiceprint registration information if the comparison result is that the comparison is passed.

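The embodiment of the invention also provides a voiceprint recognition attack defense device, which comprises:

the verification request module is used for generating a verification request according to the received user input information and sending the verification request to the server;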

the graph node determining module is used for receiving the gesture registration information and the dynamic number sent by the server and determining a graph node corresponding to the gesture registration information in a preset display image according to the gesture registration information;

the graph node filling module is used for sequentially filling the dynamic numbers into the graph nodes corresponding to the gesture registration information, generating a plurality of random numbers and filling the random numbers into the unfilled graph nodes;

and the voice information receiving module is used for displaying the preset display image filled with the graph nodes, receiving the voice information of the user and sending the voice information of the user to the server.

Optionally, in an embodiment of the present invention, the apparatus further includes: a result display module, used for receiving the voiceprint recognition result sent by the server and displaying the voiceprint recognition result.

Optionally, in an embodiment of the present invention, the apparatus further includes:

the preset display image module is used for displaying a preset display image and receiving gesture registration information input by a user;

the registration verification number module is used for randomly generating a plurality of registration verification numbers, filling the registration verification numbers into the graph nodes of the preset display image and determining the registration verification numbers corresponding to the gesture registration information;

and the registration information sending module is used for displaying the preset display image filled with the registration verification digits, receiving voice registration information input by a user, and sending the registration verification digits, the gesture registration information and the voice registration information corresponding to the gesture registration information to the server.

The embodiment of the invention also provides a voiceprint recognition attack defense system, which comprises a server and a display terminal, wherein the server is in communication connection with the display terminal;

the display terminal generates a verification request according to the received user input information and sends the verification request to the server side;

the server side determines gesture registration information and voiceprint registration information corresponding to the verification request according to the verification request; generating a plurality of dynamic numbers corresponding to the gesture registration information according to the gesture registration information, and sending the gesture registration information and the dynamic numbers to the display terminal;

the display terminal determines a graph node corresponding to the gesture registration information in a preset display image according to the gesture registration information; sequentially filling the dynamic numbers into graph nodes corresponding to the gesture registration information, generating a plurality of random numbers, and filling the random numbers into unfilled graph nodes; displaying a preset display image filled with the graph nodes, receiving user voice information, and sending the user voice information to the server;

the server performs voice recognition on the user voice information to obtain a voice recognition result, and compares the voice recognition result with the dynamic number to obtain a verification result; and if the verification result is that the verification is passed, extracting voiceprint characteristics of the user voice information to obtain voiceprint characteristic information, and comparing the voiceprint characteristic information with the voiceprint registration information to obtain a voiceprint recognition result.

Optionally, in an embodiment of the present invention, the server is further configured to send the voiceprint recognition result to the display terminal.

Optionally, in an embodiment of the present invention, the display terminal is further configured to receive a voiceprint recognition result sent by the server, and display the voiceprint recognition result.

Optionally, in an embodiment of the present invention, the server is further configured to receive gesture registration information, voice registration information, and a registration verification number corresponding to the gesture registration information, which are sent by the display terminal; performing voice recognition on the voice registration information to obtain a voice registration result, and comparing a registration verification number corresponding to the gesture registration information with the voice registration result to obtain a comparison result; and if the comparison result is that the comparison is passed, extracting voiceprint features of the voice registration information to obtain voiceprint registration information, and storing the gesture registration information and the voiceprint registration information.

Optionally, in an embodiment of the present invention, the display terminal is further configured to display a preset display image, and receive gesture registration information input by a user; randomly generating a plurality of registration verification numbers, filling the registration verification numbers into the graph nodes of the preset display image, and determining the registration verification numbers corresponding to the gesture registration information; displaying the preset display image filled with the registration verification digits, receiving voice registration information input by a user, and sending the registration verification digits, the gesture registration information and the voice registration information corresponding to the gesture registration information to a server.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the program.

The present invention also provides a computer-readable storage medium storing a computer program for executing the above method.

According to the invention, gesture verification and voiceprint recognition are combined, and dynamic numbers temporarily generated from the graphical gesture registered by the user are used to collect voice information, so that it is difficult for an attacker to guess the voice content and perform voice splicing, which improves the security of voiceprint recognition.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a flowchart of a voiceprint recognition attack defense method according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating registration for voiceprint recognition attack defense in an embodiment of the present invention;

FIG. 3 is a flowchart of a voiceprint recognition attack defense method in another embodiment of the present invention;

FIG. 4 is a flowchart of registration for voiceprint recognition attack defense in another embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a system for defending against voiceprint recognition attacks according to an embodiment of the present invention;

FIGS. 6A-6C are schematic diagrams of a registration graph gesture in an embodiment of the invention;

FIG. 7 is a schematic diagram of a verification graph gesture in an embodiment of the invention;

FIG. 8 is a flowchart of the overall operation of the voiceprint recognition attack defense system in an embodiment of the present invention;

FIG. 9 is a flowchart of a registration process of the voiceprint recognition attack defense system in an embodiment of the present invention;

FIG. 10 is a flowchart of the verification process of the voiceprint recognition attack defense system in an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a voiceprint recognition attack defense apparatus according to an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of a voiceprint recognition attack defense apparatus according to an embodiment of the present invention;

FIG. 13 is a schematic diagram of a voiceprint recognition attack defense apparatus according to another embodiment of the present invention;

FIG. 14 is a schematic structural diagram of another apparatus for defending against voiceprint recognition attacks according to an embodiment of the present invention;

FIG. 15 is a schematic diagram of a specific structure of a voiceprint recognition attack defense device according to an embodiment of the present invention;

FIG. 16 is a schematic diagram illustrating another specific structure of a voiceprint recognition attack defense apparatus according to an embodiment of the present invention;

FIG. 17 is a schematic structural diagram of an electronic device according to an embodiment of the invention.

Detailed Description

The embodiment of the invention provides a method, a device and a system for defending against voiceprint recognition attacks, which can be used in the fields of biological recognition, finance and other fields.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

FIG. 1 is a flowchart of a voiceprint recognition attack defense method according to an embodiment of the present invention. The method may be executed by, but is not limited to, a computer at the server. The method shown in the figure comprises the following steps:

step S10, according to the verification request sent by the display terminal, determining the gesture registration information and the voiceprint registration information corresponding to the verification request.

The server receives a verification request sent by the display terminal. The verification request can be generated from a transaction request or a login request made by a user on the display terminal. Specifically, the verification request may include information such as a user ID, and the gesture registration information and voiceprint registration information uniquely associated with the verification request can be determined according to the information such as the user ID in the verification request.

Further, the gesture registration information and the voiceprint registration information are generated when the user registers. Specifically, the gesture registration information includes the graphical gesture input by the user on the display terminal during registration, and the voiceprint registration information is the voiceprint feature that the server extracts from the voice input by the user on the display terminal during registration. In addition, the gesture registration information and the voiceprint registration information may further include information such as a user ID so that they can be matched to the verification request. Specifically, after the user registers successfully, the gesture registration information and the voiceprint registration information are stored in a storage space of the server, such as a database.
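The lookup described for step S10 can be sketched as follows. This is only a minimal illustration, assuming an in-memory store keyed by user ID. The names `Registration`, `RegistrationStore` and `handle_verification_request`, the request field `user_id`, and the representation of a gesture as an ordered tuple of graph-node indices are hypothetical and are not specified by the embodiment.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Registration:
    """Registration record kept by the server (hypothetical layout)."""
    user_id: str
    gesture_nodes: Tuple[int, ...]  # ordered graph-node indices of the gesture
    voiceprint: Tuple[float, ...]   # voiceprint features extracted at registration


class RegistrationStore:
    """In-memory stand-in for the server's database (assumption)."""

    def __init__(self) -> None:
        self._records: Dict[str, Registration] = {}

    def save(self, record: Registration) -> None:
        self._records[record.user_id] = record

    def find(self, user_id: str) -> Optional[Registration]:
        return self._records.get(user_id)


def handle_verification_request(store: RegistrationStore, request: dict) -> Registration:
    """Step S10: resolve the registration uniquely associated with the request."""
    record = store.find(request["user_id"])
    if record is None:
        raise LookupError("no registration for user " + repr(request["user_id"]))
    return record
```

A real deployment would presumably replace the dictionary with the database mentioned above; the sketch only shows that the user ID is the key that ties a verification request to one gesture and one voiceprint.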

Step S11, generating a plurality of dynamic numbers corresponding to the gesture registration information according to the gesture registration information, and sending the gesture registration information and the dynamic numbers to the display terminal.

The server generates the dynamic numbers according to the graphical gesture recorded in the gesture registration information. Specifically, as shown in FIG. 6A, in a preset display image having 3 × 3 graph nodes (circles), if the graphical gesture registered by the user is "U"-shaped, the gesture registration information corresponds to 7 graph nodes. The server then randomly generates 7 dynamic numbers, each of which may have several digits.

Further, the server sends the gesture registration information and the dynamic numbers corresponding to the verification request to the display terminal, and the display terminal fills them into the preset display image and displays it. Specifically, as shown in FIG. 7, every graph node in the filled preset display image holds a 2-digit number. Therefore, the dynamic numbers in FIG. 7 that correspond to the "U"-shaped graphical gesture of FIG. 6A, i.e., the dynamic numbers generated by the server, read in sequence as 33235688312593.
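As an illustration of step S11, the following sketch generates one fixed-width random number per gesture node and shows how a concatenated sequence maps back to per-node numbers. The function names and the use of Python's `secrets` module as the random source are assumptions; the embodiment only states that the numbers are generated randomly.

```python
import secrets
from typing import List


def generate_dynamic_numbers(node_count: int, digits_per_node: int = 2) -> List[str]:
    """Return one random fixed-width digit string per gesture node."""
    if node_count <= 0 or digits_per_node <= 0:
        raise ValueError("node_count and digits_per_node must be positive")
    return [
        "".join(secrets.choice("0123456789") for _ in range(digits_per_node))
        for _ in range(node_count)
    ]


def split_digits(sequence: str, digits_per_node: int = 2) -> List[str]:
    """Split a concatenated digit sequence back into per-node numbers."""
    if len(sequence) % digits_per_node != 0:
        raise ValueError("sequence length is not a multiple of digits_per_node")
    return [
        sequence[i:i + digits_per_node]
        for i in range(0, len(sequence), digits_per_node)
    ]
```

For the "U"-shaped gesture, `generate_dynamic_numbers(7)` yields seven 2-digit strings, and `split_digits("33235688312593")` recovers the seven per-node numbers 33, 23, 56, 88, 31, 25 and 93 of the FIG. 7 example.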

Step S12, carrying out voice recognition on the user voice information sent by the display terminal to obtain a voice recognition result, and comparing the voice recognition result with the dynamic number to obtain a verification result.

After the display terminal fills and displays the preset display image, it collects the voice uttered by the user and sends the resulting user voice information to the server. The server performs voice recognition on the user voice information to obtain a voice recognition result. Specifically, the user voice information contains the dynamic numbers corresponding to the gesture registration information as read aloud by the user, and the voice recognition result obtained by the server contains those dynamic numbers converted from speech to text. For example, as shown in FIG. 7, if the user's registered graphical gesture is "U"-shaped and the user reads the numbers correctly, the voice recognition result should be 33235688312593.

Further, the server compares the obtained voice recognition result with the generated dynamic numbers to generate a verification result. If the voice recognition result is consistent with the dynamic numbers in both order and content, the verification result is that the verification is passed. If it is not consistent, the verification fails, and voiceprint recognition is judged to have failed.
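The comparison in step S12 can be sketched as below. This is a minimal sketch, assuming the voice recognition engine returns the spoken digits as numerals, possibly separated by spaces or punctuation; the function name `verify_spoken_digits` and that transcript format are hypothetical.

```python
import re
from typing import Sequence


def verify_spoken_digits(transcript: str, dynamic_numbers: Sequence[str]) -> bool:
    """Step S12: pass only if the spoken digits match in content and order."""
    spoken = re.sub(r"[^0-9]", "", transcript)  # drop spaces and other separators
    expected = "".join(dynamic_numbers)
    return spoken == expected
```

Because the comparison is made on the full concatenated sequence, a transcript with the right numbers in the wrong order, or with a number missing, is rejected.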

Step S13, if the verification result is that the verification is passed, extracting voiceprint characteristics of the user voice information to obtain voiceprint characteristic information, and comparing the voiceprint characteristic information with the voiceprint registration information to obtain a voiceprint recognition result.

If the verification result is that the verification is passed, the server extracts voiceprint features from the user voice information to obtain the user's voiceprint feature information; specifically, an existing conventional voiceprint extraction technique can be adopted. The voiceprint feature information is then compared with the voiceprint registration information to generate a voiceprint recognition result. Specifically, if the voiceprint feature information is consistent with the voiceprint registration information, or their similarity is higher than a preset threshold, the voiceprint recognition result is that recognition is passed; otherwise, the voiceprint recognition result is that recognition has failed.
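The threshold comparison in step S13 can be illustrated as follows. The embodiment does not name a similarity measure or a threshold value, so cosine similarity over fixed-length feature vectors and the default threshold of 0.8 are assumptions chosen purely for illustration, as are the function names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def voiceprint_matches(
    features: Sequence[float],
    enrolled: Sequence[float],
    threshold: float = 0.8,  # hypothetical preset threshold
) -> bool:
    """Step S13: pass when similarity is higher than the preset threshold."""
    return cosine_similarity(features, enrolled) > threshold
```

In practice the threshold would be tuned against false-accept and false-reject rates for the chosen voiceprint extraction technique.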

As an embodiment of the invention, the method further comprises: sending the voiceprint recognition result to the display terminal.

The server sends the generated voiceprint recognition result to the display terminal, and the display terminal displays it to the user. Specifically, when the verification result is a verification failure or the voiceprint recognition result is a recognition failure, the result returned to the display terminal for display is that voiceprint recognition has failed. Conversely, if voiceprint recognition is passed, the result returned to the display terminal for display is that voiceprint recognition has passed.
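How the two stages combine into the single result shown to the user can be sketched as follows. The result strings and the function name `final_result` are hypothetical; the point of the sketch is only that voiceprint comparison is attempted after the digit verification has passed, and that a failure at either stage is reported as a voiceprint recognition failure.

```python
from typing import Callable

PASSED = "voiceprint recognition passed"
FAILED = "voiceprint recognition failed"


def final_result(digits_verified: bool, compare_voiceprint: Callable[[], bool]) -> str:
    """Combine the digit verification and voiceprint comparison outcomes.

    compare_voiceprint is only called when digit verification has passed,
    mirroring the ordering of steps S12 and S13.
    """
    if not digits_verified:
        return FAILED
    return PASSED if compare_voiceprint() else FAILED
```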

As an embodiment of the present invention, as shown in fig. 2, the method further includes:

step S14, receiving gesture registration information, voice registration information and registration verification numbers corresponding to the gesture registration information sent by the display terminal;

step S15, carrying out voice recognition on the voice registration information to obtain a voice registration result, and comparing the registration verification number corresponding to the gesture registration information with the voice registration result to obtain a comparison result;

and step S16, if the comparison result is that the comparison is passed, extracting voiceprint characteristics of the voice registration information to obtain voiceprint registration information, and storing the gesture registration information and the voiceprint registration information.

Voiceprint registration is required before the user carries out a process that needs voiceprint recognition and verification, such as a transaction or login. The display terminal displays a preset display image, which may be an image composed of 3 × 3 circles, and the user inputs a graphical gesture, for example a "U" shape, through the display terminal, as shown in FIG. 6A. After the user inputs the graphical gesture, the display terminal randomly generates a plurality of random numbers, each of several digits, as the registration verification numbers and fills them into the graph nodes of the preset display image, as shown in FIG. 7. The user reads the numbers in the graph nodes aloud in sequence, following the graphical gesture input earlier, and the display terminal collects the user's voice and sends it to the server as the user's voice registration information. At the same time, the graphical gesture is sent to the server as the gesture registration information, together with the registration verification numbers in the graph nodes corresponding to the gesture registration information. In addition, the gesture registration information may further include information such as a user ID.
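The display-terminal side of registration can be sketched as follows: every graph node receives a random number, and the registration verification numbers are the values met along the gesture. The row-major node indexing (0 to 8 for a 3 × 3 image), the function names, and the example traversal order of the "U" shape are assumptions made for illustration.

```python
import secrets
from typing import List, Sequence


def random_grid(node_count: int = 9, digits_per_node: int = 2) -> List[str]:
    """Fill every graph node with a random fixed-width number."""
    return [
        "".join(secrets.choice("0123456789") for _ in range(digits_per_node))
        for _ in range(node_count)
    ]


def digits_along_gesture(grid: Sequence[str], gesture_nodes: Sequence[int]) -> List[str]:
    """Registration verification numbers: grid values in gesture order."""
    return [grid[node] for node in gesture_nodes]


# Hypothetical "U" gesture on a 3 x 3 image with row-major node indices:
# down the left column, across the bottom row, up the right column.
U_GESTURE = (0, 3, 6, 7, 8, 5, 2)
```

With this indexing, `digits_along_gesture(random_grid(), U_GESTURE)` returns the seven numbers the user is expected to read aloud, in reading order.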

Further, after receiving the gesture registration information, the voice registration information and the registration verification numbers corresponding to the gesture registration information, the server performs voice recognition on the voice registration information to obtain a voice registration result. Specifically, the voice registration result includes a plurality of digits. The registration verification numbers corresponding to the gesture registration information are then compared with the digits of the voice registration result to obtain a comparison result. If the content and order of the digits are consistent, the comparison result is that the comparison is passed. If not, the comparison fails, the user registration is judged to have failed, and the registration failure result is sent to the display terminal to be displayed to the user.

Further, if the comparison result is that the comparison is passed, the server extracts voiceprint features from the voice registration information to obtain the voiceprint registration information; specifically, an existing conventional voiceprint extraction technique can be adopted. The gesture registration information and the voiceprint registration information are then stored, which completes the registration, and the registration success result is sent to the display terminal to be displayed to the user.
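The server-side registration flow of steps S14 to S16 can be sketched as below. Voice recognition and voiceprint extraction are passed in as callables because the embodiment leaves both to conventional techniques; `register_user`, `recognize_digits`, `extract_voiceprint` and the dictionary used as storage are hypothetical names introduced for this sketch.

```python
from typing import Callable, Dict, Sequence, Tuple

Voiceprint = Tuple[float, ...]
Store = Dict[str, Tuple[Tuple[int, ...], Voiceprint]]


def register_user(
    store: Store,
    user_id: str,
    gesture_nodes: Sequence[int],
    registration_digits: Sequence[str],
    voice: bytes,
    recognize_digits: Callable[[bytes], str],
    extract_voiceprint: Callable[[bytes], Voiceprint],
) -> bool:
    """Steps S14-S16; returns True when registration succeeds."""
    # S15: recognize the speech and keep only the digits that were spoken.
    spoken = "".join(ch for ch in recognize_digits(voice) if ch in "0123456789")
    if spoken != "".join(registration_digits):
        return False  # comparison failed: nothing is stored
    # S16: extract the voiceprint and store it with the gesture.
    store[user_id] = (tuple(gesture_nodes), extract_voiceprint(voice))
    return True
```

The voiceprint is extracted only after the spoken digits have been confirmed, so a recording that does not follow the displayed numbers never becomes an enrolled voiceprint.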

FIG. 3 is a flowchart of a voiceprint recognition attack defense method according to another embodiment of the present invention. The method may be executed by, but is not limited to, a computer of the display terminal. The method shown in the figure comprises the following steps:

step S20, according to the received user input information, generating a verification request, and sending the verification request to the server.

When a user carries out a verification process that requires voiceprint recognition, such as a transaction or login, the user operates the display terminal to input information such as personal information, from which a verification request is generated, and the display terminal sends the verification request to the server. Specifically, the verification request may include information such as a user ID, and according to the information such as the user ID in the verification request, the server can determine the gesture registration information and voiceprint registration information uniquely associated with the verification request.

Step S21, receiving the gesture registration information and the dynamic number sent by the server, and determining a graph node corresponding to the gesture registration information in a preset display image according to the gesture registration information.

The server determines corresponding gesture registration information according to the verification request, and generates a dynamic number corresponding to the gesture registration information, wherein the dynamic number can be a plurality of digits. The display terminal receives the gesture registration information and the dynamic number sent by the server side, and determines a graph node corresponding to the gesture registration information in a preset display image according to the gesture registration information.

Specifically, as shown in fig. 6A, if the graph gesture corresponding to the gesture registration information is "U" type, and the preset display image is composed of 3 × 3 graph nodes (circles), then the 7 graph nodes covered by the graph gesture are the graph nodes corresponding to the gesture registration information.

Step S22, sequentially filling the dynamic numbers into the graph nodes corresponding to the gesture registration information, generating a plurality of random numbers, and filling the random numbers into the unfilled graph nodes.

After the graph nodes corresponding to the gesture registration information are determined, the display terminal fills the dynamic number into those graph nodes in order, distributing its digits evenly among them. For example, as shown in fig. 7, the dynamic number is 33235688312593, so each of the 7 graph nodes of the gesture receives 2 digits. After the dynamic number has been filled in, the display terminal randomly generates a plurality of random numbers and fills them into the remaining unfilled graph nodes.
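The filling in step S22 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the row-major node indexing 0 to 8, the function name `fill_grid`, and the use of Python's `secrets` module for the decoy digits are all hypothetical choices made for the example.

```python
import secrets

DIGITS = "0123456789"

def fill_grid(gesture_path, dynamic_number, grid_size=9):
    """Fill a grid of graph nodes for display.

    gesture_path   -- node indices in the order the registered gesture
                      visits them (assumed row-major indexing, 0..8)
    dynamic_number -- digit string sent by the server
    Returns a list of digit strings, one per graph node.
    """
    if not gesture_path or len(dynamic_number) % len(gesture_path) != 0:
        raise ValueError("dynamic number must split evenly across gesture nodes")
    width = len(dynamic_number) // len(gesture_path)

    grid = [None] * grid_size
    # Distribute the dynamic number evenly, following the gesture order.
    for i, node in enumerate(gesture_path):
        grid[node] = dynamic_number[i * width:(i + 1) * width]

    # Every node not covered by the gesture gets random digits of equal width.
    for node in range(grid_size):
        if grid[node] is None:
            grid[node] = "".join(secrets.choice(DIGITS) for _ in range(width))
    return grid
```

With an assumed "U" path `[0, 3, 6, 7, 8, 5, 2]` and the dynamic number 33235688312593, nodes 0, 3, 6, 7, 8, 5 and 2 would show 33, 23, 56, 88, 31, 25 and 93, while nodes 1 and 4 would show random two-digit strings.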

Step S23, displaying the preset display image filled with the graph nodes, receiving the user voice information, and sending the user voice information to the server.

The display terminal displays the preset display image in which the graph node filling is completed, as shown in fig. 7. The user reads the numbers in the graph nodes in sequence according to the registered graph gesture, and the display terminal captures the user's voice and sends it to the server as the user voice information.

Further, the server performs voice recognition, voiceprint feature extraction, voiceprint feature comparison and other operations on the user voice information to generate a voiceprint recognition result. The display terminal receives the voiceprint recognition result and displays it to the user.

As an embodiment of the invention, the method further comprises: receiving a voiceprint recognition result sent by the server, and displaying the voiceprint recognition result.

As an embodiment of the present invention, as shown in fig. 4, the method further includes:

step S24, displaying a preset display image and receiving gesture registration information input by a user;

step S25, randomly generating a plurality of registration verification numbers, filling the registration verification numbers into the graph nodes of the preset display image, and determining the registration verification numbers corresponding to the gesture registration information;

and step S26, displaying the preset display image filled with the registration verification digits, receiving voice registration information input by a user, and sending the registration verification digits, the gesture registration information and the voice registration information corresponding to the gesture registration information to a server.

Voiceprint registration is required before the user performs a process that needs voiceprint verification, such as a transaction or login. The display terminal displays a preset display image, which may be an image composed of 3 × 3 circles, and the user inputs a graphical gesture, for example a "U" shape, through the display terminal, as shown in fig. 6A.

Further, after the user inputs the graphic gesture, the display terminal randomly generates a plurality of random numbers as the registration verification digits and fills them into the graph nodes of the preset display image, as shown in fig. 7. The user reads the numbers in the graph nodes in sequence according to the previously input graphic gesture, and the display terminal captures the user's voice and sends it to the server as the user's voice registration information. At the same time, the graphic gesture is sent to the server as the gesture registration information, together with the registration verification numbers in the graph nodes corresponding to the gesture registration information. In addition, the gesture registration information may further include information such as a user ID.

Further, after receiving the gesture registration information, the voice registration information, and the registration verification number corresponding to the gesture registration information, the server performs operations such as voice recognition and voiceprint feature extraction on the voice registration information, and completes voiceprint registration. In addition, the server side sends the registration result to the display terminal so as to be displayed to the user.

Fig. 5 is a schematic structural diagram of a voiceprint recognition attack defense system according to an embodiment of the present invention, where the system includes a server 103 and a display terminal 100, and the server 103 is in communication connection with the display terminal 100;

the display terminal 100 generates an authentication request according to the received user input information, and sends the authentication request to the server 103.

When a user performs a process that requires voiceprint verification, such as a transaction or login, the user operates the display terminal and inputs information such as personal information; the display terminal generates a verification request from this input and sends it to the server. Specifically, the verification request may include information such as a user ID, and according to the information such as the user ID in the verification request, the server may determine the gesture registration information and voiceprint registration information uniquely associated with the verification request.

The server 103 determines gesture registration information and voiceprint registration information corresponding to the verification request according to the verification request; according to the gesture registration information, a plurality of dynamic numbers corresponding to the gesture registration information are generated, and the gesture registration information and the dynamic numbers are transmitted to the display terminal 100.

The server receives an authentication request sent by the display terminal, specifically, the authentication request may include information such as a user ID, and gesture registration information and voiceprint registration information uniquely associated with the authentication request may be determined according to the information such as the user ID in the authentication request.

Further, the server generates dynamic numbers according to the graphical gesture recorded in the gesture registration information. Specifically, as shown in fig. 6A, in the preset display image having 3 × 3 graph nodes (circles), if the graph gesture registered by the user is a "U" shape, the gesture registration information corresponds to 7 graph nodes. The server then randomly generates 7 dynamic numbers, each of which may have several digits.

The display terminal 100 determines a graph node corresponding to the gesture registration information in a preset display image according to the gesture registration information; sequentially filling the dynamic numbers into graph nodes corresponding to the gesture registration information, generating a plurality of random numbers, and filling the random numbers into unfilled graph nodes; displaying the preset display image filled with the graph nodes, receiving user voice information, and sending the user voice information to the server 103.

The display terminal receives the gesture registration information and the dynamic number sent by the server, and determines a graph node corresponding to the gesture registration information in a preset display image according to the gesture registration information.

Further, after the graph nodes corresponding to the gesture registration information are determined, the display terminal fills the dynamic number into those graph nodes in order, distributing its digits evenly among them. For example, as shown in fig. 7, the dynamic number is 33235688312593, so each of the 7 graph nodes of the gesture receives 2 digits. After the dynamic number has been filled in, the display terminal randomly generates a plurality of random numbers and fills them into the remaining unfilled graph nodes.

Further, the display terminal displays the preset display image in which the graph node filling is completed, as shown in fig. 7. The user reads the numbers in the graph nodes in sequence according to the registered graph gesture, and the display terminal captures the user's voice and sends it to the server as the user voice information.

The server 103 performs voice recognition on the user voice information to obtain a voice recognition result, and compares the voice recognition result with the dynamic number to obtain a verification result; and if the verification result is that the verification is passed, extracting voiceprint characteristics of the user voice information to obtain voiceprint characteristic information, and comparing the voiceprint characteristic information with the voiceprint registration information to obtain a voiceprint recognition result.

Further, if the verification result is that the verification is passed, the server performs voiceprint feature extraction on the user voice information; specifically, any existing conventional voiceprint extraction technique can be used to obtain the voiceprint feature information of the user. The voiceprint feature information is then compared with the voiceprint registration information to generate a voiceprint recognition result. Specifically, if the voiceprint feature information is consistent with the voiceprint registration information, or their similarity is higher than a preset threshold, the voiceprint recognition result is that the voiceprint recognition is passed; otherwise, the voiceprint recognition result is that the voiceprint recognition has failed.
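The threshold comparison described above can be illustrated with a small sketch. The patent does not name a similarity measure or a feature format; the cosine similarity, the plain list-of-floats feature vectors, the function names and the default threshold of 0.8 are all assumptions chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and of equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no voiceprint information
    return dot / (norm_a * norm_b)

def voiceprint_matches(features, enrolled, threshold=0.8):
    """Pass only if the similarity is higher than the preset threshold."""
    return cosine_similarity(features, enrolled) > threshold
```

A real deployment would tune the threshold on labeled data for the chosen voiceprint model; the value here is a placeholder.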

As an embodiment of the present invention, the server 103 is further configured to send the voiceprint recognition result to the display terminal.

As an embodiment of the present invention, the display terminal 100 is further configured to receive a voiceprint recognition result sent by the server, and display the voiceprint recognition result.

As an embodiment of the present invention, the server 103 is further configured to receive gesture registration information, voice registration information, and a registration verification number corresponding to the gesture registration information sent by the display terminal; performing voice recognition on the voice registration information to obtain a voice registration result, and comparing a registration verification number corresponding to the gesture registration information with the voice registration result to obtain a comparison result; and if the comparison result is that the comparison is passed, extracting voiceprint features of the voice registration information to obtain voiceprint registration information, and storing the gesture registration information and the voiceprint registration information.

As an embodiment of the present invention, the display terminal 100 is further configured to display a preset display image, and receive gesture registration information input by a user; randomly generating a plurality of registration verification numbers, filling the registration verification numbers into the graph nodes of the preset display image, and determining the registration verification numbers corresponding to the gesture registration information; displaying the preset display image filled with the registration verification digits, receiving voice registration information input by a user, and sending the registration verification digits, the gesture registration information and the voice registration information corresponding to the gesture registration information to a server.

In a specific embodiment of the present invention, the system for defending against voiceprint recognition attacks shown in fig. 5 specifically includes a display terminal and a server, where the display terminal mainly includes a screen and a microphone, and the server mainly includes a graphic gesture module, a dynamic digital module, a voice recognition module, a voiceprint recognition module, and a result verification module.

The working process of the voiceprint recognition attack defense system is mainly divided into two parts: voiceprint registration and voiceprint verification. During voiceprint registration, the user first registers a graph gesture and reads out the numbers corresponding to the gesture in sequence; the server first verifies the content and order of the digits, and registers the voiceprint features of the user after this verification is passed. During voiceprint verification (such as voiceprint login or voiceprint payment), the server generates a group of random numbers to be distributed to the circles of the graph gesture, and the user reads out the corresponding numbers in the order of the previously registered graph gesture; the server first verifies whether the content and order of the digits read by the user are correct, and then verifies whether the voiceprint is correct.

Specifically, the display terminal 100 mainly includes a display screen 101 and a microphone 102, where the display screen 101 is used to display preset display images, dynamic numbers, prompt texts and other information for a user, and the microphone 102 mainly collects user voices.

The server 103 mainly comprises a graph gesture module 104, a dynamic digital module 105, a voice recognition module 106 and a voiceprint recognition module 107. The graph gesture module 104 mainly receives and stores the gesture information registered by the user, and transmits the registered gesture information to the display terminal. The dynamic digital module 105 mainly generates dynamic numbers from the registered gesture information. The voice recognition module 106 mainly determines whether the content and order of the digits read by the user are correct. The voiceprint recognition module 107 mainly performs voiceprint recognition based on the user voice information, and determines whether the voiceprint matches the registered voiceprint, that is, whether the speaker is the registered user.

In this embodiment, as shown in the schematic diagrams of graphic gesture registration in fig. 6A to 6C, when registering a graphic gesture the user first draws a gesture line connecting some of the 3 × 3 circles displayed on the screen (fig. 6A); the number of connected circles is configurable at the server (for example, 4 to 7 circles). The user then draws the graphic gesture a second time so that the two drawings can be checked for consistency, which prevents erroneous input (fig. 6B). Finally, the user is prompted that the gesture registration was successful (fig. 6C).
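The confirmation of a newly drawn gesture can be sketched as below. The representation of a gesture as a list of row-major node indices, the function name `validate_gesture`, and the rule that a node may be visited only once are assumptions for this example; only the double drawing and the configurable 4 to 7 circle range come from the text.

```python
def validate_gesture(first, second, min_nodes=4, max_nodes=7, grid_size=9):
    """Check a gesture drawn twice during registration.

    first, second -- node indices (assumed 0..8, row-major) in drawing order
    Returns (ok, reason).
    """
    if list(first) != list(second):
        return False, "the two drawings differ"
    if not (min_nodes <= len(first) <= max_nodes):
        return False, "node count out of the configured range"
    # Assumption: each circle may be connected at most once.
    if len(set(first)) != len(first):
        return False, "a node is visited more than once"
    if any(not (0 <= node < grid_size) for node in first):
        return False, "unknown node index"
    return True, "ok"
```

The `min_nodes` and `max_nodes` defaults mirror the example range given in the text and would be supplied by the server configuration in practice.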

The image drawing gesture interface mainly comprises prompt characters 200 and a circle 201, wherein the prompt characters 200 mainly show operation prompt information to a user, and the circle 201 is in a 3 x 3 layout and assists the user in drawing gesture graphs. The trace 202 in fig. 6A primarily displays a graphical gesture path drawn by the user.

In this embodiment, as shown in the schematic diagram of graphic gesture verification in fig. 7, when the user uses functions such as voiceprint login or voiceprint payment, the previously registered graphic gesture must be verified first. To verify the graphic gesture, the server generates a plurality of dynamic numbers according to the graphic gesture previously registered by the user and sends them to the display terminal. The number of digits of the dynamic number depends on the number of circles in the graphic gesture and is configurable: for example, if each circle is configured to display a 1-digit number and the registered gesture covers 7 circles, a 7-digit dynamic number is generated; if each circle is configured to display a 2-digit number and the registered gesture covers 4 circles, an 8-digit dynamic number is generated. The terminal displays 3 × 3 circles, distributes the dynamic numbers in sequence over the circles corresponding to the graphic gesture, and fills the circles that do not correspond to the graphic gesture with random numbers. The user must read out the numbers in the order of the graphic gesture to perform the voiceprint recognition verification.
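The digit-count rule above (circles in the gesture multiplied by digits per circle) can be sketched as follows. The function name `generate_dynamic_number` and the use of Python's `secrets` module as the random source are assumptions; the patent only states that the numbers are generated randomly.

```python
import secrets

def generate_dynamic_number(num_circles, digits_per_circle=1):
    """Generate a one-time digit string for a registered gesture.

    Total length = number of circles in the gesture x digits shown per circle.
    """
    if num_circles < 1 or digits_per_circle < 1:
        raise ValueError("both counts must be positive")
    total_digits = num_circles * digits_per_circle
    # secrets is used here so the digits are hard to predict (assumed choice).
    return "".join(secrets.choice("0123456789") for _ in range(total_digits))
```

For the two cases in the text, `generate_dynamic_number(7, 1)` yields a 7-digit string and `generate_dynamic_number(4, 2)` yields an 8-digit string.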

In this embodiment, as shown in the overall work flow diagram of fig. 8, registered voiceprint information of a user is received, and then voiceprint authentication is performed when using functions such as voiceprint login and voiceprint payment.

The specific processing flow comprises the following steps:

Step 400: registering voiceprint information, wherein the graphic gesture and the dynamic numbers are registered and verified together with the voiceprint information; the specific processing flow is shown in fig. 9.

Step 401: performing voiceprint authentication when the voiceprint login or voiceprint payment function is used; the specific flow is shown in fig. 10.

In this embodiment, the processing flow of step 400 shown in fig. 9 is as follows:

Step 500: the display terminal displays 3 × 3 circles and receives a graphical gesture drawn by the user; the number of circles covered by the graphical gesture is generally 4 to 7 (configured at the server).

Step 501: the graphical gesture drawn again by the user is received and checked for consistency with the graphical gesture drawn in step 500.

Step 502: the display terminal displays 3 × 3 circles, each showing a random number; every circle shows the same number of digits, generally 2 (or 1, as configured at the server). The user reads the numbers in the circles in the order of the graphical gesture drawn in step 500. The display terminal sends the voice of the user reading the numbers to the server.

Step 503: the voice recognition module at the server judges whether the content and order of the digits read by the user are correct; if so, the flow jumps to step 504, and if not, the registration fails.

Step 504: the voiceprint recognition module of the server side extracts user voiceprint characteristic information, and the server side stores the user voiceprint characteristic information and the graph gesture information into a database together.
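Steps 503 and 504 can be sketched on the server side as follows. The speech recognition and voiceprint extraction themselves are outside the scope of this sketch: `recognized_digits` and `voiceprint_features` are assumed to be the outputs of those components, and the dictionary used as a database, the row-major node indexing and the function names are hypothetical.

```python
def expected_digits(grid, gesture_path):
    """Digits the user should read: node contents in gesture order."""
    return "".join(grid[node] for node in gesture_path)

def register_user(database, user_id, gesture_path, grid,
                  recognized_digits, voiceprint_features):
    """Store gesture and voiceprint only if the spoken digits are correct.

    grid              -- digit strings shown in the 3 x 3 circles (step 502)
    recognized_digits -- output of an external speech recognizer (assumed)
    Returns True on successful registration, False otherwise.
    """
    # Step 503: check both the content and the order of the spoken digits.
    if recognized_digits != expected_digits(grid, gesture_path):
        return False
    # Step 504: persist the voiceprint features together with the gesture.
    database[user_id] = {
        "gesture": list(gesture_path),
        "voiceprint": list(voiceprint_features),
    }
    return True
```

Comparing one concatenated digit string checks content and order in a single step, which matches the requirement that both be correct before the voiceprint is stored.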

In this embodiment, the processing flow of step 401 shown in fig. 10 is as follows:

Step 600: the server generates a plurality of one-time dynamic numbers according to the graphic gesture registered by the user and sends the dynamic numbers to the display terminal.

Step 601: the display terminal displays 3 × 3 circles, fills the dynamic numbers evenly into the circles corresponding to the graphic gesture, and fills the circles not covered by the graphic gesture with random numbers.

Step 602: the display terminal receives the voice of the user reading out the numbers in the circles in the order of the graphical gesture, and uploads the user voice to the server.

Step 603: the voice recognition module at the server judges whether the content and order of the digits read by the user are correct. If correct, the flow jumps to step 604; if incorrect, the voiceprint recognition fails.

Step 604: the voiceprint recognition module at the server extracts the user's voiceprint feature information and compares it with the previously registered voiceprint feature information; if they match, the voiceprint recognition succeeds, and if they do not match, the voiceprint recognition fails.
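The two-stage decision in steps 603 and 604 can be sketched as a single function. This is an illustration under assumptions: the recognized digits and extracted features are taken as given inputs, the `similarity` callable and the threshold stand in for whatever voiceprint comparison is actually used, and the function name and return strings are hypothetical.

```python
def verify_user(record, dynamic_number, recognized_digits,
                features, similarity, threshold=0.8):
    """Digit check first (step 603), voiceprint comparison second (step 604).

    record     -- stored registration data with a "voiceprint" entry (assumed)
    similarity -- callable returning a score for two feature vectors (assumed)
    """
    # Step 603: the spoken digits must equal the one-time dynamic number,
    # which also confirms that the user followed the registered gesture order.
    if recognized_digits != dynamic_number:
        return "digit check failed"
    # Step 604: only now compare the voiceprint with the registered one.
    if similarity(features, record["voiceprint"]) > threshold:
        return "voiceprint recognition passed"
    return "voiceprint recognition failed"
```

Because the dynamic number changes on every request, a recording or splice of earlier utterances is rejected at the digit check before any voiceprint comparison takes place.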

Fig. 11 is a schematic structural diagram of a voiceprint recognition attack defense device according to an embodiment of the present invention, where the device includes:

a registration information determining module 810, configured to determine, according to an authentication request sent by a display terminal, gesture registration information and voiceprint registration information corresponding to the authentication request;

a dynamic number generation module 820, configured to generate a plurality of dynamic numbers corresponding to the gesture registration information according to the gesture registration information, and send the gesture registration information and the dynamic numbers to the display terminal;

a verification result module 830, configured to perform voice recognition on the user voice information sent by the display terminal to obtain a voice recognition result, and compare the voice recognition result with the dynamic number to obtain a verification result;

a voiceprint recognition result module 840, configured to, if the verification result is that the verification is passed, perform voiceprint feature extraction on the user voice information to obtain voiceprint feature information, and compare the voiceprint feature information with the voiceprint registration information to obtain a voiceprint recognition result.

As an embodiment of the present invention, as shown in fig. 12, the apparatus further includes: a result sending module 850, configured to send the voiceprint recognition result to the display terminal.

As an embodiment of the present invention, as shown in fig. 13, the apparatus further includes:

a registration information receiving module 860, configured to receive gesture registration information, voice registration information, and a registration verification number corresponding to the gesture registration information sent by the display terminal;

a comparison result module 870, configured to perform voice recognition on the voice registration information to obtain a voice registration result, and compare a registration verification number corresponding to the gesture registration information with the voice registration result to obtain a comparison result;

and a registration information storage module 880, configured to, if the comparison result is that the comparison is passed, perform voiceprint feature extraction on the voice registration information to obtain voiceprint registration information, and store the gesture registration information and the voiceprint registration information.

Fig. 14 is a schematic structural diagram of another voiceprint recognition attack defense device according to an embodiment of the present invention, where the device includes:

the verification request module 910 is configured to generate a verification request according to the received user input information, and send the verification request to a server;

the graph node determining module 920 is configured to receive the gesture registration information and the dynamic number sent by the server, and determine a graph node corresponding to the gesture registration information in a preset display image according to the gesture registration information;

a graph node filling module 930, configured to sequentially fill the dynamic numbers into the graph nodes corresponding to the gesture registration information, generate a plurality of random numbers, and fill the random numbers into unfilled graph nodes;

and a voice information receiving module 940, configured to display the preset display image filled with the graph node, receive the user voice information, and send the user voice information to the server.

As an embodiment of the present invention, as shown in fig. 15, the apparatus further includes: a result display module 950, configured to receive the voiceprint recognition result sent by the server, and display the voiceprint recognition result.

As an embodiment of the present invention, as shown in fig. 16, the apparatus further includes:

a preset display image module 960, configured to display a preset display image and receive gesture registration information input by a user;

a registration verification number module 970, configured to randomly generate a plurality of registration verification numbers, fill the registration verification numbers into the graph nodes of the preset display image, and determine the registration verification numbers corresponding to the gesture registration information;

the registration information sending module 980 is configured to display the preset display image filled with the registration verification digits, receive the voice registration information input by the user, and send the registration verification digits, the gesture registration information, and the voice registration information corresponding to the gesture registration information to the server.

Based on the same inventive concept as the voiceprint recognition attack defense method, the invention also provides a voiceprint recognition attack defense device. Because the device solves the problem on a principle similar to that of the method, the implementation of the device can refer to the implementation of the method, and repeated parts are not described again.

As shown in fig. 17, the electronic device 700 may further include: communication module 110, input unit 120, audio processing unit 130, display 160, power supply 170. It is noted that the electronic device 700 does not necessarily include all of the components shown in fig. 17; furthermore, the electronic device 700 may also include components not shown in fig. 17, which may be referred to in the prior art.

As shown in fig. 17, the central processor 1000, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, the central processor 1000 receiving input and controlling the operation of the various components of the electronic device 700.

The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, or other suitable device. The information relating to the failure may be stored, and a program for executing the information may be stored. And the central processor 1000 may execute the program stored in the memory 140 to realize information storage or processing, etc.

The input unit 120 provides input to the central processor 1000. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 700. The display 160 is used to display an object to be displayed, such as an image or a character. The display may be, for example, an LCD display, but is not limited thereto.

The memory 140 may be a solid state memory such as Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. There may also be a memory that holds information even when power is off, can be selectively erased, and is provided with more data, an example of which is sometimes called an EPROM or the like. The memory 140 may also be some other type of device. Memory 140 includes buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage section 142, and the application/function storage section 142 is used to store application programs and function programs or a flow for executing the operation of the electronic device 700 by the central processor 1000.

The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, address book application, etc.).

The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 1000 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.

Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and receive audio input from the microphone 132 to implement general telecommunications functions. Audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 130 is also coupled to the central processor 1000, so that sound can be recorded locally through the microphone 132 and locally stored sound can be played through the speaker 131.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method for defending against voiceprint recognition attacks, the method comprising:

2. The method of claim 1, further comprising: sending the voiceprint recognition result to the display terminal.

3. The method of claim 1, further comprising:

4. A method for defending against voiceprint recognition attacks, the method comprising:

5. The method of claim 4, further comprising: receiving a voiceprint recognition result sent by the server, and displaying the voiceprint recognition result.

6. The method of claim 4, further comprising:

7. A voiceprint recognition attack defense apparatus, the apparatus comprising:

8. The apparatus of claim 7, further comprising: a result sending module configured to send the voiceprint recognition result to the display terminal.

9. The apparatus of claim 7, further comprising:

10. A voiceprint recognition attack defense apparatus, the apparatus comprising:

11. The apparatus of claim 10, further comprising: a result display module configured to receive the voiceprint recognition result sent by the server and to display the voiceprint recognition result.

12. The apparatus of claim 10, further comprising:

13. A voiceprint recognition attack defense system, characterized by comprising a server and a display terminal, wherein the server is communicatively connected to the display terminal;

14. The system according to claim 13, wherein the server is further configured to send the voiceprint recognition result to the display terminal.

15. The system according to claim 13, wherein the display terminal is further configured to receive a voiceprint recognition result sent by the server, and display the voiceprint recognition result.

16. The system according to claim 13, wherein the server is further configured to: receive, from the display terminal, gesture registration information, voice registration information, and the registration verification numbers corresponding to the gesture registration information; perform voice recognition on the voice registration information to obtain a voice registration result, and compare the registration verification numbers corresponding to the gesture registration information with the voice registration result to obtain a comparison result; and, if the comparison passes, extract voiceprint features from the voice registration information to obtain voiceprint registration information, and store the gesture registration information and the voiceprint registration information.

17. The system according to claim 13, wherein the display terminal is further configured to: display a preset display image and receive gesture registration information input by a user; randomly generate a plurality of registration verification numbers, fill the registration verification numbers into the graph nodes of the preset display image, and determine the registration verification numbers corresponding to the gesture registration information; and display the preset display image filled with the registration verification numbers, receive voice registration information input by the user, and send the registration verification numbers corresponding to the gesture registration information, the gesture registration information, and the voice registration information to the server.

18. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 6 when executing the computer program.

19. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 6.
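
For illustration only, the registration flow recited in claims 16 and 17 might be sketched as follows in Python. The 3x3 node grid, the function and class names, the in-memory store, and the injected `recognize` and `extract` callables are all assumptions made for this sketch; the claims do not specify them, and a real deployment would substitute actual speech recognition and voiceprint feature extraction for the stand-ins.

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

GRID_NODES = 9  # assumption: the preset display image has a 3x3 grid of graph nodes


def fill_nodes(num_nodes: int = GRID_NODES) -> List[int]:
    """Terminal side: randomly generate one verification digit per graph node."""
    return [secrets.randbelow(10) for _ in range(num_nodes)]


def digits_for_gesture(node_digits: Sequence[int], gesture: Sequence[int]) -> str:
    """Terminal side: the digits on the nodes the gesture passes through, in order."""
    return "".join(str(node_digits[i]) for i in gesture)


@dataclass
class RegistrationServer:
    """Server side of registration (hypothetical class, not named in the patent)."""
    recognize: Callable[[bytes], str]        # stand-in for speech recognition
    extract: Callable[[bytes], List[float]]  # stand-in for voiceprint extraction
    store: Dict[str, dict] = field(default_factory=dict)

    def register(self, user_id: str, gesture: Sequence[int],
                 digits: str, audio: bytes) -> bool:
        # Keep only the digits from the recognized text, then compare them
        # with the verification numbers that correspond to the gesture.
        spoken = "".join(ch for ch in self.recognize(audio) if ch.isdigit())
        if spoken != digits:
            return False  # comparison failed: nothing is stored
        # Comparison passed: extract the voiceprint and store both records.
        self.store[user_id] = {
            "gesture": tuple(gesture),
            "voiceprint": self.extract(audio),
        }
        return True


if __name__ == "__main__":
    nodes = fill_nodes()
    gesture = [0, 4, 8]  # hypothetical diagonal swipe over node indices
    expected = digits_for_gesture(nodes, gesture)
    # Fake models: the "audio" is just the spoken digits encoded as text.
    server = RegistrationServer(
        recognize=lambda audio: audio.decode(),
        extract=lambda audio: [float(len(audio))],
    )
    print(server.register("user-1", gesture, expected, expected.encode()))
```

Because the digits are regenerated each time the image is shown, the spoken content changes from session to session even though the gesture stays the same, which is the property the abstract relies on to hinder guessed or spliced recordings.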