WO2018113526A1

WO2018113526A1 - Face recognition and voiceprint recognition-based interactive authentication system and method

Info

Publication number: WO2018113526A1
Application number: PCT/CN2017/114928
Authority: WO
Inventors: 刘�东; 李晓冬; 杨震泉; 彭世伟; 孙云松; 孟庆康
Original assignee: 四川长虹电器股份有限公司
Priority date: 2016-12-20
Filing date: 2017-12-07
Publication date: 2018-06-28
Also published as: CN106790054A

Abstract

The present invention relates to authentication technology. It addresses the problem that existing face recognition authentication is easily defeated by impersonation, and provides an interactive authentication system and method based on face recognition and voiceprint recognition. The technical solution may be summarized as follows: the system comprises a terminal and a server connected through a network. The terminal is used for obtaining a face video of the detected user, collecting voice audio data input by the user, sending the face video and the voice audio data to the server, and displaying prompt information sent by the server. The server is used for matching user facial feature parameters, matching the user voiceprint feature vector, and taking the intersection of the voiceprint recognition result and the face recognition result; if the intersection contains exactly one result, the authentication is considered successful and authentication success information is returned to the terminal. The invention improves security and is applicable to authentication systems.

Description

Interactive authentication system and method based on face recognition and voiceprint recognition

Technical field

The invention relates to an authentication technology, in particular to an authentication technology for face recognition and voiceprint recognition.

Background technique

With the advent of the Internet+ era, networked management, paperless offices and electronic transactions have penetrated every part of daily life. Virtual life and virtual markets have gradually become the main channels through which office workers shop and relax. But while the Internet makes life convenient, it is also a double-edged sword: all activities and transactions are carried out in a virtual network, with no direct person-to-person contact and sometimes without even text communication, so mutual trust and credentials rely on passwords, keys or SMS verification codes. The Internet is an open network and an equal platform, and it is also hard to control: anything transmitted over the network may be stolen. For ease of memory, users typically reuse a single key everywhere, while the quality and security of the platforms they use vary widely, so a leak in one place compromises every place. Replacing the traditional fixed key with a random verification code sent to a mobile phone has gradually been proposed, but according to statistics the mobile phone is one of the most easily lost personal possessions.

With the development of hardware technology and the popularity of smartphones and personal computers, biometric technology has recently become a focus of attention. Biometric technology authenticates legal identity through physiological or behavioral characteristics of the human body, such as fingerprints, irises, facial image recognition and DNA sequence matching.

Among these, fingerprints are easy to forge: a fingerprint lifted from the everyday belongings of the person being impersonated is enough to forge it. Fingerprint recognition is therefore used only in applications with low security requirements, such as daily attendance records.

Iris recognition uses camera equipment to capture the annular region between the black pupil and the white sclera, which contains many interlaced spots, filaments, crowns, stripes and crypts. The requirements on the camera hardware are therefore relatively high, and the technology is not easy to commercialize on a large scale or to promote to ordinary users.

Verification by image recognition alone (face recognition verification) is also easily impersonated with a static image (a photo), while DNA sequence matching has a high threshold for use because it requires direct contact with the human body, so it does not suit "short, flat, fast" Internet platforms.

The human voice is rich in information along multiple dimensions, such as speech content, tone and sound characteristics. Voiceprint recognition is a technique for distinguishing speakers by the characteristics of their voices; differences in vocal tract structure determine the uniqueness of each voiceprint.

Summary of the invention

The object of the present invention is to solve the problem that face recognition authentication is easily defeated by impersonation, by providing an interactive authentication system and method based on face recognition and voiceprint recognition.

To solve this technical problem, the invention adopts the following technical solution: the interactive authentication system based on face recognition and voiceprint recognition comprises a terminal and a server connected through a network, wherein

The terminal is configured to acquire a facial video of the detected user, collect voice audio data input by the user, send the facial video and the voice audio data to the server, and display the prompt information sent by the server;

The server is configured to match the user facial feature parameters, match the user voiceprint feature vector, and take the intersection of the voiceprint recognition result and the face recognition result. If the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal.

Further, matching the user facial feature parameters and matching the user voiceprint feature vector means the following. The server acquires the user facial feature parameters from the received facial video of the detected user and matches them against all user facial feature parameters stored in advance on the server. If the matching succeeds, the face recognition result is obtained and the preset voice password text is sent to the terminal. After receiving the voice audio data sent by the voice collection module of the terminal, the server converts the voice audio data into text content and matches the text content against the previously sent voice password text. If that matching succeeds, the server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors pre-stored on the server; a successful match yields the voiceprint recognition result.

Specifically, the terminal includes a display module, a face video capture module, a voice collection module and a first communication module, and the server includes a face recognition module, a voice recognition module, a verification module, a database and a second communication module. The display module, the face video capture module and the voice collection module are each connected to the first communication module. The face recognition module, the voice recognition module and the verification module are each connected to the second communication module, and the face recognition module and the voice recognition module are each connected to the verification module. The database is connected to the face recognition module, the voice recognition module and the verification module, and the first communication module and the second communication module are connected through a network.

The face video capture module is configured to acquire a facial video of the detected user and send the video to the face recognition module through the first communication module and the second communication module;

The voice collection module is configured to collect voice audio data input by the user and send the voice audio data to the voice recognition module through the first communication module and the second communication module;

The display module is configured to display the prompt information sent by the server, including face recognition failure information, voice password input incorrect information, verification failure information, voice password text, and verification success information;

The first communication module and the second communication module are used for information interaction between the terminal and the server;

The face recognition module is configured to filter and denoise the face video of the detected user, extract key frames, acquire the user facial feature parameters from the key frames, and match selected key feature parameters against all user facial feature parameters stored in the database. If the matching succeeds, the matching success result is sent to the verification module, this result being the face recognition result. If the matching fails, face recognition failure information is returned to the terminal;

The voice recognition module is configured to send the preset voice password text to the terminal after receiving the voice recognition request sent by the verification module, so that the terminal displays the voice password text through the display module. After receiving the voice audio data sent by the voice collection module of the terminal, the module converts it into text content and matches the text content against the previously sent voice password text. If the matching fails, recognition is considered to have failed and voice password input incorrect information is returned to the terminal. If the matching succeeds, the module extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors stored in the database. If that matching fails, recognition is considered to have failed and voice recognition failure information is returned to the terminal. If it succeeds, the matching success result is sent to the verification module, this result being the voiceprint recognition result;

The verification module is configured to send a voice recognition request to the voice recognition module after receiving the matching success result sent by the face recognition module, and, after receiving the matching success result sent by the voice recognition module, to take the intersection of that result and the matching success result of the face recognition module. If the intersection is empty, verification of the current user is considered to have failed and verification failure information is returned to the terminal. If the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal. If the intersection contains more than one result, the voiceprint feature is considered insufficiently distinctive and the voice recognition request is re-sent to the voice recognition module. If a preset number of voice recognition requests have already been sent at that point, verification of the user is considered to have failed and verification failure information is returned to the terminal.
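The three-way rule applied by the verification module can be sketched as below. This is a minimal illustration, not the patented implementation: it assumes each recognition result is a set of candidate user IDs, and the names `Outcome`, `decide`, `requests_sent` and `max_requests` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RETRY_VOICE = "retry_voice"


def decide(face_ids, voice_ids, requests_sent, max_requests):
    """Intersect the two candidate sets and apply the three-way rule.

    Returns (outcome, user_id); user_id is set only on success.
    """
    common = set(face_ids) & set(voice_ids)
    if not common:
        # Empty intersection: verification fails.
        return Outcome.FAILURE, None
    if len(common) == 1:
        # Exactly one common candidate: verification succeeds.
        return Outcome.SUCCESS, next(iter(common))
    # Several candidates remain: the voiceprint was not distinctive enough.
    if requests_sent >= max_requests:
        return Outcome.FAILURE, None
    return Outcome.RETRY_VOICE, None
```

A caller would re-issue the voice recognition request whenever `Outcome.RETRY_VOICE` is returned, incrementing `requests_sent` each time.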

Further, the face video capture module is a camera module, and the voice collection module is a sound pickup.

Specifically, an image similarity preset value is set in the face recognition module. When the key feature parameters among the user facial feature parameters are matched against the user facial feature parameters stored in the database, the match is judged successful if the similarity threshold of a user's facial feature parameters in the matching result is smaller than the image similarity preset value; otherwise the match is judged to have failed.
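The comparison against the preset value can be sketched as follows. The patent does not name a metric, so this sketch assumes the compared quantity is a Euclidean distance between feature vectors (smaller means more alike), which is one reading under which "smaller than the preset value" means success. The names `euclidean`, `match_candidates`, `enrolled` and `preset_value` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_candidates(probe, enrolled, preset_value):
    """Return the IDs of enrolled users whose stored features are closer
    to the probe than preset_value. An empty set means matching failed."""
    return {
        user_id
        for user_id, features in enrolled.items()
        if euclidean(probe, features) < preset_value
    }
```

Returning a set rather than a single best match fits the later intersection step, which expects that more than one user may pass the comparison.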

Further, the successful matching result of the face recognition module includes user information, where the user information includes user age information.

Specifically, the voice recognition request sent by the verification module to the voice recognition module includes either the user age information or a request to send the voice password text used at registration.

Further, among the voice recognition requests sent by the verification module to the voice recognition module, the request that brings the count up to the preset number includes the request to send the voice password text used at registration.

Specifically, in the voice recognition module, the preset voice password text is a piece of easy-to-read text, a string of digits, a piece of news text, or the voice password text corresponding to the user information.

Further, before sending the preset voice password text to the terminal, the voice recognition module also examines the voice recognition request. If the request asks for the voice password text used at registration, the module selects the voice password text corresponding to the user information as the preset voice password text. If the request contains user age information, the module determines the user's age from it: if the user is an elderly person or a minor, the selected preset voice password text is a piece of easy-to-read text or a string of digits; otherwise it is a piece of news text.
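The selection rule above can be sketched as follows. The patent gives no age cut-offs or sample texts, so the thresholds (under 18, 60 and over), the sample strings, the six-digit length, the request keys `use_registration_text` and `age`, and the fallback to news text when no age is present are all assumptions made for illustration.

```python
import random

# Hypothetical sample texts; the patent does not list any.
EASY_TEXTS = ["the sky is blue", "good morning"]
NEWS_TEXTS = ["The city council approved the new transit plan on Monday."]


def random_digits(length, rng=random):
    """Build a random digit string of the given length."""
    return "".join(rng.choice("0123456789") for _ in range(length))


def choose_passphrase(request, registered_texts, rng=random):
    """Pick the prompt text according to the voice recognition request."""
    if request.get("use_registration_text"):
        # The request asks for a passphrase recorded at registration.
        return rng.choice(registered_texts)
    age = request.get("age")
    if age is not None and (age < 18 or age >= 60):
        # Minor or elderly user: easy-to-read text or a digit string.
        return rng.choice([rng.choice(EASY_TEXTS), random_digits(6, rng)])
    # Everyone else (and, by assumption, unknown age) gets news text.
    return rng.choice(NEWS_TEXTS)
```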

Specifically, after the preset voice password text is sent to the terminal, the voice recognition module starts timing and determines whether the voice audio data sent by the terminal is received within a preset time. If the preset time elapses without the voice audio data being received, the module replaces the preset voice password text, re-sends the replacement to the terminal, restarts the timing, and again determines whether the voice audio data is received within the preset time.

An interactive authentication method based on face recognition and voiceprint recognition, applied to the above interactive authentication system based on face recognition and voiceprint recognition, is characterized in that it comprises the following steps:

Step 1: The user registers with the server through the terminal, and the server stores the user information, the user facial feature parameters and the user voiceprint feature vector in the database;

Step 2: During authentication, the terminal acquires a facial video of the detected user and sends it to the server;

Step 3: The server filters and denoises the facial video of the detected user, extracts key frames, acquires the user facial feature parameters from the key frames, and matches selected key feature parameters against all user facial feature parameters stored in the database. If the matching succeeds, the face recognition result is obtained and the process proceeds to step 5; if the matching fails, the process proceeds to step 4;

Step 4: The server returns face recognition failure information to the terminal, the terminal displays the face recognition failure to prompt the user, and the process returns to step 2;

Step 5: The server generates a preset voice password text and sends it to the terminal;

Step 6: The terminal displays the voice password text, collects the voice audio data input by the user and uploads it to the server;

Step 7: The server converts the received voice audio data into text content and matches the text content against the previously sent voice password text. If the matching fails, recognition is considered to have failed, voice password input incorrect information is returned to the terminal, and the process proceeds to step 8; if the matching succeeds, the process proceeds to step 9;

Step 8: The terminal displays the voice password input incorrect information, and the process returns to step 2;

Step 9: The server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors stored in the database. If the matching fails, recognition is considered to have failed, voice recognition failure information is returned to the terminal, and the process proceeds to step 10; if the matching succeeds, the voice recognition result is obtained and the process proceeds to step 11;

Step 10: The terminal displays the voice recognition failure information, and the process returns to step 2;

Step 11: The server takes the intersection of the face recognition result and the voice recognition result. If the intersection is empty, verification of the current user is considered to have failed, verification failure information is returned to the terminal, and the process proceeds to step 12. If the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal. If the intersection contains more than one result, the voiceprint feature is considered insufficiently distinctive, and the server determines whether a preset number of voice password texts have already been sent in the current authentication. If so, verification of the user is considered to have failed, verification failure information is returned to the terminal, and the process proceeds to step 12; otherwise the server regenerates a preset voice password text, sends it to the terminal, and the process returns to step 6;

Step 12: The terminal displays the verification failure information, and the process returns to step 2.
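The text comparison in step 7 can be sketched as below. The patent does not say how the recognized text and the prompt are compared; this sketch assumes an exact match after normalizing Unicode form, case, punctuation and whitespace, and the function names `normalize` and `transcript_matches` are hypothetical. The speech-to-text conversion itself is outside the sketch.

```python
import re
import unicodedata


def normalize(text):
    """Fold case, drop punctuation and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def transcript_matches(transcript, prompt):
    """True when the recognized speech equals the prompt after normalization."""
    return normalize(transcript) == normalize(prompt)
```

Because the prompt changes from one attempt to the next, this check is what makes a replayed recording of an earlier session fail before voiceprint matching is attempted.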

Specifically, step 1 includes the following steps:

Step 101: The user inputs user information to the terminal, and collects a face video or a plurality of face images through the terminal, and the terminal uploads the user information and the face video or the plurality of face images to the server;

Step 102: The server extracts multiple face images from the face video, or uses the received face images, as face samples to obtain the user facial feature parameters, performs face modeling, associates the model with the user information and stores it in the database, and sends a randomly generated voice password text to the terminal;

Step 103: The terminal displays the voice password text, and collects voice audio data of the user, and uploads the collected voice and audio data to the server;

Step 104: The server performs voiceprint feature vector extraction on the voice audio data, and associates the extracted voiceprint feature vector, voice audio data, and corresponding voice password text with the user information, and stores the data in the database.

Further, in step 102, sending the randomly generated voice password text to the terminal means randomly generating at least one piece of voice password text and sending the pieces to the terminal in sequence;

In step 103, displaying the voice password text, collecting the user's voice audio data and uploading it to the server means that the terminal displays the voice password texts in sequence; once the user's voice audio data has been collected three times for one voice password text, the next voice password text is displayed, until three voice audio recordings have been obtained for every voice password text and sent to the server.

Specifically, in step 104, after receiving all the voice audio data, the server extracts a voiceprint feature vector from each recording, selects for each voice password text the recording whose voiceprint feature vector is most distinct, associates the voice password text, the selected voice audio data and its voiceprint feature vector with the user information, and stores them in the database.
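One way to pick a single recording out of the three collected per passphrase is sketched below. The patent does not define how the "most distinct" voiceprint is chosen, so this sketch substitutes a simple stand-in criterion: keep the recording whose vector lies closest to the mean of the three. That criterion and the name `pick_representative` are assumptions, not the patent's method.

```python
def pick_representative(vectors):
    """Return the index of the vector closest to the mean of all vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]

    def sq_dist(v):
        # Squared Euclidean distance to the mean vector.
        return sum((x - m) ** 2 for x, m in zip(v, mean))

    return min(range(n), key=lambda i: sq_dist(vectors[i]))
```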

Further, in step 11, when the preset voice password text is regenerated and sent to the terminal, the regenerated text is one of the voice password texts used at registration that correspond to the user information.

Specifically, in step 3, an image similarity preset value is set in the server. When the key feature parameters among the user facial feature parameters are matched against the user facial feature parameters stored in the database, the match is judged successful if the similarity threshold of a user's facial feature parameters in the matching result is smaller than the image similarity preset value; otherwise the match is judged to have failed.

Further, in step 5, the preset voice password text is a randomly generated piece of easy-to-read text, a randomly generated string of digits, a randomly generated piece of news text, or a voice password text used at registration that corresponds to the user information.

Specifically, in step 1, the user information includes user age information;

In step 3, the face recognition result includes user information;

In step 5, when the server generates the preset voice password text and sends it to the terminal, if the user information in the face recognition result indicates an elderly person or a minor, the selected preset voice password text is a piece of easy-to-read text or a string of digits; otherwise it is a piece of news text.

Further, in step 9, if the matching fails, the server also determines whether the number of voice password texts generated so far has reached the preset number minus one. If so, recognition is considered to have failed, voice recognition failure information is returned to the terminal, and the process proceeds to step 10. Otherwise the server regenerates a preset voice password text, sends it to the terminal, and the process returns to step 6. The regenerated text is a randomly generated piece of easy-to-read text, a randomly generated string of digits or a randomly generated piece of news text, and it is longer than the preset voice password text generated the previous time.
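The retry rule above can be sketched as follows. This is a minimal illustration that only generates digit strings; the two-character length increment and the names `next_passphrase`, `generated_count` and `preset_count` are assumptions, since the patent only requires the new text to be longer than the previous one.

```python
import random


def next_passphrase(previous, generated_count, preset_count, rng=random):
    """Return a longer digit passphrase, or None when retries are exhausted.

    generated_count is the number of passphrases already generated in this
    authentication; retries stop once it reaches preset_count - 1.
    """
    if generated_count >= preset_count - 1:
        return None
    # Assumed increment: two characters longer than the previous prompt.
    length = len(previous) + 2
    return "".join(rng.choice("0123456789") for _ in range(length))
```

A longer prompt gives the voiceprint extractor more speech to work with, which is presumably why the source asks for increasing length after a failed match.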

Further, in step 9, a voiceprint similarity preset value is set in the server. When the server matches the voiceprint feature vector extracted from the voice audio data against all user voiceprint feature vectors stored in the database, the match is judged successful if the similarity threshold of a user's voiceprint feature vector in the matching result is smaller than the voiceprint similarity preset value; otherwise the match is judged to have failed.

Specifically, in step 5, after the server generates and sends the preset voice password text to the terminal, the timing is also started;

And/or, in step 9, after the server regenerates and sends the preset voice password text to the terminal, the timing is also started;

And/or, in step 11, after the server regenerates and sends the preset voice password text to the terminal, the timing is also started;

Between step 5 and step 7, the following steps are further included:

Step A: The server determines whether the voice audio data sent by the terminal is received within a preset time. If the preset time elapses without the voice audio data being received, the process proceeds to step B; otherwise it proceeds to step 7;

Step B: The server replaces the preset voice password text, re-sends the replacement to the terminal, restarts the timing, and returns to step A. The replacement preset voice password text is a newly generated random piece of easy-to-read text, a randomly generated string of digits or a randomly generated piece of news text.
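The timing loop of steps A and B can be sketched as follows. In this sketch a `queue.Queue` stands in for the network channel carrying uploaded audio, and `max_prompts` bounds the number of replacements; the source text sets no such bound, so it is an assumption added only to keep the loop finite. The names `await_audio`, `send_prompt` and `make_prompt` are hypothetical.

```python
import queue


def await_audio(inbox, send_prompt, make_prompt, timeout_s, max_prompts):
    """Send a prompt, wait for audio, and replace the prompt on timeout.

    Returns (prompt, audio) for the prompt that was answered, or None if
    max_prompts prompts all timed out.
    """
    for _ in range(max_prompts):
        prompt = make_prompt()
        send_prompt(prompt)
        try:
            # Step A: wait up to the preset time for uploaded audio.
            audio = inbox.get(timeout=timeout_s)
        except queue.Empty:
            # Step B: the time ran out, so loop and send a replacement.
            continue
        return prompt, audio
    return None
```

Returning the prompt together with the audio matters because the later text comparison must use the prompt that was actually answered, not an earlier one that timed out.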

Further, in step 9, if the matching fails, after returning the terminal speech recognition failure information, the server also proceeds to step 13;

In step 11, if the verification is successful, returning the terminal verification success information, the server also proceeds to step 13, if it is considered that the current user verification fails, returning the terminal verification failure information, the server also proceeds to step 13;

Step 13: The server optimizes the face modeling corresponding to the user information in the face recognition result by using the face image received in the current authentication.
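Step 13 does not say how the stored face model is refined. As one hedged possibility, the sketch below blends the stored feature template with the features from the current session using an exponential moving average; the update rule, the `rate` parameter and the name `update_template` are assumptions, not the patent's method.

```python
def update_template(template, new_features, rate=0.1):
    """Move the stored template a fraction `rate` toward the new features."""
    return [(1 - rate) * t + rate * f for t, f in zip(template, new_features)]
```

A small `rate` keeps any single session, including a failed one, from shifting the stored model very far.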

The beneficial effect of the invention is that, through the above interactive authentication system and method based on face recognition and voiceprint recognition, face recognition and voiceprint recognition are combined to achieve authentication with higher security.

Drawings

FIG. 1 is a system block diagram of an interactive authentication system based on face recognition and voiceprint recognition according to an embodiment of the present invention.

Detailed description

The technical solution of the present invention will be described in detail below with reference to the accompanying drawings and embodiments.

The system block diagram of the interactive authentication system based on face recognition and voiceprint recognition according to the present invention is shown in FIG. 1. The system includes a terminal and a server connected through a network. The terminal is configured to acquire a facial video of the detected user, collect the voice audio data input by the user, send both to the server, and display the prompt information sent by the server. The server is configured to match the user facial feature parameters, match the user voiceprint feature vector, and take the intersection of the voiceprint recognition result and the face recognition result. If the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal.

The interactive authentication method based on face recognition and voiceprint recognition according to the present invention is applied to the interactive authentication system described above. First, the user registers with the server through the terminal, and the server stores the user information, the user facial feature parameters and the user voiceprint feature vector in the database. During authentication, the terminal acquires a facial video of the detected user and sends it to the server. The server filters and denoises the facial video, extracts key frames, acquires the user facial feature parameters from the key frames, and matches selected key feature parameters against all user facial feature parameters stored in the database. If the matching fails, the server returns face recognition failure information to the terminal, the terminal displays the face recognition failure to prompt the user, and the process returns to the authentication step to authenticate again. If the matching succeeds, the face recognition result is obtained, and the server generates a preset voice password text and sends it to the terminal. The terminal displays the voice password text, collects the voice audio data input by the user and uploads it to the server. The server converts the received voice audio data into text content and matches the text content against the previously sent voice password text. If the matching fails, recognition is considered to have failed: the server returns voice password input incorrect information, the terminal displays it, and the process returns to the authentication step to authenticate again. If the matching succeeds, the server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors stored in the database. If this matching fails, recognition is considered to have failed: the server returns voice recognition failure information, the terminal displays it, and the process returns to the authentication step to authenticate again. If it succeeds, the voice recognition result is obtained and the server takes the intersection of the face recognition result and the voice recognition result. If the intersection is empty, verification of the user is considered to have failed: the server returns verification failure information, the terminal displays it, and the process returns to the authentication step to authenticate again. If the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal. If the intersection contains more than one result, the voiceprint feature is considered insufficiently distinctive, and the server determines whether a preset number of voice password texts have already been sent in this authentication. If so, verification of the user is considered to have failed: the server returns verification failure information, the terminal displays it, and the process returns to the authentication step to authenticate again. Otherwise the server regenerates a preset voice password text, sends it to the terminal, and the process returns to the step in which the terminal displays the voice password text.

Example

The system block diagram of the interactive authentication system based on face recognition and voiceprint recognition according to an embodiment of the present invention is shown in FIG. 1. The system includes a terminal and a server connected through a network. The terminal may include a display module, a face video capture module, a voice collection module and a first communication module, and the server may include a face recognition module, a voice recognition module, a verification module, a database and a second communication module. The display module, the face video capture module and the voice collection module are each connected to the first communication module. The face recognition module, the voice recognition module and the verification module are each connected to the second communication module, and the face recognition module and the voice recognition module are each connected to the verification module. The database is connected to the face recognition module, the voice recognition module and the verification module. The first communication module and the second communication module are connected through a network.

The terminal is configured to acquire the facial video of the detected user, collect the voice audio data input by the user, send both to the server, and display the prompt information sent by the server.

The terminal may include a display module, a face video acquisition module, a voice collection module, and a first communication module.

The face video capture module is configured to obtain the face video of the detected user and send it to the face recognition module through the first communication module and the second communication module; it may be a camera module such as a camera.

The voice collection module is configured to collect the voice audio data input by the user and send it to the voice recognition module through the first communication module and the second communication module; it may be a sound pickup such as a microphone.

The display module is configured to display the prompt information sent by the server, including face recognition failure information, voice password input incorrect information, verification failure information, voice password text, and verification success information.

The first communication module is used for information interaction between the terminal and the server.

The server is configured to match the user facial feature parameters, match the user voiceprint feature vector, and take the intersection of the voiceprint recognition result and the face recognition result. If the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal. Here, the matching of the user facial feature parameters and of the user voiceprint feature vector preferably proceeds as follows. The server acquires the user facial feature parameters from the received facial video of the detected user and matches them against all user facial feature parameters pre-stored on the server. If the matching succeeds, the face recognition result is obtained and the preset voice password text is sent to the terminal. After receiving the voice audio data sent by the voice collection module of the terminal, the server converts it into text content and matches the text content against the previously sent voice password text. If that matching succeeds, the server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors pre-stored on the server; a successful match yields the voiceprint recognition result.

The server may include a face recognition module, a voice recognition module, a verification module, a database, and a second communication module.

The second communication module is used for information interaction between the terminal and the server.

The face recognition module is configured to filter and denoise the face video of the detected user, extract key frames, acquire the user facial feature parameters from the key frames, and match the key feature parameters selected from them against all user facial feature parameters stored in the database. If the match succeeds, the match-success result is sent to the verification module; this result is the face recognition result. If the match fails, face recognition failure information is returned to the terminal. The face recognition module may set an image similarity preset value: when the key feature parameters are matched against the user facial feature parameters stored in the database, the match is judged successful if the similarity threshold of the user facial feature parameters in the matched result is smaller than the image similarity preset value, and failed otherwise. The match result of the face recognition module may include user information, and the user information includes user age information.
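The threshold comparison described for the face recognition module can be sketched as follows. This is only an illustrative sketch: the text does not specify the comparison measure, so Euclidean distance is assumed here, and the names `match_faces`, `database`, and the record fields are hypothetical.

```python
import math


def match_faces(probe, database, preset_value):
    """Return user records whose stored features score below preset_value."""
    results = []
    for record in database:
        # Assumption: Euclidean distance stands in for the comparison score,
        # so a smaller score means a closer match.
        score = math.dist(probe, record["features"])
        if score < preset_value:
            # The match result carries user information, including age.
            results.append({"user_id": record["user_id"], "age": record["age"]})
    return results


# Hypothetical enrolled users with toy three-dimensional feature parameters.
database = [
    {"user_id": "u1", "age": 34, "features": [0.0, 0.0, 1.0]},
    {"user_id": "u2", "age": 71, "features": [0.9, 0.1, 0.0]},
]
face_result = match_faces([0.1, 0.0, 1.0], database, preset_value=0.5)
```

An empty list corresponds to a failed match, in which case face recognition failure information would be returned to the terminal.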

The voice recognition module is configured to send the preset voice password text to the terminal after receiving the voice recognition request sent by the verification module, so that the terminal displays the voice password text through the display module. After receiving the voice audio data sent by the voice collection module of the terminal, it converts the data into text content and matches the text content against the previously sent voice password text. If the match fails, recognition is considered failed and voice-password-input-incorrect information is returned to the terminal. If the match succeeds, it extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors stored in the database. If this match fails, recognition is considered failed and voice recognition failure information is returned to the terminal; if it succeeds, the match-success result is sent to the verification module, and this result is the voiceprint recognition result. In the voice recognition module, the preset voice password text is a piece of easy-to-read text, a string of digits, a piece of news text, or the voice password text from registration corresponding to the user information. Before sending the preset voice password text to the terminal, the voice recognition module may examine the voice recognition request. If the request asks for the registration voice password text to be sent, the module selects the registration voice password text corresponding to the user information as the preset voice password text. If the request contains user age information, the module determines the user's age from it: if the user is an elderly person or a minor, the preset voice password text is a piece of easy-to-read text or a string of digits; otherwise it is a piece of news text. In addition, after sending the preset voice password text to the terminal, the voice recognition module starts a timer and determines whether voice audio data from the terminal is received within a preset time (for example, 10 seconds). If the preset time elapses without voice audio data from the terminal, the module replaces the preset voice password text, sends the replacement to the terminal, restarts the timer, and returns to the step of determining whether voice audio data from the terminal is received within the preset time.
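The timing behaviour of the voice recognition module can be illustrated with the following sketch. It assumes, purely for illustration, that incoming audio arrives on a `queue.Queue` and that replacement texts come from a finite list; `await_voice_audio`, `send_text`, and `candidate_texts` are hypothetical names, and the bounded list is a simplification of the unbounded replace-and-resend loop described above.

```python
import queue


def await_voice_audio(audio_queue, candidate_texts, send_text, timeout_s=10.0):
    """Send password texts one at a time until audio arrives or texts run out."""
    for text in candidate_texts:
        send_text(text)  # send (or resend) a preset voice password text
        try:
            # Timer: block for at most timeout_s waiting for the terminal.
            audio = audio_queue.get(timeout=timeout_s)
        except queue.Empty:
            continue  # preset time elapsed: replace the text, restart the timer
        return text, audio
    return None, None  # assumption: give up once the candidate texts are used up


# Toy usage: the simulated terminal answers only after the second text is sent.
incoming = queue.Queue()
sent = []


def send_text(text):
    sent.append(text)
    if len(sent) == 2:
        incoming.put(b"audio-bytes")


outcome = await_voice_audio(incoming, ["text A", "text B", "text C"], send_text,
                            timeout_s=0.01)
```

The returned text is the one the audio should later be compared against, since the earlier text was replaced when its timer expired.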

The verification module is configured to send a voice recognition request to the voice recognition module after receiving the match-success result sent by the face recognition module and, after receiving the match-success result sent by the voice recognition module, to take its intersection with the match-success result sent by the face recognition module. If the intersection is empty, the current user verification is considered failed and verification failure information is returned to the terminal. If the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal. If the intersection contains more than one result, the voiceprint feature is considered insufficiently distinctive and the voice recognition request is resent to the voice recognition module; if the preset number of voice recognition requests has already been sent, the current user verification is considered failed and verification failure information is returned to the terminal. The voice recognition request sent by the verification module to the voice recognition module includes the user age information or a request to send the voice password text from registration. Alternatively, when the verification module sends the voice recognition request for the preset-number-th time (for example, the third time when the preset number is 3), the request may include the request to send the voice password text from registration.
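The decision rule of the verification module can be summarized in a short sketch. The function name `verify`, the string outcomes, and the representation of recognition results as sets of user identifiers are assumptions made for illustration only.

```python
def verify(face_ids, voice_ids, requests_sent, preset_number):
    """Decide the outcome from the two recognition results.

    face_ids / voice_ids: identifiers of users matched by each recognizer.
    requests_sent: voice recognition requests already sent in this authentication.
    """
    common = set(face_ids) & set(voice_ids)  # intersection of the two results
    if not common:
        return "fail"      # empty intersection: verification fails
    if len(common) == 1:
        return "success"   # exactly one result: verification succeeds
    # More than one result: the voiceprint is not distinctive enough.
    if requests_sent >= preset_number:
        return "fail"      # preset number of requests already used up
    return "retry"         # resend the voice recognition request
```

For example, `verify({"u1", "u2"}, {"u1"}, 1, 3)` yields `"success"`, while two users appearing in both results yields `"retry"` until the preset number of requests has been reached.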

When used, the processing method is as follows:

Step 1. The user uses the terminal to perform user registration with the server, and the server stores user information, the user facial feature parameter, and the user voiceprint feature vector in the database.

In this step, the user information preferably includes user age information, and the step may specifically include the following steps:

Step 101: The user inputs user information to the terminal, and collects a face video or a plurality of face images through the terminal, and the terminal uploads the user information and the face video or the plurality of face images to the server.

Step 102: The server captures multiple face images from the face video, or uses the received face images, as face samples to obtain the user facial feature parameters, performs face modeling, associates the result with the user information and stores it in the database, and randomly generates voice password text and sends it to the terminal.

Here, when randomly generating voice password text and sending it to the terminal, at least one piece of voice password text can be randomly generated and sent to the terminal in sequence; for example, three pieces are randomly generated, randomly ordered, and then sent in sequence. The number of pieces generated is determined by the security level of the service authentication: in general, the higher the security requirement of the service authentication, the more voice password texts are randomly generated at registration.

Step 103: The terminal displays the voice password text, and collects voice audio data of the user, and uploads the collected voice audio data to the server.

Here, if the terminal receives multiple voice password texts in sequence, it displays them in order: after the user's voice audio data for one voice password text has been collected three times, the next voice password text is displayed, until three voice audio recordings have been collected for every voice password text, and all of them are then sent to the server. For example, when the terminal receives two voice password texts in sequence, it first displays the first voice password text and collects three times the voice audio data the user inputs according to it, then displays the second voice password text and collects three times the voice audio data the user inputs according to it. The three recordings for the first voice password text and the three for the second, six in total, are then sent to the server together.

Here, if the server receives multiple voice audio data, then after receiving all of them it extracts the voiceprint feature vector of each and, for each voice password text, selects the one voice audio data whose voiceprint feature vector is most distinctive. The voice password text, the selected voice audio data, and its voiceprint feature vector are associated with the user information and stored in the database. That is, each voice password text corresponds to one voice audio data, and the other two can be deleted.
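The selection of one recording per voice password text might be sketched as below. The text does not define "most distinctive", so this sketch assumes one possible criterion: the recording whose voiceprint feature vector lies farthest from the nearest voiceprint of any other enrolled user. The function `pick_most_distinctive` and the sample layout are hypothetical.

```python
import math


def pick_most_distinctive(samples, other_user_vectors):
    """Pick the sample whose vector is farthest from its nearest other-user vector."""

    def margin(vector):
        # Assumption: distinctiveness = distance to the closest other user.
        # With no other users enrolled, every sample scores infinity.
        return min((math.dist(vector, other) for other in other_user_vectors),
                   default=math.inf)

    # max() returns the first sample on ties, so the result is deterministic.
    return max(samples, key=lambda sample: margin(sample["vector"]))


# Three recordings collected for one voice password text (toy 2-D vectors).
samples = [
    {"audio": "take1.wav", "vector": [0.0, 0.0]},
    {"audio": "take2.wav", "vector": [5.0, 5.0]},
    {"audio": "take3.wav", "vector": [1.0, 1.0]},
]
best = pick_most_distinctive(samples, other_user_vectors=[[0.0, 1.0]])
```

Only the chosen recording and its vector would then be stored with the voice password text; the other two recordings can be discarded.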

Step 2: During authentication, the terminal acquires the face video of the detected user and sends it to the server.

Step 3: The server filters and denoises the face video of the detected user, extracts key frames, acquires the user facial feature parameters from the key frames, and matches the key feature parameters selected from them against all user facial feature parameters stored in the database. If the match succeeds, the face recognition result is obtained and the process proceeds to step 5; if the match fails, the process proceeds to step 4.

In this step, an image similarity preset value may be set in the server: when the key feature parameters selected from the user facial feature parameters are matched against the user facial feature parameters stored in the database, the match is judged successful if the similarity threshold of the user facial feature parameters in the matched result is smaller than the image similarity preset value, and failed otherwise. Here, the face recognition result preferably includes user information, which, as seen in step 1, preferably includes user age information.

Step 4: The server returns face recognition failure information to the terminal; the terminal displays that face recognition failed, prompts the user, and returns to step 2.

Step 5: The server generates a preset voice password text and sends it to the terminal.

In this step, the preset voice password text may be a randomly generated piece of easy-to-read text, a randomly generated string of digits, a randomly generated piece of news-type text, or the voice password text from registration corresponding to the user information.

Here, when the server generates and sends the preset voice password text to the terminal, if the user information in the face recognition result shows (as can be judged from the user age information) that the user is an elderly person or a minor, the preset voice password text is a piece of easy-to-read text or a string of digits, so that the user can understand and read the voice password text aloud. Otherwise the user information shows that the user is an adult, who can generally understand and read the voice password text, so a piece of news text is chosen to increase recognition accuracy.
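The age-based choice of voice password text can be illustrated as follows. The text does not give age limits, so the thresholds of 18 and 60, the six-digit length, and the names `choose_password_kind`, `generate_password_text`, and `news_pool` are all assumptions used only for this sketch.

```python
import random


def choose_password_kind(age, minor_below=18, elderly_from=60):
    """Return 'easy' for minors and elderly users, 'news' for other adults."""
    # Assumption: the age limits are illustrative; the text does not fix them.
    if age < minor_below or age >= elderly_from:
        return "easy"
    return "news"


def generate_password_text(age, rng, news_pool):
    """Generate a digit string for 'easy' users, or pick a news snippet."""
    if choose_password_kind(age) == "easy":
        # A short string of digits is easy to read aloud.
        return "".join(rng.choice("0123456789") for _ in range(6))
    return rng.choice(news_pool)


rng = random.Random(0)
easy_text = generate_password_text(70, rng, ["sample headline"])
news_text = generate_password_text(30, rng, ["sample headline"])
```

Passing the random generator in explicitly keeps the sketch testable; a deployment would presumably draw from a stronger source of randomness.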

Step 6. The terminal displays the voice password text, and collects the voice audio data input by the user and uploads it to the server.

Step 7. The server converts the received voice audio data into text content and matches the text content against the previously sent voice password text. If the match fails, recognition is considered failed, voice-password-input-incorrect information is returned to the terminal, and the process proceeds to step 8. If the match succeeds, the process proceeds to step 9.

Step 8. The terminal displays the voice password input incorrect information, and returns to step 2.

Step 9. The server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors stored in the database. If the match fails, recognition is considered failed, voice recognition failure information is returned to the terminal, and the process proceeds to step 10. If the match succeeds, the voiceprint recognition result is obtained and the process proceeds to step 11.

In this step, if the match fails, it may also be determined whether the preset number minus one of preset voice password texts have already been generated (for example, if the preset number is 3, whether 2 voice password texts have been generated). If so, recognition is considered failed, voice recognition failure information is returned to the terminal, and the process proceeds to step 10. Otherwise, a new preset voice password text is generated and sent to the terminal, and the process returns to step 6. The regenerated preset voice password text is a randomly generated piece of easy-to-read text, string of digits, or piece of news-type text that is longer than the previously generated preset voice password text; this corresponds to the generation method in step 5.

In this step, a voiceprint similarity preset value may also be set in the server: when the voiceprint feature vector extracted from the voice audio data is matched against all user voiceprint feature vectors stored in the database, the match is judged successful if the similarity threshold of the user voiceprint feature vector in the matched result is smaller than the voiceprint similarity preset value, and failed otherwise.
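A sketch of the voiceprint comparison follows. The text does not name the measure, so cosine distance (one minus cosine similarity) is assumed here so that "smaller than the preset value" means a closer match; `cosine_distance`, `match_voiceprints`, and the toy vectors are hypothetical, and non-zero vectors are assumed.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def match_voiceprints(probe, enrolled, preset_value):
    """Return the set of user IDs whose voiceprint is within preset_value."""
    return {user_id for user_id, vector in enrolled.items()
            if cosine_distance(probe, vector) < preset_value}


# Toy enrolled voiceprints; "alice" and "carol" are deliberately similar.
enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [1.0, 0.1]}
voice_result = match_voiceprints([1.0, 0.0], enrolled, preset_value=0.05)
```

Because more than one enrolled user can fall within the preset value, the result is a set of candidates, which is what the later intersection with the face recognition result narrows down.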

Step 10: The terminal displays the voice recognition failure information, and returns to step 2.

Step 11. The server takes the intersection of the face recognition result and the voiceprint recognition result. If the intersection is empty, the current user verification is considered failed, verification failure information is returned to the terminal, and the process proceeds to step 12. If the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal. If the intersection contains more than one result, the voiceprint feature is considered insufficiently distinctive, and it is determined whether the preset number of voice password texts has already been sent during the current authentication. If so, the current user verification is considered failed, verification failure information is returned to the terminal, and the process proceeds to step 12; otherwise, a new preset voice password text is generated and sent to the terminal, and the process returns to step 6.

In this step, when the preset voice password text is regenerated and sent to the terminal, the regenerated text is one of the voice password texts from registration corresponding to the user information, that is, one of the voice password texts randomly generated in step 102 in this example; when there is only one, it is selected directly. If no voice password text was randomly generated at registration as in step 102, and the user voice audio data was instead collected directly and the user's voiceprint feature vector obtained from it, then the voice password text corresponding to that user voice audio data can be selected (it can be obtained by converting the user voice audio data into text data).

Step 12: The terminal displays the verification failure information and returns to step 2.

In this example, the server also starts a timer after it generates and sends the preset voice password text to the terminal. This applies whether the server is generating and sending the preset voice password text for the first time during the current authentication or is regenerating it; that is, the timer starts whenever the server generates and sends the preset voice password text to the terminal.

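Then, between step 5 and step 7, the following may also be included: the server determines whether voice audio data sent by the terminal is received within the preset time (for example, 10 seconds); if the preset time elapses without it, the server replaces the preset voice password text, sends the replacement to the terminal, and restarts the timer.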

In this example, in step 9, if the match fails, after returning voice recognition failure information to the terminal the server may also proceed to step 13, while the terminal still proceeds to step 10.

In step 11, if the verification is considered successful and verification success information is returned to the terminal, the server may further proceed to step 13. If the user verification is considered failed and verification failure information is returned to the terminal, the server may also proceed to step 13, while the terminal still proceeds to step 12.

Step 13 may be: the server uses the face images received in the current authentication to optimize the face modeling corresponding to the user information in the face recognition result. The purpose is as follows: since face recognition succeeded, the face images or face video used for recognition are correct, and this correct face image information can be used to optimize the face modeling, improving the accuracy of face recognition, and to delete invalid user facial feature parameters, improving computational efficiency.

Similarly, in step 11, if the verification succeeds, after returning verification success information to the terminal the server may further use the voice audio data received in the current authentication to optimize the voiceprint feature data corresponding to the user information in the face recognition result.

In this example, as the above processing shows, face recognition is preferably performed first and voiceprint recognition afterwards. The reason is as follows. Face recognition has been developed over several decades; the technology is relatively mature, the algorithms are efficient, and processing is fast. Voiceprint recognition differs from other physiological feature recognition: the features used for voiceprint recognition must be "personalized" features, whereas the features used to recognize what a speaker (that is, the user who needs voiceprint recognition) says must be "common" features across speakers. Although most current voiceprint recognition systems use acoustic features, the features that characterize a person should be multifaceted, including: 1) acoustic features related to the anatomical structure of the human pronunciation mechanism (for example, spectrum, cepstrum, formants, pitch, and reflection coefficients), as well as nasal sounds, deep breath sounds, hoarseness, laughter, and the like; 2) semantics, rhetoric, pronunciation, speech habits, and the like, which are affected by socioeconomic status, education level, place of birth, and so on; 3) personal traits or characteristics influenced by parents, such as prosody, rhythm, speed, intonation, and volume. From the standpoint of what can be modeled mathematically, the features currently available to automatic voiceprint recognition models include: 1) acoustic features (cepstrum); 2) lexical features (speaker-dependent word n-grams and phoneme n-grams); 3) prosodic features (pitch and energy "postures" described by n-grams); 4) language, dialect, and accent information; 5) channel information (which channel is used). Therefore, in the solution of the present invention, the preset voice password text may be randomly generated based on the user information. However, since the specific methods of face recognition and voiceprint recognition mentioned in the present invention are relatively mature technologies, they are not described in detail here.

Claims

An interactive authentication system based on face recognition and voiceprint recognition, including a terminal and a server, and the terminal and the server are connected through a network, wherein

The terminal is configured to acquire a facial video of the detected user, collect voice audio data input by the user, send the voice audio data to the server, and display prompt information sent by the server;

The server is configured to match the user facial feature parameters and the user voiceprint feature vector, and to take the intersection of the voiceprint recognition result and the face recognition result; if the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal.
An interactive authentication system based on face recognition and voiceprint recognition according to claim 1, wherein:

The matching of the user facial feature parameters and the matching of the user voiceprint feature vector mean that the server acquires the user facial feature parameters from the received facial video of the detected user and matches them against all user facial feature parameters pre-stored on the server, obtaining the face recognition result when the match succeeds; the server then sends the preset voice password text to the terminal and, after receiving the voice audio data sent by the voice collection module of the terminal, converts the voice audio data into text content and matches the text content against the previously sent voice password text; if that match succeeds, the server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors pre-stored on the server, obtaining the voiceprint recognition result when the match succeeds.
The interactive authentication system based on face recognition and voiceprint recognition according to claim 2, wherein the terminal comprises a display module, a face video acquisition module, a voice collection module and a first communication module, and the server comprises a face recognition module, a voice recognition module, a verification module, a database, and a second communication module, wherein the display module, the face video acquisition module, and the voice collection module are each connected with the first communication module; the face recognition module, the voice recognition module, and the verification module are each connected with the second communication module; the face recognition module and the voice recognition module are each connected with the verification module; the database is connected with the face recognition module, the voice recognition module, and the verification module; and the first communication module and the second communication module are connected through a network,

The face video capture module is configured to acquire a facial video of the detected user and send the video to the face recognition module through the first communication module and the second communication module;

The voice collection module is configured to collect voice audio data input by the user and send the voice audio data to the voice recognition module through the first communication module and the second communication module;

The display module is configured to display prompt information sent by the server, including face recognition failure information, voice-password-input-incorrect information, verification failure information, voice password text and verification success information;

The first communication module and the second communication module are used for information interaction between the terminal and the server;

The face recognition module is configured to filter and denoise the face video of the detected user, extract key frames, acquire the user facial feature parameters from the key frames, and match the key feature parameters selected from them against all user facial feature parameters stored in the database; if the match succeeds, the match-success result is sent to the verification module, this result being the face recognition result; if the match fails, face recognition failure information is returned to the terminal;

The voice recognition module is configured to send the preset voice password text to the terminal after receiving the voice recognition request sent by the verification module, so that the terminal displays the voice password text through the display module; after receiving the voice audio data sent by the voice collection module of the terminal, to convert it into text content and match the text content against the previously sent voice password text; if the match fails, recognition is considered failed and voice-password-input-incorrect information is returned to the terminal; if the match succeeds, to extract the voiceprint feature vector from the voice audio data and match it against all user voiceprint feature vectors stored in the database; if this match fails, recognition is considered failed and voice recognition failure information is returned to the terminal; if it succeeds, the match-success result is sent to the verification module, this result being the voiceprint recognition result;

The verification module is configured to send a voice recognition request to the voice recognition module after receiving the match-success result sent by the face recognition module and, after receiving the match-success result sent by the voice recognition module, to take its intersection with the match-success result sent by the face recognition module; if the intersection is empty, the current user verification is considered failed and verification failure information is returned to the terminal; if the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal; if the intersection contains more than one result, the voiceprint feature is considered insufficiently distinctive and the voice recognition request is resent to the voice recognition module; if the preset number of voice recognition requests has already been sent, the current user verification is considered failed and verification failure information is returned to the terminal.
The interactive authentication system based on face recognition and voiceprint recognition according to claim 3, wherein the face recognition module is provided with an image similarity preset value, and when the key feature parameters selected from the user facial feature parameters are matched against the user facial feature parameters stored in the database, the match is judged successful if the similarity threshold of each user facial feature parameter in the matched result is smaller than the image similarity preset value, and failed otherwise.
The interactive authentication system based on face recognition and voiceprint recognition according to claim 3, wherein the match result of the face recognition module includes user information, and the user information includes user age information.
The interactive authentication system based on face recognition and voiceprint recognition according to claim 5, wherein the voice recognition request sent by the verification module to the voice recognition module includes the user age information or a request to send the voice password text from registration.
The interactive authentication system based on face recognition and voiceprint recognition according to claim 6, wherein, when the verification module sends a voice recognition request to the voice recognition module for the preset-number-th time, the voice recognition request includes a request to send the voice password text from registration.
The interactive authentication system based on face recognition and voiceprint recognition according to claim 6, wherein in the voice recognition module, the preset voice password text is a piece of easy-to-read text, a string of digits, a piece of news text, or the voice password text from registration corresponding to the user information.
The interactive authentication system based on face recognition and voiceprint recognition according to claim 8, wherein the voice recognition module further examines the voice recognition request before sending the preset voice password text to the terminal; if the voice recognition request contains a request to send the voice password text, the preset voice password text selected by the voice recognition module is the voice password text from registration corresponding to the user information; if the voice recognition request contains user age information, the user's age is determined from the user age information, and if the user is an elderly person or a minor, the preset voice password text selected is a piece of easy-to-read text or a string of digits, otherwise the preset voice password text selected is a piece of news text.
The interactive authentication system based on face recognition and voiceprint recognition according to any one of claims 3-9, wherein the voice recognition module starts a timer after sending the preset voice password text to the terminal and determines whether voice audio data sent by the terminal is received within a preset time; if the preset time elapses without voice audio data from the terminal being received, the voice recognition module replaces the preset voice password text, sends the replacement to the terminal, restarts the timer, and returns to the step of determining whether voice audio data sent by the terminal is received within the preset time.
An interactive authentication method based on face recognition and voiceprint recognition, applied to the interactive authentication system based on face recognition and voiceprint recognition according to any one of claims 1 to 10, characterized in that it comprises the following steps:

Step 1. The user uses the terminal to perform user registration with the server, and the server stores the user information, the facial feature parameters of the user, and the user voiceprint feature vector in the database;

Step 2: When authenticating, the terminal acquires a facial video of the detected user and sends the video to the server;

Step 3: The server filters and denoises the facial video of the detected user, extracts key frames, acquires the user facial feature parameters from the key frames, and matches the key feature parameters selected from them against all user facial feature parameters stored in the database; if the match succeeds, the face recognition result is obtained and the process proceeds to step 5; if the match fails, the process proceeds to step 4;

Step 4, the server returns the terminal face recognition failure information, the terminal displays the face recognition failure and prompts the user, and returns to step 2;

Step 5: The server generates and sends a preset voice password text to the terminal.

Step 6, the terminal displays the voice password text, and collects the voice audio data input by the user and uploads it to the server;

Step 7. The server converts the received voice audio data into text content and matches the text content against the previously sent voice password text; if the match fails, recognition is considered failed, voice-password-input-incorrect information is returned to the terminal, and the process proceeds to step 8; if the match succeeds, the process proceeds to step 9;

Step 8, the terminal displays the voice password input incorrect information, return to step 2;

Step 9. The server extracts the voiceprint feature vector from the voice audio data and matches it against all user voiceprint feature vectors stored in the database; if the match fails, recognition is considered failed, voice recognition failure information is returned to the terminal, and the process proceeds to step 10; if the match succeeds, the voiceprint recognition result is obtained and the process proceeds to step 11;

Step 10: The terminal displays the voice recognition failure information, and returns to step 2;

Step 11: The server takes the intersection of the face recognition result and the voiceprint recognition result. If the intersection is empty, the current user verification is considered to have failed, verification failure information is returned to the terminal, and the process proceeds to step 12. If the intersection contains exactly one result, the verification is considered successful and verification success information is returned to the terminal. If the intersection contains more than one result, the voiceprint feature is considered inconspicuous, and it is determined whether a preset number of voice password texts have already been sent in the current authentication. If so, the user verification is considered to have failed, verification failure information is returned to the terminal, and the process proceeds to step 12; otherwise, a preset voice password text is regenerated and sent to the terminal, and the process returns to step 6;

Step 12: The terminal displays the verification failure information and returns to step 2.
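The decision made in step 11 can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the claimed implementation: the names `Outcome`, `decide`, `face_ids`, `voice_ids`, `texts_sent`, and `max_texts` are hypothetical, and the two recognition results are assumed to be sets of candidate user identifiers.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RETRY = "retry"  # send another voice password text and return to step 6


def decide(face_ids, voice_ids, texts_sent, max_texts):
    """Combine the two candidate sets in the way step 11 describes.

    face_ids / voice_ids: hypothetical sets of user IDs matched in
    steps 3 and 9. texts_sent / max_texts: number of voice password
    texts already sent in this authentication and the preset limit.
    """
    common = set(face_ids) & set(voice_ids)
    if not common:
        return Outcome.FAILURE, None
    if len(common) == 1:
        return Outcome.SUCCESS, next(iter(common))
    # More than one candidate: the voiceprint was not distinctive enough.
    if texts_sent >= max_texts:
        return Outcome.FAILURE, None
    return Outcome.RETRY, None
```

A caller would treat `RETRY` as the branch that regenerates a voice password text, and `FAILURE` as the branch that leads to step 12.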
The interactive authentication method based on face recognition and voiceprint recognition according to claim 11, wherein step 1 comprises the following steps:

Step 101: The user inputs user information into the terminal, the terminal captures a face video or a plurality of face images, and the terminal uploads the user information and the face video or the plurality of face images to the server;

Step 102: The server extracts a plurality of face images from the face video, or uses the received plurality of face images, as face samples to obtain the facial feature parameters of the user, performs face modeling, associates the result with the user information and stores it in the database, and then randomly generates a voice password text and sends it to the terminal;

Step 103: The terminal displays the voice password text, collects the voice audio data of the user, and uploads the collected voice audio data to the server;

Step 104: The server performs voiceprint feature vector extraction on the voice audio data, and associates the extracted voiceprint feature vector, voice audio data, and corresponding voice password text with the user information, and stores the data in the database.
The interactive authentication method based on face recognition and voiceprint recognition according to claim 12, wherein, in step 102, randomly generating the voice password text and sending it to the terminal means: randomly generating at least one piece of voice password text and sending the pieces to the terminal in order;

In step 103, the terminal displaying the voice password text, collecting the voice audio data of the user, and uploading the collected voice audio data to the server means: the terminal displays the voice password texts in sequence; after the user's voice audio data has been collected three times for one voice password text, the next voice password text is displayed; and once three pieces of voice audio data have been obtained for each of the voice password texts, they are sent to the server.
The interactive authentication method based on face recognition and voiceprint recognition according to claim 13, wherein, in step 104, after receiving all the voice audio data, the server performs voiceprint feature vector extraction on each piece separately, selects for each voice password text the piece of voice audio data whose voiceprint feature vector is most obvious, associates the voice password text, the selected voice audio data, and the voiceprint feature vector with the user information, and stores them in the database.
The interactive authentication method based on face recognition and voiceprint recognition according to claim 14, wherein, in step 11, the preset voice password text that is regenerated and sent to the terminal is one of the voice password texts from registration corresponding to the user information.
The interactive authentication method based on face recognition and voiceprint recognition according to claim 11, wherein, in step 3, an image similarity preset value is set in the server; when the key feature parameters selected from the user facial feature parameters are matched against the user facial feature parameters stored in the database, the matching is determined to be successful if the similarity threshold of each user facial feature parameter in the matched result is less than the image similarity preset value, and is otherwise determined to have failed.
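The comparison against the image similarity preset value might be sketched as below. This is only an illustration under loud assumptions: the source does not define how the compared quantity is computed, so the sketch assumes a distance-like score (Euclidean distance between feature vectors, where smaller means closer), and `match_faces`, `probe`, `enrolled`, and `preset` are hypothetical names.

```python
import math


def match_faces(probe, enrolled, preset):
    """Return IDs of enrolled users whose score against the probe is below preset.

    Assumption: the score is a Euclidean distance between feature vectors,
    so "less than the preset value" means "close enough". An empty list
    corresponds to a failed match.
    """
    matches = []
    for user_id, features in enrolled.items():
        score = math.dist(probe, features)
        if score < preset:
            matches.append(user_id)
    return matches
```

Returning a list rather than a single identifier keeps open the case in which several enrolled users pass the comparison, which is the situation the later intersection with the voiceprint result is meant to resolve.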
The interactive authentication method based on face recognition and voiceprint recognition according to claim 11, wherein, in step 5, the preset voice password text is a randomly generated piece of easy-to-read text, a randomly generated string of digits, a randomly generated piece of news text, or the registration voice password text corresponding to the user information.
The interactive authentication method based on face recognition and voiceprint recognition according to claim 17, wherein in step 1, the user information includes user age information;

In step 3, the face recognition result includes user information;

In step 5, when the server generates and sends a preset voice password text to the terminal, if the user information in the face recognition result indicates an elderly person or a minor, the selected preset voice password text is a piece of easy-to-read text or a string of digits; otherwise, the selected preset voice password text is a piece of news text.
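The age-dependent choice of voice password text can be sketched as follows. The age boundaries (`minor_below`, `elderly_from`), the digit count, the function name `choose_password_text`, and the `NEWS_SAMPLES` list are all hypothetical; the source states only that elderly persons and minors receive an easy-to-read text or a string of digits, and that other users receive news text.

```python
import random

# Hypothetical stand-in for a source of news-style sentences.
NEWS_SAMPLES = [
    "The city council approved the new transit plan on Tuesday.",
    "Researchers reported steady progress in battery storage this year.",
]


def choose_password_text(age, rng=random, minor_below=18, elderly_from=60, digits=6):
    """Pick a voice password text according to the user's age.

    The thresholds and digit count are illustrative assumptions only.
    """
    if age < minor_below or age >= elderly_from:
        # Easier prompt: a short random string of digits.
        return "".join(rng.choice("0123456789") for _ in range(digits))
    return rng.choice(NEWS_SAMPLES)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible when testing.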
The interactive authentication method based on face recognition and voiceprint recognition according to claim 11, wherein, in step 9, if the matching fails, it is further determined whether a preset number of voice password texts have been generated; if so, the recognition fails, voiceprint recognition failure information is returned to the terminal, and the process proceeds to step 10; otherwise, a preset voice password text is regenerated and sent to the terminal, and the process returns to step 6, wherein the preset voice password text that is regenerated and sent to the terminal is a randomly generated piece of easy-to-read text, a randomly generated string of digits, or a randomly generated piece of news text, whose length is greater than that of the previously generated preset voice password text.
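A minimal sketch of this retry rule, restricted to the digit-string case, is given below. The names `regenerate_digits`, `texts_sent`, `max_texts`, and `extra` are hypothetical; the source requires only that the regenerated text be longer than the previous one and that regeneration stop once the preset number of texts has been generated.

```python
import random


def regenerate_digits(previous_text, texts_sent, max_texts, rng=random, extra=2):
    """Return a longer digit prompt, or None when the preset limit is reached.

    texts_sent, max_texts and extra are illustrative parameters; how much
    longer the new text should be is an assumption of this sketch.
    """
    if texts_sent >= max_texts:
        return None  # recognition fails; the process proceeds to step 10
    length = len(previous_text) + extra
    return "".join(rng.choice("0123456789") for _ in range(length))
```

A longer prompt gives the server more audio from which to extract the voiceprint feature vector on the next attempt.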
The interactive authentication method based on face recognition and voiceprint recognition according to claim 11, wherein, in step 9, a voiceprint similarity preset value is set in the server; when the voiceprint feature vector extracted by the server from the voice audio data is matched against all user voiceprint feature vectors stored in the database, the matching is determined to be successful if the similarity threshold of a user's voiceprint feature vector is smaller than the voiceprint similarity preset value, and is otherwise determined to have failed.
The interactive authentication method based on face recognition and voiceprint recognition according to any one of claims 11 to 20, wherein, in step 5, after the server generates and sends a preset voice password text to the terminal, it also starts timing;

And/or, in step 9, after the server regenerates and sends the preset voice password text to the terminal, the timing is also started;

And/or, in step 11, after the server regenerates and sends the preset voice password text to the terminal, the timing is also started;

Between step 5 and step 7, the following steps are further included:

Step A: The server determines whether the voice audio data sent by the terminal is received within a preset time; if no voice audio data has been received from the terminal when the preset time elapses, the process proceeds to step B; otherwise, the process proceeds to step 7;

Step B: The server replaces the preset voice password text, sends the replaced preset voice password text to the terminal, restarts timing, and returns to step A, wherein the replaced preset voice password text is a newly randomly generated piece of easy-to-read text, a randomly generated string of digits, or a randomly generated piece of news text.
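The timing loop formed by steps A and B can be sketched as follows. The callables `send_text`, `receive_audio`, and `new_text` are hypothetical and supplied by the caller, with `receive_audio` assumed to return `None` when the preset time elapses. The `max_rounds` bound is an addition of this sketch so that it terminates; the source does not state such a limit.

```python
def wait_for_audio(send_text, receive_audio, new_text, timeout_s, max_rounds=3):
    """Send a voice password text and replace it each time the timer expires.

    Returns (text, audio) once audio arrives, or (None, None) if every
    round times out. max_rounds is an assumed safety bound, not part of
    the described method.
    """
    for _ in range(max_rounds):
        text = new_text()                 # first round: step 5; later rounds: step B
        send_text(text)
        audio = receive_audio(timeout_s)  # step A: wait up to the preset time
        if audio is not None:
            return text, audio            # proceed to step 7 with the current text
    return None, None
```

Returning the text together with the audio matters because step 7 compares the recognized speech against the text that was most recently sent, not the first one.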
The interactive authentication method based on face recognition and voiceprint recognition according to any one of claims 11 to 20, wherein, in step 9, if the matching fails, after returning the voiceprint recognition failure information to the terminal, the server also proceeds to step 13;

In step 11, if the verification is considered successful, after returning the verification success information to the terminal, the server also proceeds to step 13; if the current user verification is considered to have failed, after returning the verification failure information to the terminal, the server also proceeds to step 13;

Step 13: The server optimizes the face modeling corresponding to the user information in the face recognition result by using the face image received in the current authentication.