CN106782567B - Method and device for establishing voiceprint model

Info

Publication number
CN106782567B
Authority
CN
China
Prior art keywords
audio file
voiceprint model
audio
face video
establishing
Prior art date
Legal status
Active
Application number
CN201611005290.4A
Other languages
Chinese (zh)
Other versions
CN106782567A (en)
Inventor
卢道和
陈朝亮
杨军
黄叶飞
杨粟
李晓俊
钟伟
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date: 2016-11-11
Filing date: 2016-11-11
Publication date: 2020-04-03
Application filed by WeBank Co Ltd (2016-11-11)
Priority to CN201611005290.4A
Publication of CN106782567A (2017-05-31)
Application granted
Publication of CN106782567B (2020-04-03)
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/04: Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a method and a device for establishing a voiceprint model. The method comprises the following steps: when a face video is obtained and the face image in the face video is successfully recognized, extracting the audio file in the face video and recording it as a first audio file; outputting prompt information to prompt an auditor to audit the face video; and when a notification message that the face video has passed the audit is received, establishing a voiceprint model according to the first audio file. On the basis of face recognition, the invention further obtains an audio file of the user and establishes a voiceprint model from it; the next time a face video of the user is received, the user is confirmed to be a real user only when the face image in the face video is successfully recognized and the audio file in the face video matches the established voiceprint model, thereby improving the accuracy of user recognition.

Description

Method and device for establishing voiceprint model
Technical Field
The invention relates to the technical field of identity recognition, in particular to a method and a device for establishing a voiceprint model.
Background
With the development of science and technology, many banking services, such as bank card inquiry, card freezing and account opening, can be handled without visiting a bank counter: the user can handle them directly by telephone or over the Internet. In the prior art, however, handling these services by telephone or over the Internet requires the user to enter a bank card account and a password, and if either the account or the password is entered incorrectly it must be entered again. If the wrong password is entered three times, the bank card is locked and the user cannot handle the corresponding services until the card is unlocked at a bank counter. For this reason, existing solutions can only confirm the identity of the user through face recognition.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method and a device for establishing a voiceprint model, and aims to solve the technical problem of improving the accuracy of user identification on the basis of face identification.
In order to achieve the above object, the present invention provides a method for creating a voiceprint model, wherein the method for creating a voiceprint model includes:
when a face video is obtained and a face image of the face video is successfully identified, extracting an audio file in the face video and recording the audio file as a first audio file;
outputting prompt information to prompt an auditor to audit the face video;
and when a notification message that the face video is approved is received, establishing a voiceprint model according to the first audio file.
Preferably, when receiving a notification message that the face video audit is passed, the step of establishing a voiceprint model according to the first audio file includes:
when a notification message that the face video is approved is received, judging whether a voiceprint model exists or not;
if the voiceprint model does not exist, establishing the voiceprint model according to the first audio file;
if the voiceprint model exists, deleting the existing voiceprint model, and extracting a stored second audio file, wherein the second audio file is an audio file which is successfully registered;
and establishing a voiceprint model according to the first audio file and the second audio file.
Preferably, the step of extracting the stored second audio file comprises:
judging whether a preset number of second audio files are stored or not;
if the preset number of second audio files are stored, the step of establishing a voiceprint model according to the first audio file and the second audio file comprises the following steps:
and establishing a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number.
Preferably, after the step of determining whether a preset number of second audio files are stored, the method further includes:
if the preset number of second audio files are not stored, acquiring all the stored second audio files;
the step of establishing a voiceprint model from the first audio file and the second audio file comprises:
and establishing a voiceprint model according to all the obtained second audio files and the first audio files.
Preferably, after the step of extracting an audio file in the face video and recording as the first audio file when the face video is acquired and the face image of the face video is successfully identified, the method further includes:
judging whether a voiceprint model exists or not;
if the voiceprint model does not exist, outputting prompt information to prompt an auditor to audit the face video;
if the voiceprint model exists, extracting an audio file corresponding to the voiceprint model, and recording the audio file as a third audio file;
comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;
and sending the similarity between the first audio file and the third audio file to an asynchronous auditing system, and outputting prompting information to prompt an auditor to audit the face video.
In addition, to achieve the above object, the present invention further provides an apparatus for creating a voiceprint model, including:
the extraction module is used for extracting an audio file in the face video and recording the audio file as a first audio file when the face video is obtained and a face image of the face video is successfully identified;
the output module is used for outputting prompt information to prompt an auditor to audit the face video;
and the establishing module is used for establishing a voiceprint model according to the first audio file when the notification message that the face video is approved is received.
Preferably, the establishing module comprises:
the judging unit is used for judging whether a voiceprint model exists or not when the notification message that the face video is approved is received;
the establishing unit is used for establishing a voiceprint model according to the first audio file if the voiceprint model does not exist;
the extracting unit is used for deleting the existing voiceprint model and extracting a stored second audio file if the voiceprint model exists, wherein the second audio file is an audio file which is successfully registered;
the establishing unit is further used for establishing a voiceprint model according to the first audio file and the second audio file.
Preferably, the judging unit is further configured to judge whether a preset number of second audio files are stored;
the establishing unit is further configured to establish a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number if the second audio files in the preset number are stored.
Preferably, the establishing module further comprises:
the acquisition unit is used for acquiring all the stored second audio files if the preset number of second audio files are not stored;
the establishing unit is further used for establishing a voiceprint model according to all the obtained second audio files and the first audio files.
Preferably, the apparatus for establishing a voiceprint model further comprises:
the judging module is used for judging whether the voiceprint model exists or not;
the output module is also used for outputting prompt information to prompt an auditor to audit the face video if the voiceprint model does not exist;
the extracting module is further used for extracting an audio file corresponding to the voiceprint model if the voiceprint model exists and recording the audio file as a third audio file;
the apparatus for establishing the voiceprint model further comprises:
the comparison module is used for comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;
and the sending module is used for sending the similarity between the first audio file and the third audio file to an asynchronous auditing system.
According to the invention, when a face video is obtained and the face image in the face video is successfully recognized, the audio file in the face video is extracted and recorded as a first audio file; prompt information is output to prompt an auditor to audit the face video; and when a notification message that the face video has passed the audit is received, a voiceprint model is established according to the first audio file. In this way, the audio file of the user is further acquired on the basis of face recognition and the voiceprint model is established from it; the next time a face video of the user is received, the user is confirmed to be a real user only when the face image in the face video is successfully recognized and the audio file in the face video matches the established voiceprint model, thereby improving the accuracy of user recognition.
Drawings
FIG. 1 is a schematic flow chart illustrating a first embodiment of a method for creating a voiceprint model according to the present invention;
FIG. 2 is a flowchart illustrating a method for creating a voiceprint model according to a second embodiment of the present invention;
FIG. 3 is a functional block diagram of a first embodiment of the apparatus for creating a voiceprint model according to the present invention;
FIG. 4 is a functional block diagram of an apparatus for creating a voiceprint model according to a second embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for establishing a voiceprint model according to a first embodiment of the present invention.
In this embodiment, the method for establishing the voiceprint model includes:
step S10, when a face video is obtained and a face image of the face video is successfully identified, extracting an audio file in the face video and recording the audio file as a first audio file;
when a user needs to transact banking business through a telephone or the Internet, a server of a bank prompts a mobile terminal held by the user to call a camera to acquire a face video of the user, wherein the face video comprises a face image and an audio file of the user. It should be noted that, the method for the server to obtain the face video may be: in the process of extracting the face image of the user, displaying corresponding numbers or characters on a screen of the mobile terminal, and enabling the user to read out the displayed numbers or characters within a certain time; or in the process of extracting the face image of the user, prompting information is output in a screen of the mobile terminal to prompt the user to read out a preset number of words within a certain time. The mobile terminal includes but is not limited to a smart phone and a tablet computer.
When the face video is acquired, the server extracts a face image in the face video, and compares the extracted face image with a face image which is stored in advance for the user, wherein the face image which is stored in advance for the user is recorded as a prestored face image. When the similarity between the face image and a prestored face image is greater than or equal to a preset similarity, the server confirms that the face image is successfully identified; and when the similarity between the face image and a pre-stored face image is smaller than the preset similarity, the server confirms that the face image identification fails. The preset similarity can be set according to specific needs, such as 60%, 70%, or 80%.
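For illustration, a minimal sketch of the preset-similarity check described above, assuming the face images have already been converted to fixed-length feature vectors (the embedding step itself is outside this sketch); the function names and the 0.70 threshold are assumptions.

```python
import numpy as np

PRESET_SIMILARITY = 0.70  # configurable, e.g. 0.60, 0.70 or 0.80

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def face_recognized(extracted_embedding: np.ndarray,
                    prestored_embedding: np.ndarray,
                    threshold: float = PRESET_SIMILARITY) -> bool:
    """Return True when the extracted face matches the prestored face, i.e.
    the similarity is greater than or equal to the preset similarity."""
    return cosine_similarity(extracted_embedding, prestored_embedding) >= threshold
```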
And when the face image is successfully identified, the server extracts the audio file in the face video and records the audio file extracted from the face video as a first audio file.
Step S20, outputting prompt information to prompt an auditor to audit the face video;
and when the first audio file is obtained, the server outputs prompt information to an asynchronous auditing system so as to prompt an asynchronous auditing worker to audit the authenticity of the face video. It should be noted that, when the auditing worker is in the process of auditing the authenticity of the face video, the auditing worker may compare the face image in the face video with a face image stored in advance. The face image stored in advance may be one or more than one. When the auditing staff confirms that the face image in the face video is real and is the user, the auditing staff returns an auditing passing notification message to the server through the asynchronous auditing system; and when the auditing staff confirms that the face image in the face video is not the user, the auditing staff returns an auditing failure notification message to the server through the asynchronous auditing system.
When the server receives the notification message sent by the asynchronous auditing system and determines from it that the face video has failed the audit, the server ends the process of establishing the voiceprint model.
In this embodiment, the server first extracts the audio file from the face video and then outputs the prompt information. In other embodiments, the server may instead output the prompt information first and extract the audio file from the face video once the face video has passed the audit.
And step S30, when the notification message that the face video is approved is received, establishing a voiceprint model according to the first audio file.
When the server receives, from the asynchronous auditing system, a notification message that the face video has passed the audit, the server establishes a voiceprint model according to the first audio file extracted from the face video.
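As an illustration only (the patent does not specify an implementation), the following sketch shows how a server might react to the asynchronous audit notification: building the voiceprint model when the audit passes and ending the process when it fails. The message fields and the `build_voiceprint_model` helper are assumptions.

```python
from typing import Callable

def handle_audit_notification(notification: dict,
                              first_audio_path: str,
                              build_voiceprint_model: Callable[[str], object]):
    """React to the pass/fail notification returned by the asynchronous
    auditing system (field names are assumed for illustration)."""
    if notification.get("audit_result") == "pass":
        # Face video approved: build the voiceprint model from the first audio file.
        return build_voiceprint_model(first_audio_path)
    # Audit failed: end the model-building process.
    return None
```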
Further, the step S30 includes:
step a, when a notification message that the face video is approved is received, judging whether a voiceprint model exists or not;
b, if the voiceprint model does not exist, establishing the voiceprint model according to the first audio file;
step c, if the voiceprint model exists, deleting the existing voiceprint model, and extracting a stored second audio file, wherein the second audio file is an audio file which is successfully registered;
and d, establishing a voiceprint model according to the first audio file and the second audio file.
Further, when the server receives the notification message that the face video has passed the audit, the server judges whether a voiceprint model already exists in the database. If no voiceprint model exists in the database, the server establishes the voiceprint model according to the first audio file. If a voiceprint model exists in the database, the server deletes it and then extracts a stored second audio file from the database, where the second audio file is an audio file that has been successfully registered in the database. It should be noted that a successfully registered audio file is one for which a voiceprint model has already been established, that is, an audio file corresponding to the deleted historical voiceprint model. When the server has obtained the second audio file, it superimposes the first audio file and the second audio file to obtain a voiceprint model. Superimposing the first audio file and the second audio file to obtain the voiceprint model optimizes the model on the server so that the established voiceprint model better matches the user's voice characteristics.
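As a hedged sketch only, the decision logic above might be organized as follows, assuming a simple storage interface (get/delete/list/save/register) and an external `train_model` function; none of these names come from the patent.

```python
from typing import Callable, List

def establish_voiceprint_model(db, user_id: str, first_audio: bytes,
                               train_model: Callable[[List[bytes]], object]):
    """Build (or rebuild) the user's voiceprint model after the audit passes.

    `db` is an assumed storage object exposing get_voiceprint_model,
    delete_voiceprint_model, list_registered_audio, save_voiceprint_model
    and register_audio; `train_model` is an assumed training function.
    """
    existing = db.get_voiceprint_model(user_id)
    if existing is None:
        # No voiceprint model yet: establish it from the first audio file alone.
        model = train_model([first_audio])
    else:
        # A model exists: delete it, then rebuild from the successfully
        # registered (second) audio files plus the new first audio file.
        # (Which registered clips are used is refined in the selection sketch below.)
        db.delete_voiceprint_model(user_id)
        second_audios = db.list_registered_audio(user_id)
        model = train_model(second_audios + [first_audio])
    db.save_voiceprint_model(user_id, model)
    db.register_audio(user_id, first_audio)  # assumed: the first audio becomes a registered clip
    return model
```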
Further, the step of extracting the stored second audio file comprises:
step e, judging whether a preset number of second audio files are stored;
if the preset number of second audio files are stored, the step d includes:
and f, establishing a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number.
Further, in the process of extracting the stored second audio files, the server judges whether a preset number of second audio files are stored in the database. The preset number can be set according to specific needs, for example 3, 5 or 6. When the preset number of second audio files are stored in the database, the server superimposes the most recently stored second audio files of the preset number with the first audio file to establish a voiceprint model. For example, if the preset number is set to 5 and at least 5 second audio files are stored in the database, the server extracts the 5 most recently stored second audio files, superimposes them with the first audio file, and establishes the voiceprint model.
Further, the method for establishing the voiceprint model further comprises the following steps:
Step g, if the preset number of second audio files are not stored, acquiring all the stored second audio files;
the step d comprises the following steps:
and h, establishing a voiceprint model according to all the obtained second audio files and the first audio files.
When fewer than the preset number of second audio files are stored in the database, the server acquires all of the second audio files stored in the database and superimposes all of them with the first audio file to establish a voiceprint model. For example, if only three second audio files are stored in the database, the server superimposes the three second audio files and the first audio file to establish a voiceprint model.
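A minimal sketch of the selection rule just described, assuming the registered (second) audio files are stored with timestamps; the names are illustrative only.

```python
from typing import List, Tuple

def select_training_audio(second_audios: List[Tuple[float, bytes]],
                          first_audio: bytes,
                          preset_number: int = 5) -> List[bytes]:
    """Pick the audio clips used to build the voiceprint model.

    `second_audios` is a list of (storage_timestamp, audio) pairs. If at least
    `preset_number` clips are stored, only the most recently stored ones are
    used; otherwise all stored clips are used. The first audio file is always
    included.
    """
    if len(second_audios) >= preset_number:
        chosen = sorted(second_audios, key=lambda item: item[0], reverse=True)[:preset_number]
    else:
        chosen = list(second_audios)
    return [audio for _, audio in chosen] + [first_audio]
```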
In this embodiment, when a face video is acquired and the face image in the face video is successfully recognized, the audio file in the face video is extracted and recorded as a first audio file; prompt information is output to prompt an auditor to audit the face video; and when a notification message that the face video has passed the audit is received, a voiceprint model is established according to the first audio file. The audio file of the user is thus further acquired on the basis of face recognition and the voiceprint model is established from it; the next time a face video of the user is received, the user is confirmed to be a real user only when the face image in the face video is successfully recognized and the audio file in the face video matches the established voiceprint model, thereby improving the accuracy of user recognition.
Further, referring to fig. 2, fig. 2 is a schematic flowchart of a second embodiment of the method for establishing a voiceprint model according to the present invention, and the second embodiment of the method for establishing a voiceprint model according to the present invention is provided based on the first embodiment.
In this embodiment, the method for establishing a voiceprint model further includes:
step S40, judging whether a voiceprint model already exists;
if no voiceprint model exists, go to step S20;
step S50, if a voiceprint model exists, extracting an audio file corresponding to the voiceprint model, and recording the audio file as a third audio file;
step S60, comparing the first audio file with the third audio file to obtain a similarity between the first audio file and the third audio file;
and step S70, sending the similarity between the first audio file and the third audio file to an asynchronous auditing system.
In the present embodiment, when step S70 is completed, step S20 is performed.
When the server extracts the face image from the face video, the server judges whether a voiceprint model exists in the database. And when the voiceprint model does not exist in the database, the server outputs prompt information to an asynchronous auditing system so that the asynchronous auditing system prompts an auditor to audit the face video. It can be understood that when the voiceprint model does not exist in the database, it indicates that the server acquires the face video of the user for the first time. It should be noted that the server and the asynchronous auditing system may be located in one computer or in two computers.
When a voiceprint model exists in the database, the server extracts the audio file corresponding to the voiceprint model, that is, the audio file used to establish the voiceprint model, and records it as a third audio file. When the third audio file has been obtained, the server compares the first audio file with the third audio file to obtain the similarity between the two. The server sends the similarity between the first audio file and the third audio file to the asynchronous auditing system and outputs prompt information to the asynchronous auditing system so that the auditor is prompted to audit the face video. When the asynchronous audit result is a pass, the server establishes the voiceprint model; when it is a fail, the server ends the process of establishing the voiceprint model. The preset threshold against which the similarity is judged may be set according to specific needs, for example 60%, 70% or 85%.
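Purely for illustration, a sketch of computing the similarity between the first and third audio files and forwarding it to the asynchronous auditing system; the cosine-similarity measure on extracted voice features, the endpoint URL and the payload fields are all assumptions rather than the patent's method.

```python
import numpy as np
import requests  # assumed transport; the patent does not specify one

AUDIT_ENDPOINT = "http://audit.example.internal/voiceprint-similarity"  # hypothetical URL

def audio_similarity(first_features: np.ndarray, third_features: np.ndarray) -> float:
    """Cosine similarity between feature vectors extracted from the first and
    third audio files (the feature-extraction step itself is assumed)."""
    return float(np.dot(first_features, third_features) /
                 (np.linalg.norm(first_features) * np.linalg.norm(third_features)))

def report_similarity_to_auditor(user_id: str, similarity: float) -> None:
    """Send the similarity to the asynchronous auditing system together with a
    prompt to audit the face video (payload fields are assumptions)."""
    requests.post(AUDIT_ENDPOINT, json={
        "user_id": user_id,
        "similarity": similarity,
        "action": "please_audit_face_video",
    }, timeout=5)
```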
In this embodiment, when the first audio file in the face video is extracted and the voiceprint model exists in the database of the server, the third audio file corresponding to the voiceprint model is extracted, the third audio file is compared with the first audio file, and subsequent operations are performed according to the comparison result. The accuracy of the established voiceprint model is improved, and the established voiceprint model is more in line with the real voice characteristics of the user.
The invention further provides a device for establishing the voiceprint model.
Referring to fig. 3, fig. 3 is a functional block diagram of a first embodiment of the apparatus for creating a voiceprint model according to the present invention.
In this embodiment, the apparatus for creating a voiceprint model includes:
the extraction module 10 is configured to extract an audio file in a face video when the face video is acquired and a face image of the face video is successfully identified, and record the audio file as a first audio file;
when a user needs to transact banking business through a telephone or the Internet, a server of a bank prompts a mobile terminal held by the user to call a camera to acquire a face video of the user, wherein the face video comprises a face image and an audio file of the user. It should be noted that, the method for the server to obtain the face video may be: in the process of extracting the face image of the user, displaying corresponding numbers or characters on a screen of the mobile terminal, and enabling the user to read out the displayed numbers or characters within a certain time; or in the process of extracting the face image of the user, prompting information is output in a screen of the mobile terminal to prompt the user to read out a preset number of words within a certain time. The mobile terminal includes but is not limited to a smart phone and a tablet computer.
When the face video is acquired, the server extracts a face image in the face video, and compares the extracted face image with a face image which is stored in advance for the user, wherein the face image which is stored in advance for the user is recorded as a prestored face image. When the similarity between the face image and a prestored face image is greater than or equal to a preset similarity, the server confirms that the face image is successfully identified; and when the similarity between the face image and a pre-stored face image is smaller than the preset similarity, the server confirms that the face image identification fails. The preset similarity can be set according to specific needs, such as 60%, 70%, or 80%.
And when the face image is successfully identified, the server extracts the audio file in the face video and records the audio file extracted from the face video as a first audio file.
The output module 20 is configured to output prompt information to prompt an auditor to audit the face video;
and when the first audio file is obtained, the server outputs prompt information to an asynchronous auditing system so as to prompt an asynchronous auditing worker to audit the authenticity of the face video. It should be noted that, when the auditing worker is in the process of auditing the authenticity of the face video, the auditing worker may compare the face image in the face video with a face image stored in advance. The face image stored in advance may be one or more than one. When the auditing staff confirms that the face image in the face video is real and is the user, the auditing staff returns an auditing passing notification message to the server through the asynchronous auditing system; and when the auditing staff confirms that the face image in the face video is not the user, the auditing staff returns an auditing failure notification message to the server through the asynchronous auditing system.
When the server receives the notification message sent by the asynchronous auditing system and determines from it that the face video has failed the audit, the server ends the process of establishing the voiceprint model.
In this embodiment, the server first extracts the audio file from the face video and then outputs the prompt information. In other embodiments, the server may instead output the prompt information first and extract the audio file from the face video once the face video has passed the audit.
And the establishing module 30 is configured to establish a voiceprint model according to the first audio file when receiving the notification message that the face video is approved.
When the server receives, from the asynchronous auditing system, a notification message that the face video has passed the audit, the server establishes a voiceprint model according to the first audio file extracted from the face video.
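To make the module layout concrete, here is an illustrative sketch of the three modules described in this embodiment as Python classes; the class and method names mirror the description but are assumptions, not an implementation disclosed by the patent.

```python
from typing import Callable, List

class ExtractionModule:
    def extract_first_audio(self, face_video) -> bytes:
        """Extract the audio track from the face video (decoding is assumed
        to be handled by the `face_video` object)."""
        return face_video.audio_track

class OutputModule:
    def prompt_auditor(self, face_video) -> None:
        """Output prompt information so that an auditor audits the face video."""
        print("Audit requested for face video:", face_video)

class EstablishingModule:
    def __init__(self, train_model: Callable[[List[bytes]], object]):
        self.train_model = train_model  # assumed external training function

    def establish(self, first_audio: bytes):
        """Establish a voiceprint model according to the first audio file."""
        return self.train_model([first_audio])

class VoiceprintModelDevice:
    """Composition of the three modules described in this embodiment."""
    def __init__(self, train_model: Callable[[List[bytes]], object]):
        self.extraction = ExtractionModule()
        self.output = OutputModule()
        self.establishing = EstablishingModule(train_model)
```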
Further, the establishing module 30 includes:
the judging unit is used for judging whether a voiceprint model exists or not when the notification message that the face video is approved is received;
the establishing unit is used for establishing a voiceprint model according to the first audio file if the voiceprint model does not exist;
the extracting unit is used for deleting the existing voiceprint model and extracting a stored second audio file if the voiceprint model exists, wherein the second audio file is an audio file which is successfully registered;
the establishing unit is further used for establishing a voiceprint model according to the first audio file and the second audio file.
Further, when the server receives the notification message that the face video has passed the audit, the server judges whether a voiceprint model already exists in the database. If no voiceprint model exists in the database, the server establishes the voiceprint model according to the first audio file. If a voiceprint model exists in the database, the server deletes it and then extracts a stored second audio file from the database, where the second audio file is an audio file that has been successfully registered in the database. It should be noted that a successfully registered audio file is one for which a voiceprint model has already been established, that is, an audio file corresponding to the deleted historical voiceprint model. When the server has obtained the second audio file, it superimposes the first audio file and the second audio file to obtain a voiceprint model. Superimposing the first audio file and the second audio file to obtain the voiceprint model optimizes the model on the server so that the established voiceprint model better matches the user's voice characteristics.
Further, the judging unit is further configured to judge whether a preset number of second audio files are stored;
the establishing unit is further configured to establish a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number if the second audio files in the preset number are stored.
Further, in the process of extracting the stored second audio files, the server judges whether a preset number of second audio files are stored in the database. The preset number can be set according to specific needs, for example 3, 5 or 6. When the preset number of second audio files are stored in the database, the server superimposes the most recently stored second audio files of the preset number with the first audio file to establish a voiceprint model. For example, if the preset number is set to 5 and at least 5 second audio files are stored in the database, the server extracts the 5 most recently stored second audio files, superimposes them with the first audio file, and establishes the voiceprint model.
Further, the establishing module 30 further includes:
the acquisition unit is used for acquiring all the stored second audio files if the preset number of second audio files are not stored;
the establishing unit is further used for establishing a voiceprint model according to all the obtained second audio files and the first audio files.
When fewer than the preset number of second audio files are stored in the database, the server acquires all of the second audio files stored in the database and superimposes all of them with the first audio file to establish a voiceprint model. For example, if only three second audio files are stored in the database, the server superimposes the three second audio files and the first audio file to establish a voiceprint model.
In this embodiment, when a face video is acquired and the face image in the face video is successfully recognized, the audio file in the face video is extracted and recorded as a first audio file; prompt information is output to prompt an auditor to audit the face video; and when a notification message that the face video has passed the audit is received, a voiceprint model is established according to the first audio file. The audio file of the user is thus further acquired on the basis of face recognition and the voiceprint model is established from it; the next time a face video of the user is received, the user is confirmed to be a real user only when the face image in the face video is successfully recognized and the audio file in the face video matches the established voiceprint model, thereby improving the accuracy of user recognition.
Referring to fig. 4, fig. 4 is a functional block diagram of a second embodiment of the apparatus for building a voiceprint model according to the present invention, and the second embodiment of the apparatus for building a voiceprint model according to the present invention is provided based on the first embodiment.
In this embodiment, the apparatus for creating a voiceprint model further includes:
a judging module 40, configured to judge whether a voiceprint model already exists;
the output module 20 is further configured to output prompt information to prompt an auditor to audit the face video if the voiceprint model does not exist;
the extraction module 10 is further configured to extract an audio file corresponding to the voiceprint model if the voiceprint model exists, and record the audio file as a third audio file;
the apparatus for establishing the voiceprint model further comprises:
a comparison module 50, configured to compare the first audio file with the third audio file to obtain a similarity between the first audio file and the third audio file;
a sending module 60, configured to send the similarity between the first audio file and the third audio file to an asynchronous auditing system.
When the server extracts the face image from the face video, the server judges whether a voiceprint model exists in the database. And when the voiceprint model does not exist in the database, the server outputs prompt information to an asynchronous auditing system so that the asynchronous auditing system prompts an auditor to audit the face video. It can be understood that when the voiceprint model does not exist in the database, it indicates that the server acquires the face video of the user for the first time. It should be noted that the server and the asynchronous auditing system may be located in one computer or in two computers.
When a voiceprint model exists in the database, the server extracts the audio file corresponding to the voiceprint model, that is, the audio file used to establish the voiceprint model, and records it as a third audio file. When the third audio file has been obtained, the server compares the first audio file with the third audio file to obtain the similarity between the two. The server sends the similarity between the first audio file and the third audio file to the asynchronous auditing system and outputs prompt information to the asynchronous auditing system so that the auditor is prompted to audit the face video. When the asynchronous audit result is a pass, the server establishes the voiceprint model; when it is a fail, the server ends the process of establishing the voiceprint model. The preset threshold against which the similarity is judged may be set according to specific needs, for example 60%, 70% or 85%.
In this embodiment, when the first audio file in the face video is extracted and the voiceprint model exists in the database of the server, the third audio file corresponding to the voiceprint model is extracted, the third audio file is compared with the first audio file, and subsequent operations are performed according to the comparison result. The accuracy of the established voiceprint model is improved, and the established voiceprint model is more in line with the real voice characteristics of the user.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A method of creating a voiceprint model, the method comprising:
when a face video is obtained and a face image of the face video is successfully identified, extracting an audio file in the face video and recording the audio file as a first audio file;
outputting prompt information to prompt an auditor to audit the face video;
when a notification message that the face video is approved is received, judging whether a voiceprint model exists or not;
if the voiceprint model does not exist, establishing the voiceprint model according to the first audio file;
if the voiceprint model exists, deleting the existing voiceprint model, and extracting a stored second audio file, wherein the second audio file is an audio file which is successfully registered, and the audio file which is successfully registered is an audio file of which the voiceprint model is established;
and superposing the first audio file and the second audio file to obtain a voiceprint model.
2. The method of creating a voiceprint model according to claim 1 wherein said step of extracting said stored second audio file comprises:
judging whether a preset number of second audio files are stored or not;
if the preset number of second audio files are stored, the step of establishing a voiceprint model according to the first audio file and the second audio file comprises the following steps:
and establishing a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number.
3. The method of claim 2, wherein after the step of judging whether a preset number of second audio files are stored, the method further comprises:
if the preset number of second audio files are not stored, acquiring all the stored second audio files;
the step of establishing a voiceprint model from the first audio file and the second audio file comprises:
and establishing a voiceprint model according to all the obtained second audio files and the first audio files.
4. The method according to any one of claims 1 to 3, wherein after the step of extracting an audio file in the face video and recording as the first audio file when the face video is acquired and the face image of the face video is successfully recognized, the method further comprises:
judging whether a voiceprint model exists or not;
if the voiceprint model does not exist, outputting prompt information to prompt an auditor to audit the face video;
if the voiceprint model exists, extracting an audio file corresponding to the voiceprint model, and recording the audio file as a third audio file;
comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;
and sending the similarity between the first audio file and the third audio file to an asynchronous auditing system, and outputting prompting information to prompt an auditor to audit the face video.
5. An apparatus for creating a voiceprint model, said apparatus for creating a voiceprint model comprising:
the extraction module is used for extracting an audio file in the face video and recording the audio file as a first audio file when the face video is obtained and a face image of the face video is successfully identified;
the output module is used for outputting prompt information to prompt an auditor to audit the face video;
the establishing module is used for establishing a voiceprint model according to the first audio file when the notification message that the face video is approved is received;
the establishing module comprises:
the judging unit is used for judging whether a voiceprint model exists or not when the notification message that the face video is approved is received;
the establishing unit is used for establishing a voiceprint model according to the first audio file if the voiceprint model does not exist;
the extracting unit is used for deleting the existing voiceprint model and extracting the stored second audio file if the voiceprint model exists, wherein the second audio file is an audio file which is successfully registered, and the audio file which is successfully registered is an audio file of which the voiceprint model is established;
the establishing unit is further configured to superimpose the first audio file and the second audio file to obtain a voiceprint model.
6. The apparatus for creating a voiceprint model according to claim 5, wherein said determining unit is further configured to determine whether a preset number of said second audio files are stored;
the establishing unit is further configured to establish a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number if the second audio files in the preset number are stored.
7. The apparatus for modeling a voiceprint of claim 6 wherein said building module further comprises:
the acquisition unit is used for acquiring all the stored second audio files if the preset number of second audio files are not stored;
the establishing unit is further used for establishing a voiceprint model according to all the obtained second audio files and the first audio files.
8. The apparatus for modeling a voiceprint according to any one of claims 5 to 7, wherein said apparatus for modeling a voiceprint further comprises:
the judging module is used for judging whether the voiceprint model exists or not;
the output module is also used for outputting prompt information to prompt an auditor to audit the face video if the voiceprint model does not exist;
the extracting module is further used for extracting an audio file corresponding to the voiceprint model if the voiceprint model exists and recording the audio file as a third audio file;
the apparatus for establishing the voiceprint model further comprises:
the comparison module is used for comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;
and the sending module is used for sending the similarity between the first audio file and the third audio file to an asynchronous auditing system.
CN201611005290.4A (priority date 2016-11-11, filing date 2016-11-11): Method and device for establishing voiceprint model. Granted as CN106782567B (Active).

Priority Applications (1)

CN201611005290.4A (CN106782567B): priority date 2016-11-11, filing date 2016-11-11, Method and device for establishing voiceprint model

Applications Claiming Priority (1)

CN201611005290.4A (CN106782567B): priority date 2016-11-11, filing date 2016-11-11, Method and device for establishing voiceprint model

Publications (2)

Publication Number Publication Date
CN106782567A (published 2017-05-31)
CN106782567B (published 2020-04-03)

Family

ID=58969608

Family Applications (1)

CN201611005290.4A (CN106782567B, Active), priority date 2016-11-11: Method and device for establishing voiceprint model

Country Status (1)

CN: CN106782567B

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107274906A (en) * 2017-06-28 2017-10-20 百度在线网络技术(北京)有限公司 Voice information processing method, device, terminal and storage medium
CN109325742A (en) * 2018-09-26 2019-02-12 平安普惠企业管理有限公司 Business approval method, apparatus, computer equipment and storage medium
CN111611437A (en) * 2020-05-20 2020-09-01 浩云科技股份有限公司 Method and device for preventing face voiceprint verification and replacement attack
CN114245204B (en) * 2021-12-15 2023-04-07 平安银行股份有限公司 Video surface signing method and device based on artificial intelligence, electronic equipment and medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1646018A1 (en) * 2004-10-08 2006-04-12 Fujitsu Limited Biometric authentication device, biometric information authentication method, and program
CN201820245U (en) * 2010-12-01 2011-05-04 福州海景科技开发有限公司 Portrait biometric identification device in financial transaction based on portrait biometric identification technology
CN105119872A (en) * 2015-02-13 2015-12-02 腾讯科技(深圳)有限公司 Identity verification method, client, and service platform
CN104834849A (en) * 2015-04-14 2015-08-12 时代亿宝(北京)科技有限公司 Dual-factor identity authentication method and system based on voiceprint recognition and face recognition
CN204576520U (en) * 2015-04-14 2015-08-19 时代亿宝(北京)科技有限公司 Based on the Dual-factor identity authentication device of Application on Voiceprint Recognition and recognition of face
CN105550928A (en) * 2015-12-03 2016-05-04 城市商业银行资金清算中心 System and method of network remote account opening for commercial bank
CN105577664A (en) * 2015-12-22 2016-05-11 深圳前海微众银行股份有限公司 Cipher reset method and system, client and server

Also Published As

Publication number Publication date
CN106782567A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
US10135818B2 (en) User biological feature authentication method and system
US20240143842A1 (en) System and method for validating authorship of an electronic signature session
AU2018354129B2 (en) System and method for automated online notarization meeting recovery
CN106782567B (en) Method and device for establishing voiceprint model
CN106373575B (en) User voiceprint model construction method, device and system
CN107707970B (en) A kind of electronic contract signature method, system and terminal
CN109816521A (en) A kind of banking processing method, apparatus and system
US9728191B2 (en) Speaker verification methods and apparatus
WO2021175019A1 (en) Guide method for audio and video recording, apparatus, computer device, and storage medium
US20070255564A1 (en) Voice authentication system and method
WO2020077885A1 (en) Identity authentication method and apparatus, computer device and storage medium
CN104780043A (en) Access control method and system based on two-dimension code
WO2018072588A1 (en) Approval signature verification method, mobile device, terminal device, and system
CN108171137A (en) A kind of face identification method and system
CN110771092A (en) System and method for synchronizing conference interactions between multiple software clients
CN111160928A (en) Identity verification method and device
CN114553838A (en) Method, system and server for implementing remote service handling
US20120330663A1 (en) Identity authentication system and method
WO2018098686A1 (en) Safety verification method and device, terminal apparatus, and server
CN108766442A (en) A kind of identity identifying method and device based on vocal print pattern identification
WO2016058540A1 (en) Identity authentication method and apparatus and storage medium
KR101055890B1 (en) Time and attendance management system for registration of finger print after the fact and method thereof
CN116881887A (en) Application program login method, device, equipment, storage medium and program product
CN117808299A (en) Service handling method, device, equipment and medium
CN116074015A (en) Bank terminal transaction method and device based on blockchain

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant