CN106782567B - Method and device for establishing voiceprint model

Info

Publication number
CN106782567B
Authority
CN
China
Prior art keywords
audio file
voiceprint model
audio
face video
establishing
Prior art date
Legal status
Active
Application number
CN201611005290.4A
Other languages
Chinese (zh)
Other versions
CN106782567A (en)
Inventor
卢道和
陈朝亮
杨军
黄叶飞
杨粟
李晓俊
钟伟
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date: 2016-11-11
Filing date: 2016-11-11
Publication date: 2020-04-03
Application filed by WeBank Co Ltd (2016-11-11)
Priority to CN201611005290.4A
Publication of CN106782567A (2017-05-31)
Application granted
Publication of CN106782567B (2020-04-03)
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/04: Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a method and a device for establishing a voiceprint model. The method comprises the following steps: when a face video is obtained and the face image in the face video is successfully recognized, extracting the audio file in the face video and recording it as a first audio file; outputting prompt information to prompt an auditor to audit the face video; and when a notification message that the face video has passed the audit is received, establishing a voiceprint model according to the first audio file. On the basis of face recognition, the invention further obtains an audio file of the user and establishes a voiceprint model from it; the next time a face video of the user is received, the user is confirmed to be a real user only when the face image in the face video is successfully recognized and the audio file in the face video matches the established voiceprint model, thereby improving the accuracy of user recognition.

Description

Method and device for establishing voiceprint model
Technical Field
The invention relates to the technical field of identity recognition, in particular to a method and a device for establishing a voiceprint model.
Background
With the development of science and technology, many banking services, such as bank card inquiry, card freezing and account opening, can be handled without visiting a bank counter: the user can handle them directly by telephone or over the Internet. In the prior art, however, handling these services by telephone or over the Internet requires the user to enter a bank card account and a password, and if either the account or the password is entered incorrectly it must be entered again. If the wrong password is entered three times, the bank card is locked and the user cannot handle the corresponding services until the card is unlocked at a bank counter. For this reason, existing solutions can only confirm the identity of the user through face recognition.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method and a device for establishing a voiceprint model, and aims to solve the technical problem of improving the accuracy of user identification on the basis of face identification.
In order to achieve the above object, the present invention provides a method for creating a voiceprint model, wherein the method for creating a voiceprint model includes:
when a face video is obtained and a face image of the face video is successfully identified, extracting an audio file in the face video and recording the audio file as a first audio file;
outputting prompt information to prompt an auditor to audit the face video;
and when a notification message that the face video is approved is received, establishing a voiceprint model according to the first audio file.
Preferably, when receiving a notification message that the face video audit is passed, the step of establishing a voiceprint model according to the first audio file includes:
when a notification message that the face video is approved is received, judging whether a voiceprint model exists or not;
if the voiceprint model does not exist, establishing the voiceprint model according to the first audio file;
if the voiceprint model exists, deleting the existing voiceprint model, and extracting a stored second audio file, wherein the second audio file is an audio file which is successfully registered;
and establishing a voiceprint model according to the first audio file and the second audio file.
Preferably, the step of extracting the stored second audio file comprises:
judging whether a preset number of second audio files are stored or not;
if the preset number of second audio files are stored, the step of establishing a voiceprint model according to the first audio file and the second audio file comprises the following steps:
and establishing a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number.
Preferably, after the step of determining whether a preset number of second audio files are stored, the method further includes:
if the preset number of second audio files are not stored, acquiring all the stored second audio files;
the step of establishing a voiceprint model from the first audio file and the second audio file comprises:
and establishing a voiceprint model according to all the obtained second audio files and the first audio files.
Preferably, after the step of extracting an audio file in the face video and recording as the first audio file when the face video is acquired and the face image of the face video is successfully identified, the method further includes:
judging whether a voiceprint model exists or not;
if the voiceprint model does not exist, outputting prompt information to prompt an auditor to audit the face video;
if the voiceprint model exists, extracting an audio file corresponding to the voiceprint model, and recording the audio file as a third audio file;
comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;
and sending the similarity between the first audio file and the third audio file to an asynchronous auditing system, and outputting prompting information to prompt an auditor to audit the face video.
In addition, to achieve the above object, the present invention further provides an apparatus for creating a voiceprint model, including:
the extraction module is used for extracting an audio file in the face video and recording the audio file as a first audio file when the face video is obtained and a face image of the face video is successfully identified;
the output module is used for outputting prompt information to prompt an auditor to audit the face video;
and the establishing module is used for establishing a voiceprint model according to the first audio file when the notification message that the face video is approved is received.
Preferably, the establishing module comprises:
the judging unit is used for judging whether a voiceprint model exists or not when the notification message that the face video is approved is received;
the establishing unit is used for establishing a voiceprint model according to the first audio file if the voiceprint model does not exist;
the extracting unit is used for deleting the existing voiceprint model and extracting a stored second audio file if the voiceprint model exists, wherein the second audio file is an audio file which is successfully registered;
the establishing unit is further used for establishing a voiceprint model according to the first audio file and the second audio file.
Preferably, the judging unit is further configured to judge whether a preset number of second audio files are stored;
the establishing unit is further configured to establish a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number if the second audio files in the preset number are stored.
Preferably, the establishing module further comprises:
the acquisition unit is used for acquiring all the stored second audio files if the preset number of second audio files are not stored;
the establishing unit is further used for establishing a voiceprint model according to all the obtained second audio files and the first audio files.
Preferably, the apparatus for establishing a voiceprint model further comprises:
the judging module is used for judging whether the voiceprint model exists or not;
the output module is also used for outputting prompt information to prompt an auditor to audit the face video if the voiceprint model does not exist;
the extracting module is further used for extracting an audio file corresponding to the voiceprint model if the voiceprint model exists and recording the audio file as a third audio file;
the apparatus for establishing the voiceprint model further comprises:
the comparison module is used for comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;
and the sending module is used for sending the similarity between the first audio file and the third audio file to an asynchronous auditing system.
According to the invention, when a face video is obtained and the face image in the face video is successfully recognized, the audio file in the face video is extracted and recorded as a first audio file; prompt information is output to prompt an auditor to audit the face video; and when a notification message that the face video has passed the audit is received, a voiceprint model is established according to the first audio file. In this way, the audio file of the user is further acquired on the basis of face recognition and the voiceprint model is established from it; the next time a face video of the user is received, the user is confirmed to be a real user only when the face image in the face video is successfully recognized and the audio file in the face video matches the established voiceprint model, thereby improving the accuracy of user recognition.
Drawings
FIG. 1 is a schematic flow chart illustrating a first embodiment of a method for creating a voiceprint model according to the present invention;
FIG. 2 is a flowchart illustrating a method for creating a voiceprint model according to a second embodiment of the present invention;
FIG. 3 is a functional block diagram of a first embodiment of the apparatus for creating a voiceprint model according to the present invention;
FIG. 4 is a functional block diagram of an apparatus for creating a voiceprint model according to a second embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for establishing a voiceprint model according to a first embodiment of the present invention.
In this embodiment, the method for establishing the voiceprint model includes:
step S10, when a face video is obtained and a face image of the face video is successfully identified, extracting an audio file in the face video and recording the audio file as a first audio file;
when a user needs to transact banking business through a telephone or the Internet, a server of a bank prompts a mobile terminal held by the user to call a camera to acquire a face video of the user, wherein the face video comprises a face image and an audio file of the user. It should be noted that, the method for the server to obtain the face video may be: in the process of extracting the face image of the user, displaying corresponding numbers or characters on a screen of the mobile terminal, and enabling the user to read out the displayed numbers or characters within a certain time; or in the process of extracting the face image of the user, prompting information is output in a screen of the mobile terminal to prompt the user to read out a preset number of words within a certain time. The mobile terminal includes but is not limited to a smart phone and a tablet computer.
When the face video is acquired, the server extracts a face image in the face video, and compares the extracted face image with a face image which is stored in advance for the user, wherein the face image which is stored in advance for the user is recorded as a prestored face image. When the similarity between the face image and a prestored face image is greater than or equal to a preset similarity, the server confirms that the face image is successfully identified; and when the similarity between the face image and a pre-stored face image is smaller than the preset similarity, the server confirms that the face image identification fails. The preset similarity can be set according to specific needs, such as 60%, 70%, or 80%.
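For illustration, a minimal sketch of the preset-similarity check described above, assuming the face images have already been converted to fixed-length feature vectors (the embedding step itself is outside this sketch); the function names and the 0.70 threshold are assumptions.

```python
import numpy as np

PRESET_SIMILARITY = 0.70  # configurable, e.g. 0.60, 0.70 or 0.80

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def face_recognized(extracted_embedding: np.ndarray,
                    prestored_embedding: np.ndarray,
                    threshold: float = PRESET_SIMILARITY) -> bool:
    """Return True when the extracted face matches the prestored face, i.e.
    the similarity is greater than or equal to the preset similarity."""
    return cosine_similarity(extracted_embedding, prestored_embedding) >= threshold
```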
And when the face image is successfully identified, the server extracts the audio file in the face video and records the audio file extracted from the face video as a first audio file.
Step S20, outputting prompt information to prompt an auditor to audit the face video;
and when the first audio file is obtained, the server outputs prompt information to an asynchronous auditing system so as to prompt an asynchronous auditing worker to audit the authenticity of the face video. It should be noted that, when the auditing worker is in the process of auditing the authenticity of the face video, the auditing worker may compare the face image in the face video with a face image stored in advance. The face image stored in advance may be one or more than one. When the auditing staff confirms that the face image in the face video is real and is the user, the auditing staff returns an auditing passing notification message to the server through the asynchronous auditing system; and when the auditing staff confirms that the face image in the face video is not the user, the auditing staff returns an auditing failure notification message to the server through the asynchronous auditing system.
When the server receives the notification message sent by the asynchronous auditing system and determines from it that the face video has failed the audit, the server ends the process of establishing the voiceprint model.
In this embodiment, the server first extracts the audio file from the face video and then outputs the prompt information. In other embodiments, the server may instead output the prompt information first and extract the audio file from the face video once the face video has passed the audit.
And step S30, when the notification message that the face video is approved is received, establishing a voiceprint model according to the first audio file.
When the server receives, from the asynchronous auditing system, a notification message that the face video has passed the audit, the server establishes a voiceprint model according to the first audio file extracted from the face video.
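As an illustration only (the patent does not specify an implementation), the following sketch shows how a server might react to the asynchronous audit notification: building the voiceprint model when the audit passes and ending the process when it fails. The message fields and the `build_voiceprint_model` helper are assumptions.

```python
from typing import Callable

def handle_audit_notification(notification: dict,
                              first_audio_path: str,
                              build_voiceprint_model: Callable[[str], object]):
    """React to the pass/fail notification returned by the asynchronous
    auditing system (field names are assumed for illustration)."""
    if notification.get("audit_result") == "pass":
        # Face video approved: build the voiceprint model from the first audio file.
        return build_voiceprint_model(first_audio_path)
    # Audit failed: end the model-building process.
    return None
```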
Further, the step S30 includes:
step a, when a notification message that the face video is approved is received, judging whether a voiceprint model exists or not;
b, if the voiceprint model does not exist, establishing the voiceprint model according to the first audio file;
step c, if the voiceprint model exists, deleting the existing voiceprint model, and extracting a stored second audio file, wherein the second audio file is an audio file which is successfully registered;
and d, establishing a voiceprint model according to the first audio file and the second audio file.
Further, when the server receives the notification message that the face video has passed the audit, the server judges whether a voiceprint model already exists in the database. If no voiceprint model exists in the database, the server establishes the voiceprint model according to the first audio file. If a voiceprint model exists in the database, the server deletes it and then extracts a stored second audio file from the database, where the second audio file is an audio file that has been successfully registered in the database. It should be noted that a successfully registered audio file is one for which a voiceprint model has already been established, that is, an audio file corresponding to the deleted historical voiceprint model. When the server has obtained the second audio file, it superimposes the first audio file and the second audio file to obtain a voiceprint model. Superimposing the first audio file and the second audio file to obtain the voiceprint model optimizes the model on the server so that the established voiceprint model better matches the user's voice characteristics.
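As a hedged sketch only, the decision logic above might be organized as follows, assuming a simple storage interface (get/delete/list/save/register) and an external `train_model` function; none of these names come from the patent.

```python
from typing import Callable, List

def establish_voiceprint_model(db, user_id: str, first_audio: bytes,
                               train_model: Callable[[List[bytes]], object]):
    """Build (or rebuild) the user's voiceprint model after the audit passes.

    `db` is an assumed storage object exposing get_voiceprint_model,
    delete_voiceprint_model, list_registered_audio, save_voiceprint_model
    and register_audio; `train_model` is an assumed training function.
    """
    existing = db.get_voiceprint_model(user_id)
    if existing is None:
        # No voiceprint model yet: establish it from the first audio file alone.
        model = train_model([first_audio])
    else:
        # A model exists: delete it, then rebuild from the successfully
        # registered (second) audio files plus the new first audio file.
        # (Which registered clips are used is refined in the selection sketch below.)
        db.delete_voiceprint_model(user_id)
        second_audios = db.list_registered_audio(user_id)
        model = train_model(second_audios + [first_audio])
    db.save_voiceprint_model(user_id, model)
    db.register_audio(user_id, first_audio)  # assumed: the first audio becomes a registered clip
    return model
```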
Further, the step of extracting the stored second audio file comprises:
step e, judging whether a preset number of second audio files are stored;
if the preset number of second audio files are stored, the step d includes:
and f, establishing a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number.
Further, in the process of extracting the stored second audio files, the server judges whether a preset number of second audio files are stored in the database. The preset number can be set according to specific needs, for example 3, 5 or 6. When the preset number of second audio files are stored in the database, the server superimposes the most recently stored second audio files of the preset number with the first audio file to establish a voiceprint model. For example, if the preset number is set to 5 and at least 5 second audio files are stored in the database, the server extracts the 5 most recently stored second audio files, superimposes them with the first audio file, and establishes the voiceprint model.
Further, the method for establishing the voiceprint model further comprises the following steps:
Step g, if the preset number of second audio files are not stored, acquiring all the stored second audio files;
the step d comprises the following steps:
and h, establishing a voiceprint model according to all the obtained second audio files and the first audio files.
When fewer than the preset number of second audio files are stored in the database, the server acquires all of the second audio files stored in the database and superimposes all of them with the first audio file to establish a voiceprint model. For example, if only three second audio files are stored in the database, the server superimposes the three second audio files and the first audio file to establish a voiceprint model.
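A minimal sketch of the selection rule just described, assuming the registered (second) audio files are stored with timestamps; the names are illustrative only.

```python
from typing import List, Tuple

def select_training_audio(second_audios: List[Tuple[float, bytes]],
                          first_audio: bytes,
                          preset_number: int = 5) -> List[bytes]:
    """Pick the audio clips used to build the voiceprint model.

    `second_audios` is a list of (storage_timestamp, audio) pairs. If at least
    `preset_number` clips are stored, only the most recently stored ones are
    used; otherwise all stored clips are used. The first audio file is always
    included.
    """
    if len(second_audios) >= preset_number:
        chosen = sorted(second_audios, key=lambda item: item[0], reverse=True)[:preset_number]
    else:
        chosen = list(second_audios)
    return [audio for _, audio in chosen] + [first_audio]
```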
In this embodiment, when a face video is acquired and the face image in the face video is successfully recognized, the audio file in the face video is extracted and recorded as a first audio file; prompt information is output to prompt an auditor to audit the face video; and when a notification message that the face video has passed the audit is received, a voiceprint model is established according to the first audio file. The audio file of the user is thus further acquired on the basis of face recognition and the voiceprint model is established from it; the next time a face video of the user is received, the user is confirmed to be a real user only when the face image in the face video is successfully recognized and the audio file in the face video matches the established voiceprint model, thereby improving the accuracy of user recognition.
Further, referring to fig. 2, fig. 2 is a schematic flowchart of a second embodiment of the method for establishing a voiceprint model according to the present invention, and the second embodiment of the method for establishing a voiceprint model according to the present invention is provided based on the first embodiment.
In this embodiment, the method for establishing a voiceprint model further includes:
step S40, judging whether a voiceprint model already exists;
if no voiceprint model exists, go to step S20;
step S50, if a voiceprint model exists, extracting an audio file corresponding to the voiceprint model, and recording the audio file as a third audio file;
step S60, comparing the first audio file with the third audio file to obtain a similarity between the first audio file and the third audio file;
and step S70, sending the similarity between the first audio file and the third audio file to an asynchronous auditing system.
In the present embodiment, when step S70 is completed, step S20 is performed.
When the server extracts the face image from the face video, the server judges whether a voiceprint model exists in the database. And when the voiceprint model does not exist in the database, the server outputs prompt information to an asynchronous auditing system so that the asynchronous auditing system prompts an auditor to audit the face video. It can be understood that when the voiceprint model does not exist in the database, it indicates that the server acquires the face video of the user for the first time. It should be noted that the server and the asynchronous auditing system may be located in one computer or in two computers.
When a voiceprint model exists in the database, the server extracts the audio file corresponding to the voiceprint model, that is, the audio file used to establish the voiceprint model, and records it as a third audio file. When the third audio file has been obtained, the server compares the first audio file with the third audio file to obtain the similarity between the two. The server sends the similarity between the first audio file and the third audio file to the asynchronous auditing system and outputs prompt information to the asynchronous auditing system so that the auditor is prompted to audit the face video. When the asynchronous audit result is a pass, the server establishes the voiceprint model; when it is a fail, the server ends the process of establishing the voiceprint model. The preset threshold against which the similarity is judged may be set according to specific needs, for example 60%, 70% or 85%.
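Purely for illustration, a sketch of computing the similarity between the first and third audio files and forwarding it to the asynchronous auditing system; the cosine-similarity measure on extracted voice features, the endpoint URL and the payload fields are all assumptions rather than the patent's method.

```python
import numpy as np
import requests  # assumed transport; the patent does not specify one

AUDIT_ENDPOINT = "http://audit.example.internal/voiceprint-similarity"  # hypothetical URL

def audio_similarity(first_features: np.ndarray, third_features: np.ndarray) -> float:
    """Cosine similarity between feature vectors extracted from the first and
    third audio files (the feature-extraction step itself is assumed)."""
    return float(np.dot(first_features, third_features) /
                 (np.linalg.norm(first_features) * np.linalg.norm(third_features)))

def report_similarity_to_auditor(user_id: str, similarity: float) -> None:
    """Send the similarity to the asynchronous auditing system together with a
    prompt to audit the face video (payload fields are assumptions)."""
    requests.post(AUDIT_ENDPOINT, json={
        "user_id": user_id,
        "similarity": similarity,
        "action": "please_audit_face_video",
    }, timeout=5)
```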
In this embodiment, when the first audio file in the face video is extracted and the voiceprint model exists in the database of the server, the third audio file corresponding to the voiceprint model is extracted, the third audio file is compared with the first audio file, and subsequent operations are performed according to the comparison result. The accuracy of the established voiceprint model is improved, and the established voiceprint model is more in line with the real voice characteristics of the user.
The invention further provides a device for establishing the voiceprint model.
Referring to fig. 3, fig. 3 is a functional block diagram of a first embodiment of the apparatus for creating a voiceprint model according to the present invention.
In this embodiment, the apparatus for creating a voiceprint model includes:
the extraction module 10 is configured to extract an audio file in a face video when the face video is acquired and a face image of the face video is successfully identified, and record the audio file as a first audio file;
when a user needs to transact banking business through a telephone or the Internet, a server of a bank prompts a mobile terminal held by the user to call a camera to acquire a face video of the user, wherein the face video comprises a face image and an audio file of the user. It should be noted that, the method for the server to obtain the face video may be: in the process of extracting the face image of the user, displaying corresponding numbers or characters on a screen of the mobile terminal, and enabling the user to read out the displayed numbers or characters within a certain time; or in the process of extracting the face image of the user, prompting information is output in a screen of the mobile terminal to prompt the user to read out a preset number of words within a certain time. The mobile terminal includes but is not limited to a smart phone and a tablet computer.
When the face video is acquired, the server extracts a face image in the face video, and compares the extracted face image with a face image which is stored in advance for the user, wherein the face image which is stored in advance for the user is recorded as a prestored face image. When the similarity between the face image and a prestored face image is greater than or equal to a preset similarity, the server confirms that the face image is successfully identified; and when the similarity between the face image and a pre-stored face image is smaller than the preset similarity, the server confirms that the face image identification fails. The preset similarity can be set according to specific needs, such as 60%, 70%, or 80%.
And when the face image is successfully identified, the server extracts the audio file in the face video and records the audio file extracted from the face video as a first audio file.
The output module 20 is configured to output prompt information to prompt an auditor to audit the face video;
and when the first audio file is obtained, the server outputs prompt information to an asynchronous auditing system so as to prompt an asynchronous auditing worker to audit the authenticity of the face video. It should be noted that, when the auditing worker is in the process of auditing the authenticity of the face video, the auditing worker may compare the face image in the face video with a face image stored in advance. The face image stored in advance may be one or more than one. When the auditing staff confirms that the face image in the face video is real and is the user, the auditing staff returns an auditing passing notification message to the server through the asynchronous auditing system; and when the auditing staff confirms that the face image in the face video is not the user, the auditing staff returns an auditing failure notification message to the server through the asynchronous auditing system.
When the server receives the notification message sent by the asynchronous auditing system and determines from it that the face video has failed the audit, the server ends the process of establishing the voiceprint model.
In this embodiment, the server first extracts the audio file from the face video and then outputs the prompt information. In other embodiments, the server may instead output the prompt information first and extract the audio file from the face video once the face video has passed the audit.
And the establishing module 30 is configured to establish a voiceprint model according to the first audio file when receiving the notification message that the face video is approved.
When the server receives, from the asynchronous auditing system, a notification message that the face video has passed the audit, the server establishes a voiceprint model according to the first audio file extracted from the face video.
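To make the module layout concrete, here is an illustrative sketch of the three modules described in this embodiment as Python classes; the class and method names mirror the description but are assumptions, not an implementation disclosed by the patent.

```python
from typing import Callable, List

class ExtractionModule:
    def extract_first_audio(self, face_video) -> bytes:
        """Extract the audio track from the face video (decoding is assumed
        to be handled by the `face_video` object)."""
        return face_video.audio_track

class OutputModule:
    def prompt_auditor(self, face_video) -> None:
        """Output prompt information so that an auditor audits the face video."""
        print("Audit requested for face video:", face_video)

class EstablishingModule:
    def __init__(self, train_model: Callable[[List[bytes]], object]):
        self.train_model = train_model  # assumed external training function

    def establish(self, first_audio: bytes):
        """Establish a voiceprint model according to the first audio file."""
        return self.train_model([first_audio])

class VoiceprintModelDevice:
    """Composition of the three modules described in this embodiment."""
    def __init__(self, train_model: Callable[[List[bytes]], object]):
        self.extraction = ExtractionModule()
        self.output = OutputModule()
        self.establishing = EstablishingModule(train_model)
```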
Further, the establishing module 30 includes:
the judging unit is used for judging whether a voiceprint model exists or not when the notification message that the face video is approved is received;
the establishing unit is used for establishing a voiceprint model according to the first audio file if the voiceprint model does not exist;
the extracting unit is used for deleting the existing voiceprint model and extracting a stored second audio file if the voiceprint model exists, wherein the second audio file is an audio file which is successfully registered;
the establishing unit is further used for establishing a voiceprint model according to the first audio file and the second audio file.
Further, when the server receives the notification message that the face video has passed the audit, the server judges whether a voiceprint model already exists in the database. If no voiceprint model exists in the database, the server establishes the voiceprint model according to the first audio file. If a voiceprint model exists in the database, the server deletes it and then extracts a stored second audio file from the database, where the second audio file is an audio file that has been successfully registered in the database. It should be noted that a successfully registered audio file is one for which a voiceprint model has already been established, that is, an audio file corresponding to the deleted historical voiceprint model. When the server has obtained the second audio file, it superimposes the first audio file and the second audio file to obtain a voiceprint model. Superimposing the first audio file and the second audio file to obtain the voiceprint model optimizes the model on the server so that the established voiceprint model better matches the user's voice characteristics.
Further, the judging unit is further configured to judge whether a preset number of second audio files are stored;
the establishing unit is further configured to establish a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number if the second audio files in the preset number are stored.
Further, in the process of extracting the stored second audio files, the server judges whether a preset number of second audio files are stored in the database. The preset number can be set according to specific needs, for example 3, 5 or 6. When the preset number of second audio files are stored in the database, the server superimposes the most recently stored second audio files of the preset number with the first audio file to establish a voiceprint model. For example, if the preset number is set to 5 and at least 5 second audio files are stored in the database, the server extracts the 5 most recently stored second audio files, superimposes them with the first audio file, and establishes the voiceprint model.
Further, the establishing module 30 further includes:
the acquisition unit is used for acquiring all the stored second audio files if the preset number of second audio files are not stored;
the establishing unit is further used for establishing a voiceprint model according to all the obtained second audio files and the first audio files.
When fewer than the preset number of second audio files are stored in the database, the server acquires all of the second audio files stored in the database and superimposes all of them with the first audio file to establish a voiceprint model. For example, if only three second audio files are stored in the database, the server superimposes the three second audio files and the first audio file to establish a voiceprint model.
In this embodiment, when a face video is acquired and the face image in the face video is successfully recognized, the audio file in the face video is extracted and recorded as a first audio file; prompt information is output to prompt an auditor to audit the face video; and when a notification message that the face video has passed the audit is received, a voiceprint model is established according to the first audio file. The audio file of the user is thus further acquired on the basis of face recognition and the voiceprint model is established from it; the next time a face video of the user is received, the user is confirmed to be a real user only when the face image in the face video is successfully recognized and the audio file in the face video matches the established voiceprint model, thereby improving the accuracy of user recognition.
Referring to fig. 4, fig. 4 is a functional block diagram of a second embodiment of the apparatus for building a voiceprint model according to the present invention, and the second embodiment of the apparatus for building a voiceprint model according to the present invention is provided based on the first embodiment.
In this embodiment, the apparatus for creating a voiceprint model further includes:
a judging module 40, configured to judge whether a voiceprint model already exists;
the output module 20 is further configured to output prompt information to prompt an auditor to audit the face video if the voiceprint model does not exist;
the extraction module 10 is further configured to extract an audio file corresponding to the voiceprint model if the voiceprint model exists, and record the audio file as a third audio file;
the apparatus for establishing the voiceprint model further comprises:
a comparison module 50, configured to compare the first audio file with the third audio file to obtain a similarity between the first audio file and the third audio file;
a sending module 60, configured to send the similarity between the first audio file and the third audio file to an asynchronous auditing system.
When the server extracts the face image from the face video, the server judges whether a voiceprint model exists in the database. And when the voiceprint model does not exist in the database, the server outputs prompt information to an asynchronous auditing system so that the asynchronous auditing system prompts an auditor to audit the face video. It can be understood that when the voiceprint model does not exist in the database, it indicates that the server acquires the face video of the user for the first time. It should be noted that the server and the asynchronous auditing system may be located in one computer or in two computers.
When a voiceprint model exists in the database, the server extracts the audio file corresponding to the voiceprint model, that is, the audio file used to establish the voiceprint model, and records it as a third audio file. When the third audio file has been obtained, the server compares the first audio file with the third audio file to obtain the similarity between the two. The server sends the similarity between the first audio file and the third audio file to the asynchronous auditing system and outputs prompt information to the asynchronous auditing system so that the auditor is prompted to audit the face video. When the asynchronous audit result is a pass, the server establishes the voiceprint model; when it is a fail, the server ends the process of establishing the voiceprint model. The preset threshold against which the similarity is judged may be set according to specific needs, for example 60%, 70% or 85%.
In this embodiment, when the first audio file in the face video is extracted and the voiceprint model exists in the database of the server, the third audio file corresponding to the voiceprint model is extracted, the third audio file is compared with the first audio file, and subsequent operations are performed according to the comparison result. The accuracy of the established voiceprint model is improved, and the established voiceprint model is more in line with the real voice characteristics of the user.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A method of creating a voiceprint model, the method comprising:
when a face video is obtained and a face image of the face video is successfully identified, extracting an audio file in the face video and recording the audio file as a first audio file;
outputting prompt information to prompt an auditor to audit the face video;
when a notification message that the face video is approved is received, judging whether a voiceprint model exists or not;
if the voiceprint model does not exist, establishing the voiceprint model according to the first audio file;
if the voiceprint model exists, deleting the existing voiceprint model, and extracting a stored second audio file, wherein the second audio file is an audio file which is successfully registered, and the audio file which is successfully registered is an audio file of which the voiceprint model is established;
and superposing the first audio file and the second audio file to obtain a voiceprint model.
2. The method of creating a voiceprint model according to claim 1 wherein said step of extracting said stored second audio file comprises:
judging whether a preset number of second audio files are stored or not;
if the preset number of second audio files are stored, the step of establishing a voiceprint model according to the first audio file and the second audio file comprises the following steps:
and establishing a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number.
3. The method of claim 2, wherein after the step of judging whether a preset number of second audio files are stored, the method further comprises:
if the preset number of second audio files are not stored, acquiring all the stored second audio files;
the step of establishing a voiceprint model from the first audio file and the second audio file comprises:
and establishing a voiceprint model according to all the obtained second audio files and the first audio files.
4. The method according to any one of claims 1 to 3, wherein after the step of extracting an audio file in the face video and recording as the first audio file when the face video is acquired and the face image of the face video is successfully recognized, the method further comprises:
judging whether a voiceprint model exists or not;
if the voiceprint model does not exist, outputting prompt information to prompt an auditor to audit the face video;
if the voiceprint model exists, extracting an audio file corresponding to the voiceprint model, and recording the audio file as a third audio file;
comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;
and sending the similarity between the first audio file and the third audio file to an asynchronous auditing system, and outputting prompting information to prompt an auditor to audit the face video.
5. An apparatus for creating a voiceprint model, said apparatus for creating a voiceprint model comprising:
the extraction module is used for extracting an audio file in the face video and recording the audio file as a first audio file when the face video is obtained and a face image of the face video is successfully identified;
the output module is used for outputting prompt information to prompt an auditor to audit the face video;
the establishing module is used for establishing a voiceprint model according to the first audio file when the notification message that the face video is approved is received;
the establishing module comprises:
the judging unit is used for judging whether a voiceprint model exists or not when the notification message that the face video is approved is received;
the establishing unit is used for establishing a voiceprint model according to the first audio file if the voiceprint model does not exist;
the extracting unit is used for deleting the existing voiceprint model and extracting the stored second audio file if the voiceprint model exists, wherein the second audio file is an audio file which is successfully registered, and the audio file which is successfully registered is an audio file of which the voiceprint model is established;
the establishing unit is further configured to superimpose the first audio file and the second audio file to obtain a voiceprint model.
6. The apparatus for creating a voiceprint model according to claim 5, wherein said determining unit is further configured to determine whether a preset number of said second audio files are stored;
the establishing unit is further configured to establish a voiceprint model according to the second audio files and the first audio files which are stored recently in a preset number if the second audio files in the preset number are stored.
7. The apparatus for modeling a voiceprint of claim 6 wherein said building module further comprises:
the acquisition unit is used for acquiring all the stored second audio files if the preset number of second audio files are not stored;
the establishing unit is further used for establishing a voiceprint model according to all the obtained second audio files and the first audio files.
8. The apparatus for modeling a voiceprint according to any one of claims 5 to 7, wherein said apparatus for modeling a voiceprint further comprises:
the judging module is used for judging whether the voiceprint model exists or not;
the output module is also used for outputting prompt information to prompt an auditor to audit the face video if the voiceprint model does not exist;
the extracting module is further used for extracting an audio file corresponding to the voiceprint model if the voiceprint model exists and recording the audio file as a third audio file;
the apparatus for establishing the voiceprint model further comprises:
the comparison module is used for comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;
and the sending module is used for sending the similarity between the first audio file and the third audio file to an asynchronous auditing system.
CN201611005290.4A (priority date 2016-11-11, filing date 2016-11-11): Method and device for establishing voiceprint model. Granted as CN106782567B (Active).

Priority Applications (1)

CN201611005290.4A (CN106782567B): priority date 2016-11-11, filing date 2016-11-11, Method and device for establishing voiceprint model

Applications Claiming Priority (1)

CN201611005290.4A (CN106782567B): priority date 2016-11-11, filing date 2016-11-11, Method and device for establishing voiceprint model

Publications (2)

Publication Number Publication Date
CN106782567A (published 2017-05-31)
CN106782567B (published 2020-04-03)

Family

ID=58969608

Family Applications (1)

CN201611005290.4A (CN106782567B, Active), priority date 2016-11-11: Method and device for establishing voiceprint model

Country Status (1)

CN: CN106782567B

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107274906A (en) * 2017-06-28 2017-10-20 百度在线网络技术(北京)有限公司 Voice information processing method, device, terminal and storage medium
CN109325742A (en) * 2018-09-26 2019-02-12 平安普惠企业管理有限公司 Business approval method, apparatus, computer equipment and storage medium
CN111611437A (en) * 2020-05-20 2020-09-01 浩云科技股份有限公司 Method and device for preventing face voiceprint verification and replacement attack
CN114245204B (en) * 2021-12-15 2023-04-07 平安银行股份有限公司 Video surface signing method and device based on artificial intelligence, electronic equipment and medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1646018A1 (en) * 2004-10-08 2006-04-12 Fujitsu Limited Biometric authentication device, biometric information authentication method, and program
CN201820245U (en) * 2010-12-01 2011-05-04 福州海景科技开发有限公司 Portrait biometric identification device in financial transaction based on portrait biometric identification technology
CN105119872A (en) * 2015-02-13 2015-12-02 腾讯科技(深圳)有限公司 Identity verification method, client, and service platform
CN104834849A (en) * 2015-04-14 2015-08-12 时代亿宝(北京)科技有限公司 Dual-factor identity authentication method and system based on voiceprint recognition and face recognition
CN204576520U (en) * 2015-04-14 2015-08-19 时代亿宝(北京)科技有限公司 Based on the Dual-factor identity authentication device of Application on Voiceprint Recognition and recognition of face
CN105550928A (en) * 2015-12-03 2016-05-04 城市商业银行资金清算中心 System and method of network remote account opening for commercial bank
CN105577664A (en) * 2015-12-22 2016-05-11 深圳前海微众银行股份有限公司 Cipher reset method and system, client and server

Also Published As

Publication number Publication date
CN106782567A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
US10135818B2 (en) User biological feature authentication method and system
US20240143842A1 (en) System and method for validating authorship of an electronic signature session
AU2018354129B2 (en) System and method for automated online notarization meeting recovery
CN106782567B (en) Method and device for establishing voiceprint model
CN106373575B (en) User voiceprint model construction method, device and system
CN107707970B (en) A kind of electronic contract signature method, system and terminal
CN109816521A (en) A kind of banking processing method, apparatus and system
US9728191B2 (en) Speaker verification methods and apparatus
WO2021175019A1 (en) Guide method for audio and video recording, apparatus, computer device, and storage medium
US20070255564A1 (en) Voice authentication system and method
WO2020077885A1 (en) Identity authentication method and apparatus, computer device and storage medium
CN104780043A (en) Access control method and system based on two-dimension code
WO2018072588A1 (en) Approval signature verification method, mobile device, terminal device, and system
CN108171137A (en) A kind of face identification method and system
CN110771092A (en) System and method for synchronizing conference interactions between multiple software clients
CN111160928A (en) Identity verification method and device
CN114553838A (en) Method, system and server for implementing remote service handling
US20120330663A1 (en) Identity authentication system and method
WO2018098686A1 (en) Safety verification method and device, terminal apparatus, and server
CN108766442A (en) A kind of identity identifying method and device based on vocal print pattern identification
WO2016058540A1 (en) Identity authentication method and apparatus and storage medium
KR101055890B1 (en) Time and attendance management system for registration of finger print after the fact and method thereof
CN116881887A (en) Application program login method, device, equipment, storage medium and program product
CN117808299A (en) Service handling method, device, equipment and medium
CN116074015A (en) Bank terminal transaction method and device based on blockchain

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant