CN106782567A

CN106782567A - Method and device for establishing voiceprint model

Info

Publication number: CN106782567A
Application number: CN201611005290.4A
Authority: CN
Inventors: 卢道和; 陈朝亮; 杨军; 黄叶飞; 杨粟; 李晓俊; 钟伟
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2016-11-11
Filing date: 2016-11-11
Publication date: 2017-05-31
Anticipated expiration: 2036-11-11
Also published as: CN106782567B

Abstract

The invention discloses a method and a device for establishing a voiceprint model, wherein the method comprises the following steps: when a face video is obtained and a face image of the face video is successfully identified, extracting an audio file in the face video and recording the audio file as a first audio file; outputting prompt information to prompt an auditor to audit the face video; and when a notification message that the face video is approved is received, establishing a voiceprint model according to the first audio file. The invention further obtains the audio file of the user on the basis of face recognition, establishes the voiceprint model according to the obtained audio file, and confirms that the user is a real user when the face video of the user is received next time and only when the face image in the face video is successfully recognized and the audio file in the face video is matched with the established voiceprint model, thereby improving the accuracy of user recognition.

Description

Method and device for establishing a voiceprint model

Technical field

The present invention relates to the technical field of identity recognition, and in particular to a method and device for establishing a voiceprint model.

Background art

With the development of science and technology, many banking services, such as bank card inquiry, card freezing, and account opening, no longer require a visit to a bank counter; users can handle them directly by phone or over the Internet. However, handling services by phone or over the Internet currently requires entering the bank card account number and password, and if either is entered incorrectly it must be entered again. Moreover, when a user enters a wrong password three times, the bank card is locked and the user cannot handle the corresponding service until the card is unlocked at a bank counter. Therefore, existing solutions can only confirm the identity of the user by face recognition.

The above content is only intended to aid understanding of the technical solution of the present invention and does not constitute an admission that it is prior art.

Summary of the invention

The main object of the present invention is to provide a method and device for establishing a voiceprint model, intended to solve the technical problem of how to improve the accuracy of user identification on the basis of face recognition.

To achieve the above object, the present invention provides a method for establishing a voiceprint model, the method comprising:

when a face video is obtained and the face image of the face video is successfully recognized, extracting the audio file in the face video and recording it as a first audio file;

outputting prompt information to prompt an auditor to audit the face video;

when a notification message that the face video has passed the audit is received, establishing a voiceprint model according to the first audio file.

Preferably, the step of establishing a voiceprint model according to the first audio file when the notification message that the face video has passed the audit is received comprises:

when the notification message that the face video has passed the audit is received, judging whether a voiceprint model already exists;

if no voiceprint model exists, establishing a voiceprint model according to the first audio file;

if a voiceprint model already exists, deleting the existing voiceprint model and extracting the stored second audio file, wherein the second audio file is an audio file that has been successfully registered;

establishing a voiceprint model according to the first audio file and the second audio file.

Preferably, the step of extracting the stored second audio file comprises:

judging whether a preset number of second audio files are stored;

if the preset number of second audio files are stored, the step of establishing a voiceprint model according to the first audio file and the second audio file comprises:

establishing a voiceprint model according to the most recently stored preset number of second audio files and the first audio file.

Preferably, after the step of judging whether a preset number of second audio files are stored, the method further comprises:

if the preset number of second audio files are not stored, acquiring all of the stored second audio files;

the step of establishing a voiceprint model according to the first audio file and the second audio file comprises:

establishing a voiceprint model according to all of the acquired second audio files and the first audio file.

Preferably, after the step of extracting the audio file in the face video and recording it as a first audio file when a face video is obtained and the face image of the face video is successfully recognized, the method further comprises:

judging whether a voiceprint model already exists;

if no voiceprint model exists, performing the step of outputting prompt information to prompt an auditor to audit the face video;

if a voiceprint model already exists, extracting the audio file corresponding to the voiceprint model and recording it as a third audio file;

comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;

sending the similarity between the first audio file and the third audio file to an asynchronous audit system, and performing the step of outputting prompt information to prompt an auditor to audit the face video.

In addition, to achieve the above object, the present invention also provides a device for establishing a voiceprint model, the device comprising:

an extraction module, configured to extract the audio file in the face video and record it as a first audio file when a face video is obtained and the face image of the face video is successfully recognized;

an output module, configured to output prompt information to prompt an auditor to audit the face video;

an establishing module, configured to establish a voiceprint model according to the first audio file when a notification message that the face video has passed the audit is received.

Preferably, the establishing module comprises:

a judging unit, configured to judge whether a voiceprint model already exists when the notification message that the face video has passed the audit is received;

an establishing unit, configured to establish a voiceprint model according to the first audio file if no voiceprint model exists;

an extraction unit, configured to, if a voiceprint model already exists, delete the existing voiceprint model and extract the stored second audio file, wherein the second audio file is an audio file that has been successfully registered;

the establishing unit is further configured to establish a voiceprint model according to the first audio file and the second audio file.

Preferably, the judging unit is further configured to judge whether a preset number of second audio files are stored;

the establishing unit is further configured to, if the preset number of second audio files are stored, establish a voiceprint model according to the most recently stored preset number of second audio files and the first audio file.

Preferably, the establishing module further comprises:

an acquiring unit, configured to acquire all of the stored second audio files if the preset number of second audio files are not stored;

the establishing unit is further configured to establish a voiceprint model according to all of the acquired second audio files and the first audio file.

Preferably, the device for establishing a voiceprint model further comprises:

a judging module, configured to judge whether a voiceprint model already exists;

the output module is further configured to, if no voiceprint model exists, output prompt information to prompt an auditor to audit the face video;

the extraction module is further configured to, if a voiceprint model already exists, extract the audio file corresponding to the voiceprint model and record it as a third audio file;

the device for establishing a voiceprint model further comprises:

a comparison module, configured to compare the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;

a sending module, configured to send the similarity between the first audio file and the third audio file to an asynchronous audit system.

In the present invention, when a face video is obtained and the face image of the face video is successfully recognized, the audio file in the face video is extracted and recorded as a first audio file; prompt information is output to prompt an auditor to audit the face video; and when a notification message that the face video has passed the audit is received, a voiceprint model is established according to the first audio file. In this way, on the basis of face recognition, the audio file of the user is additionally obtained and a voiceprint model is established from it. When the face video of the user is next received, the user is confirmed to be the real user only when the face image in the face video is successfully recognized and the audio file in the face video matches the established voiceprint model, which improves the accuracy of user identification.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the first embodiment of the method for establishing a voiceprint model according to the present invention;

Fig. 2 is a schematic flowchart of the second embodiment of the method for establishing a voiceprint model according to the present invention;

Fig. 3 is a functional block diagram of the first embodiment of the device for establishing a voiceprint model according to the present invention;

Fig. 4 is a functional block diagram of the second embodiment of the device for establishing a voiceprint model according to the present invention.

The realization of the object, the functional characteristics, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Detailed description of the embodiments

It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.

Referring to Fig. 1, Fig. 1 is a schematic flowchart of the first embodiment of the method for establishing a voiceprint model according to the present invention.

In the present embodiment, the method for establishing a voiceprint model includes:

Step S10: when a face video is obtained and the face image of the face video is successfully recognized, extracting the audio file in the face video and recording it as a first audio file;

When a user needs to handle banking services by phone or over the Internet, the server of the bank prompts the mobile terminal held by the user to call its camera to capture a face video of the user, wherein the face video includes the face image of the user and an audio file. It should be noted that the server may obtain the face video as follows: while the face image of the user is being captured, corresponding digits or text are displayed on the screen of the mobile terminal and the user is asked to read them aloud within a certain time; or, while the face image of the user is being captured, prompt information is output on the screen of the mobile terminal asking the user to read aloud a preset number of utterances within a certain time. The mobile terminal includes, but is not limited to, smartphones and tablet computers.
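The digit prompt described above can be sketched as follows. This is a minimal illustration only: the text does not say how the digits are chosen, so random selection, the function name `make_reading_prompt`, and the default length of 6 are all assumptions.

```python
import secrets
import string


def make_reading_prompt(length: int = 6) -> str:
    """Return a digit string for the user to read aloud (hypothetical helper).

    Assumption: digits are drawn at random; the source only says that
    digits or text are shown on the terminal screen.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(string.digits) for _ in range(length))
```

A terminal could display `make_reading_prompt()` while recording, so that the captured audio contains the user reading a known string.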

When the face video is obtained, the server extracts the face image from the face video and compares the extracted face image with the pre-stored face image of the user (referred to below as the pre-stored face image). When the similarity between the face image and the pre-stored face image is greater than or equal to a preset similarity, the server confirms that the face image is successfully recognized; when the similarity is less than the preset similarity, the server confirms that recognition of the face image has failed. The preset similarity can be set as needed, for example to 60%, 70%, or 80%.

When the face image is successfully recognized, the server extracts the audio file from the face video and records it as the first audio file.
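Step S10 can be sketched as a threshold check followed by audio extraction. The sketch assumes that a similarity score in the range 0 to 1 has already been computed by some face-comparison component (not shown and not specified in the text); `FaceVideo`, `face_recognized`, and `extract_first_audio` are hypothetical names, and 0.70 is just one of the example values.

```python
from dataclasses import dataclass
from typing import Optional

PRESET_SIMILARITY = 0.70  # assumed; the text gives 60%, 70% or 80% as examples


@dataclass
class FaceVideo:
    """Hypothetical container for an uploaded face video."""
    face_image: bytes   # face image taken from the video
    audio_track: bytes  # audio recorded while the user read the prompt


def face_recognized(similarity: float, preset: float = PRESET_SIMILARITY) -> bool:
    """Recognition succeeds when similarity >= the preset similarity."""
    return similarity >= preset


def extract_first_audio(video: FaceVideo, similarity: float) -> Optional[bytes]:
    """Return the first audio file only if the face image was recognized."""
    if not face_recognized(similarity):
        return None  # recognition failed, nothing is extracted
    return video.audio_track
```

Note that the comparison is inclusive: a similarity exactly equal to the preset value counts as a successful recognition, as stated in the text.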

Step S20: outputting prompt information to prompt an auditor to audit the face video;

When the first audio file is obtained, the server outputs prompt information to the asynchronous audit system to prompt the asynchronous audit staff to audit the authenticity of the face video. It should be noted that, when auditing the authenticity of the face video, the audit staff may compare the face image in the face video with the pre-stored face image. The pre-stored face image may be a single image or several images. When the audit staff confirm that the face image in the face video is genuine and is the user in person, they return a notification message that the audit passed to the server through the asynchronous audit system; when the audit staff confirm that the face image in the face video is not the user in person, they return a notification message that the audit failed to the server through the asynchronous audit system.

When the server receives the notification message sent by the asynchronous audit system and determines from it that the face video failed the audit, the server terminates the voiceprint model establishment flow.

In the present embodiment, the server first extracts the audio file from the face video and only then outputs the prompt information. In other embodiments, the server may instead output the prompt information first and, after the face video passes the audit, then extract the face image from the face video.

Step S30: when a notification message that the face video has passed the audit is received, establishing a voiceprint model according to the first audio file.

When the server receives the notification message, sent by the asynchronous audit system, that the face video has passed the audit, the server establishes a voiceprint model according to the first audio file extracted from the face video.
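The branching on the audit notification (terminate on failure, establish the model on success) can be sketched as below. `AuditResult`, `on_audit_notification`, and the `build_model` callback are hypothetical names introduced for illustration; the actual message format of the asynchronous audit system is not described in the text.

```python
from enum import Enum
from typing import Callable, Optional, TypeVar

Model = TypeVar("Model")


class AuditResult(Enum):
    """Assumed encoding of the two notification messages."""
    PASSED = "passed"
    FAILED = "failed"


def on_audit_notification(
    result: AuditResult,
    first_audio: bytes,
    build_model: Callable[[bytes], Model],
) -> Optional[Model]:
    """Handle the notification returned by the asynchronous audit system."""
    if result is AuditResult.FAILED:
        # Audit failed: the establishment flow ends without a model.
        return None
    # Audit passed (step S30): establish the model from the first audio file.
    return build_model(first_audio)
```

Passing the model builder as a callback keeps the sketch independent of how the voiceprint model itself is computed.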

Further, step S30 includes:

Step a: when the notification message that the face video has passed the audit is received, judging whether a voiceprint model already exists;

Step b: if no voiceprint model exists, establishing a voiceprint model according to the first audio file;

Step c: if a voiceprint model already exists, deleting the existing voiceprint model and extracting the stored second audio file, wherein the second audio file is an audio file that has been successfully registered;

Step d: establishing a voiceprint model according to the first audio file and the second audio file.

Further, when the server receives the notification message that the face video has passed the audit, the server judges whether a voiceprint model already exists in the database. When no voiceprint model exists in the database, the server establishes a voiceprint model according to the first audio file. When a voiceprint model already exists in the database, the server deletes that voiceprint model and then extracts the stored second audio file from the database, wherein the second audio file is an audio file that has been successfully registered in the database. It should be noted that a successfully registered audio file is an audio file that has already been used to establish a voiceprint model, that is, the audio file corresponding to the deleted historical voiceprint model. When the server obtains the second audio file, it superimposes the first audio file and the second audio file to obtain the voiceprint model. Obtaining the voiceprint model by superimposing the first audio file and the second audio file optimizes the voiceprint model in the server, so that the established voiceprint model better matches the voice characteristics of the user.
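Steps a to d can be sketched with an in-memory record standing in for the database. The text does not specify how audio files are "superimposed" into a model, so the placeholder `VoiceprintModel` below only records which audio files were combined; `UserRecord` and `establish_model` are hypothetical names, and registering the first audio file after the model is built is an assumption based on the description of registered audio files.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VoiceprintModel:
    """Placeholder model: it only records which audio files were combined."""
    sources: List[bytes]


@dataclass
class UserRecord:
    """Hypothetical per-user database row."""
    model: Optional[VoiceprintModel] = None
    # Second audio files (successfully registered), ordered oldest first.
    registered_audio: List[bytes] = field(default_factory=list)


def establish_model(record: UserRecord, first_audio: bytes) -> VoiceprintModel:
    """Steps a-d, run after the audit-passed notification arrives."""
    if record.model is None:
        # Step b: no model yet, build from the first audio file alone.
        sources = [first_audio]
    else:
        # Step c: delete the existing model and fetch the second audio files.
        record.model = None
        # Step d: combine the second audio files with the first audio file.
        sources = record.registered_audio + [first_audio]
    record.model = VoiceprintModel(sources=sources)
    # Assumption: the first audio file now counts as successfully registered.
    record.registered_audio.append(first_audio)
    return record.model
```

Calling `establish_model` twice on the same record shows the rebuild path: the second call discards the old model and combines the earlier audio with the new one.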

Further, the step of extracting the stored second audio file includes:

Step e: judging whether a preset number of second audio files are stored;

if the preset number of second audio files are stored, step d includes:

Step f: establishing a voiceprint model according to the most recently stored preset number of second audio files and the first audio file.

Further, while extracting the stored second audio file, the server judges whether the preset number of second audio files are stored in its database. The preset number can be set as needed, for example to 3, 5, or 6. When the preset number of second audio files are stored in the database, the server superimposes the most recently stored preset number of second audio files with the first audio file to establish the voiceprint model. For example, when the preset number is set to 5 and at least 5 second audio files are stored in the database, the server extracts the 5 second audio files stored most recently, counting back from the current time, and superimposes them with the first audio file to establish the voiceprint model.

Further, the method for establishing a voiceprint model also includes:

Step g: if the preset number of second audio files are not stored, acquiring all of the stored second audio files;

step d then includes:

Step h: establishing a voiceprint model according to all of the acquired second audio files and the first audio file.

When the preset number of second audio files are not stored in the database, the server acquires all of the second audio files stored in the database and superimposes them with the first audio file to establish the voiceprint model. For example, when only three second audio files are stored in the database, the server superimposes those three second audio files with the first audio file to establish the voiceprint model.
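The selection rule of steps e to h (use the most recent preset number of second audio files if that many are stored, otherwise use all of them) can be sketched as follows. The function name `select_second_audio`, the oldest-first ordering of the stored list, and the default of 5 are assumptions for illustration.

```python
from typing import List, Sequence

PRESET_NUMBER = 5  # assumed; the text gives 3, 5 or 6 as examples


def select_second_audio(
    stored: Sequence[bytes], preset_number: int = PRESET_NUMBER
) -> List[bytes]:
    """Pick the second audio files to combine with the first audio file.

    `stored` is assumed to be ordered from oldest to newest.
    """
    if preset_number <= 0:
        raise ValueError("preset_number must be positive")
    if len(stored) >= preset_number:
        # Steps e/f: enough files are stored, keep only the most recent ones.
        return list(stored[-preset_number:])
    # Steps g/h: fewer than the preset number are stored, use every file.
    return list(stored)
```

With seven stored files and a preset number of 5, the last five are returned; with three stored files, all three are returned, matching the two examples in the text.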

In the present embodiment, when a face video is obtained and the face image of the face video is successfully recognized, the audio file in the face video is extracted and recorded as a first audio file; prompt information is output to prompt an auditor to audit the face video; and when a notification message that the face video has passed the audit is received, a voiceprint model is established according to the first audio file. In this way, on the basis of face recognition, the audio file of the user is additionally obtained and a voiceprint model is established from it. When the face video of the user is next received, the user is confirmed to be the real user only when the face image in the face video is successfully recognized and the audio file in the face video matches the established voiceprint model, which improves the accuracy of user identification.
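The two-factor decision applied to a later face video can be sketched as a simple conjunction. The text does not say how the match between an audio file and the voiceprint model is scored, so the sketch assumes both checks yield a similarity between 0 and 1 that is compared with a threshold; both threshold values and the name `is_real_user` are hypothetical.

```python
FACE_THRESHOLD = 0.70   # assumed preset similarity for the face image
VOICE_THRESHOLD = 0.70  # assumed threshold for the voiceprint match


def is_real_user(face_similarity: float, voice_similarity: float) -> bool:
    """Confirm the user only when BOTH the face and the voiceprint match."""
    face_ok = face_similarity >= FACE_THRESHOLD
    voice_ok = voice_similarity >= VOICE_THRESHOLD
    return face_ok and voice_ok
```

A recognized face with a mismatched voice, or the reverse, is rejected, which is the stated source of the improved identification accuracy.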

Further, referring to Fig. 2, Fig. 2 is a schematic flowchart of the second embodiment of the method for establishing a voiceprint model according to the present invention. The second embodiment is proposed on the basis of the first embodiment.

In the present embodiment, the method for establishing a voiceprint model further includes:

Step S40: judging whether a voiceprint model already exists;

if no voiceprint model exists, performing step S20;

Step S50: if a voiceprint model already exists, extracting the audio file corresponding to the voiceprint model and recording it as a third audio file;

Step S60: comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;

Step S70: sending the similarity between the first audio file and the third audio file to the asynchronous audit system.

In the present embodiment, step S20 is performed after step S70 has been performed.

When the server extracts the face image from the face video, the server judges whether a voiceprint model already exists in the database. When no voiceprint model exists in the database, the server outputs prompt information to the asynchronous audit system so that the asynchronous audit system prompts the auditor to audit the face video. It can be understood that, when no voiceprint model exists in the database, the server has obtained the face video of the user for the first time. It should be noted that the server and the asynchronous audit system may reside on the same computer or on two different computers.

When a voiceprint model already exists in the database, the server extracts the audio file corresponding to the voiceprint model, that is, the audio file from which the voiceprint model was established, and records it as the third audio file. When the third audio file is obtained, the server compares the first audio file with the third audio file to obtain the similarity between them. The server sends this similarity to the asynchronous audit system and outputs prompt information to the asynchronous audit system so that the asynchronous audit system prompts the auditor to audit the face video. When the asynchronous audit passes, the server establishes the voiceprint model; when it does not pass, the server terminates the voiceprint model establishment flow. The preset threshold can be set as needed, for example to 60%, 70%, or 85%.
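Steps S40 to S70 followed by step S20 can be sketched as below. The text does not say how the similarity between two audio files is computed; cosine similarity over feature vectors is used here purely as a stand-in, and `AsyncAuditQueue`, `request_audit`, and the message fields are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in similarity measure between two audio feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class AsyncAuditQueue:
    """Hypothetical stand-in for the asynchronous audit system."""

    def __init__(self) -> None:
        self.messages: List[Dict[str, object]] = []

    def send(self, message: Dict[str, object]) -> None:
        self.messages.append(message)


def request_audit(
    queue: AsyncAuditQueue,
    video_id: str,
    first_features: Sequence[float],
    third_features: Optional[Sequence[float]] = None,
) -> None:
    """Run S40-S70 when a model exists, then S20 in either case."""
    if third_features is not None:
        # S50-S70: a model exists, so send the audio similarity first.
        similarity = cosine_similarity(first_features, third_features)
        queue.send({"type": "similarity", "video_id": video_id,
                    "similarity": similarity})
    # S20: prompt the auditor to audit the face video.
    queue.send({"type": "audit_prompt", "video_id": video_id})
```

Passing `third_features=None` models the first-time case in which no voiceprint model exists and only the audit prompt is sent.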

In the present embodiment, after the first audio file is extracted from the face video, and when a voiceprint model already exists in the database of the server, the third audio file corresponding to the voiceprint model is extracted and compared with the first audio file, and subsequent operations are carried out according to the comparison result. This improves the accuracy of the established voiceprint model, so that it better matches the real voice characteristics of the user.

The present invention further provides a device for establishing a voiceprint model.

Referring to Fig. 3, Fig. 3 is a functional block diagram of the first embodiment of the device for establishing a voiceprint model according to the present invention.

In the present embodiment, the device for establishing a voiceprint model includes:

an extraction module 10, configured to extract the audio file in the face video and record it as a first audio file when a face video is obtained and the face image of the face video is successfully recognized;

an output module 20, configured to output prompt information to prompt an auditor to audit the face video;

an establishing module 30, configured to establish a voiceprint model according to the first audio file when a notification message that the face video has passed the audit is received.

Further, the establishing module 30 includes:

Further, the judging unit is further configured to judge whether a preset number of second audio files are stored;

Further, the establishing module 30 also includes:

Referring to Fig. 4, Fig. 4 is a functional block diagram of the second embodiment of the device for establishing a voiceprint model according to the present invention. The second embodiment of the device is proposed on the basis of the first embodiment.

In the present embodiment, the device for establishing a voiceprint model further includes:

a judging module 40, configured to judge whether a voiceprint model already exists;

the output module 20 is further configured to, if no voiceprint model exists, output prompt information to prompt an auditor to audit the face video;

the extraction module 10 is further configured to, if a voiceprint model already exists, extract the audio file corresponding to the voiceprint model and record it as a third audio file;

the device for establishing a voiceprint model further includes:

a comparison module 50, configured to compare the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;

a sending module 60, configured to send the similarity between the first audio file and the third audio file to the asynchronous audit system.

The above embodiments of the present invention are for description only and do not represent the merits of the embodiments. From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. A method for establishing a voiceprint model, characterized in that the method comprises:

when a face video is obtained and the face image of the face video is successfully recognized, extracting the audio file in the face video and recording it as a first audio file;

outputting prompt information to prompt an auditor to audit the face video;

when a notification message that the face video has passed the audit is received, establishing a voiceprint model according to the first audio file.

2. The method for establishing a voiceprint model according to claim 1, characterized in that the step of establishing a voiceprint model according to the first audio file when the notification message that the face video has passed the audit is received comprises:

if a voiceprint model already exists, deleting the existing voiceprint model and extracting the stored second audio file, wherein the second audio file is an audio file that has been successfully registered;

3. The method for establishing a voiceprint model according to claim 2, characterized in that the step of extracting the stored second audio file comprises:

judging whether a preset number of second audio files are stored;

if the preset number of second audio files are stored, the step of establishing a voiceprint model according to the first audio file and the second audio file comprises:

establishing a voiceprint model according to the most recently stored preset number of second audio files and the first audio file.

4. The method for establishing a voiceprint model according to claim 3, characterized in that, after the step of judging whether a preset number of second audio files are stored, the method further comprises:

if the preset number of second audio files are not stored, acquiring all of the stored second audio files;

5. The method for establishing a voiceprint model according to any one of claims 1 to 4, characterized in that, after the step of extracting the audio file in the face video and recording it as a first audio file when a face video is obtained and the face image of the face video is successfully recognized, the method further comprises:

judging whether a voiceprint model already exists;

if no voiceprint model exists, performing the step of outputting prompt information to prompt an auditor to audit the face video;

comparing the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;

sending the similarity between the first audio file and the third audio file to an asynchronous audit system, and performing the step of outputting prompt information to prompt an auditor to audit the face video.

6. A device for establishing a voiceprint model, characterized in that the device comprises:

an extraction module, configured to extract the audio file in the face video and record it as a first audio file when a face video is obtained and the face image of the face video is successfully recognized;

an establishing module, configured to establish a voiceprint model according to the first audio file when a notification message that the face video has passed the audit is received.

7. The device for establishing a voiceprint model according to claim 6, characterized in that the establishing module comprises:

a judging unit, configured to judge whether a voiceprint model already exists when the notification message that the face video has passed the audit is received;

an extraction unit, configured to, if a voiceprint model already exists, delete the existing voiceprint model and extract the stored second audio file, wherein the second audio file is an audio file that has been successfully registered;

8. The device for establishing a voiceprint model according to claim 7, characterized in that the judging unit is further configured to judge whether a preset number of second audio files are stored;

the establishing unit is further configured to, if the preset number of second audio files are stored, establish a voiceprint model according to the most recently stored preset number of second audio files and the first audio file.

9. The device for establishing a voiceprint model according to claim 8, characterized in that the establishing module further comprises:

an acquiring unit, configured to acquire all of the stored second audio files if the preset number of second audio files are not stored;

10. The device for establishing a voiceprint model according to any one of claims 6 to 9, characterized in that the device further comprises:

a judging module, configured to judge whether a voiceprint model already exists;

the output module is further configured to, if no voiceprint model exists, output prompt information to prompt an auditor to audit the face video;

the extraction module is further configured to, if a voiceprint model already exists, extract the audio file corresponding to the voiceprint model and record it as a third audio file;

the device for establishing a voiceprint model further comprises:

a comparison module, configured to compare the first audio file with the third audio file to obtain the similarity between the first audio file and the third audio file;

a sending module, configured to send the similarity between the first audio file and the third audio file to an asynchronous audit system.