CN108305633A

CN108305633A - Speech verification method, apparatus, computer equipment and computer readable storage medium

Info

Publication number: CN108305633A
Application number: CN201810041764.3A
Authority: CN
Inventors: 黄创茗
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-01-16
Filing date: 2018-01-16
Publication date: 2018-07-20
Anticipated expiration: 2038-01-16
Also published as: WO2019140823A1; CN108305633B

Abstract

This application involves a kind of auth method, system, computer equipment and storage mediums.The method includes：Obtain voice messaging to be verified and corresponding user identifier；Vocal print feature to be verified and text to be verified are extracted from the voice messaging to be verified；Obtain current scene type；Inquire and the current scene type matching and characteristic model corresponding with the user identifier；It is with reference to vocal print feature by the text conversion to be verified by this feature model；Compare the vocal print feature to be verified and this refers to vocal print feature, obtains speech verification result；When speech verification result expression is verified, then retraining is carried out to this feature model according to the vocal print feature to be verified；And the current scene type matching and characteristic model corresponding with the user identifier is updated using the characteristic model after retraining.The sound of user can also can recognize that, and then improve the recall rate of speech verification when user voice changes using this method.

Description

Speech verification method, apparatus, computer equipment and computer readable storage medium

Technical field

This application involves technical field of voice recognition, more particularly to a kind of speech verification method, apparatus, computer equipment And computer readable storage medium.

Background technology

It, can be accurate by identifying the biological characteristic of user since the biological characteristic of each two people is different from Confirm the identity of user.The sensor that the biological characteristic of identification human body needs precision high, and the sensor that these precision are high In general volume is all larger.

Currently, as the technology of sensor element obtains promotion at full speed, precision, volume and the price of sensor element are all Significant progress has been obtained, therefore can also realize verify the side of user identity by identifying biological characteristic on mobile terminals Method.And identify a kind of verification method that the vocal print of user is exactly relatively conventional in traditional technology.

However, the speech verification method in traditional technology can only ensure in the case that the sound of user is constant verification at Work(can all make traditional speech verification method fail, the recall rate of verification is very low when user voice changes.

Invention content

Based on this, it is necessary in view of the above technical problems, provide it is a kind of can be in the voice of good authentication under different scenes Verification method, device, computer equipment and storage medium.

A kind of speech verification method, including：

Obtain voice messaging to be verified and corresponding user identifier；

Vocal print feature to be verified and text to be verified are extracted from the voice messaging to be verified；

Obtain current scene type；

Inquire and the current scene type matching and characteristic model corresponding with the user identifier；

It is with reference to vocal print feature by the text conversion to be verified by this feature model；

Compare the vocal print feature to be verified and this refers to vocal print feature, obtains speech verification result；

When the speech verification result expression be verified when, then according to the vocal print feature to be verified to this feature model into Row retraining；

And the current scene type matching and spy corresponding with the user identifier is updated using the characteristic model after retraining Levy model.

In one embodiment, acquisition voice messaging to be verified and corresponding user identifier, including：

Obtain authentication instruction；

It is instructed in response to the authentication, obtains user identifier；

Inquiry corresponds to the text that the user identifier is pre-configured；

When not inquiring the text, text is generated at random；

Feed back the text generated at random；

The voice messaging to be verified that the text of acquisition and feedback matches.

In one embodiment, vocal print feature to be verified and text to be verified should be extracted from the voice messaging to be verified This, including：

The voice messaging to be verified is parsed, corresponding acoustic signals are obtained；

By the acoustic signals framing, the acoustic signals of each frame are obtained；

Fourier transform is carried out to the acoustic signals of each frame, obtains corresponding frequency spectrum；

Single frames vocal print feature is extracted from the frequency spectrum；

The vocal print feature of the voice messaging to be verified is generated according to the single frames vocal print feature of each frame；

Convert the vocal print feature to text to be verified.

In one embodiment, this method further includes：

Acquire current noise information；

Anti-interference model is generated according to the noise information of acquisition；

After parsing obtains acoustic signals, after being corrected by the acoustic signals that the anti-interference model obtains parsing, execute This is by the acoustic signals framing, the step of obtaining the acoustic signals of each frame.

In one embodiment, which includes：

Obtain the temporal information and/or geographical location information for acquiring the voice messaging to be verified；

The default scene type that inquiry matches with the temporal information and/or geographical location information；

Using the default scene type inquired as current scene type.

In one embodiment, which includes：

Obtain the temporal information and geographical location information for acquiring the voice messaging to be verified；

Search the Weather information to match with the temporal information and the geographical location information；

The default scene type that inquiry matches with the Weather information；

Using the default scene type inquired as current scene type.

In one embodiment, this method further includes：

Obtain public characteristic model；

Obtain trained speech samples corresponding with default scene type and the user identifier；

The public characteristic model is subjected to retraining according to the training speech samples, obtain presetting scene type with this and is somebody's turn to do The characteristic model that user identifier matches.

A kind of speech verification device, the device include：

Data obtaining module, for obtaining voice messaging to be verified and corresponding user identifier；

Information extraction modules, for extracting vocal print feature to be verified and text to be verified from the voice messaging to be verified This；

Type acquisition module, for obtaining current scene type；

Pattern query module, for inquiring and the current scene type matching and character modules corresponding with the user identifier Type；

Feature Conversion module, for being with reference to vocal print feature by the text conversion to be verified by this feature model；

Feature comparison module refers to vocal print feature with this for comparing the vocal print feature to be verified, obtains speech verification As a result

Retraining module, for when the verification result expression be verified when, then according to the vocal print feature pair to be verified This feature model carries out retraining；

Model modification module, for using the characteristic model update after retraining with the current scene type matching and with The corresponding characteristic model of the user identifier.

A kind of computer equipment, including memory and processor, the memory are stored with computer program, which holds The step of as above any one method is realized when the row computer program.

A kind of computer readable storage medium, is stored thereon with computer program, which is executed by processor Shi Shixian as above any one of method the step of.

Above-mentioned speech verification method, apparatus, computer equipment and computer readable storage medium are obtaining voice to be verified After information and corresponding user identifier, vocal print feature and text to be verified are extracted from voice messaging to be verified.Worked as by obtaining Preceding scene type inquires and current scene type matching and characteristic model corresponding with user identifier, due to voice to be verified Information is got under the current corresponding scene of scene type, therefore voice messaging to be verified and current scene type Match, vocal print feature to be verified also with current scene type matching.By characteristic model by text conversion to be verified for reference to sound Line feature, this with reference to vocal print feature naturally also with current scene type matching.Special with reference to vocal print feature and vocal print to be verified When levying all with current scene type matching, vocal print feature, obtained language are referred to by comparing the vocal print feature to be verified and this Sound verification result can accurately reflect voice messaging to be verified whether be user voice messaging, so as to When family sound changes, the sound of user also can recognize that.And when being verified, vocal print to be verified is used Feature pair and current scene type matching and characteristic model retraining corresponding with user identifier simultaneously update, can also improve with The validity of the corresponding characteristic model of this scene type, and then improve the recall rate of speech verification.

Description of the drawings

Fig. 1 is the application scenario diagram of speech verification method in one embodiment；

Fig. 2 is the flow diagram of speech verification method in one embodiment；

Fig. 3 is the flow diagram of speech verification method in another embodiment；

Fig. 4 is the structure diagram of speech verification device in one embodiment；

Fig. 5 is the structure diagram of speech verification device in another embodiment；

Fig. 6 is the structure diagram of speech verification device in one embodiment；

Fig. 7 is the structure diagram of speech verification device in another embodiment；

Fig. 8 is the structure diagram of speech verification device in one embodiment；

Fig. 9 is the internal structure chart of one embodiment Computer equipment.

Specific implementation mode

It is with reference to the accompanying drawings and embodiments, right in order to make the object, technical solution and advantage of the application be more clearly understood The application is further elaborated.It should be appreciated that specific embodiment described herein is only used to explain the application, not For limiting the application.

Speech verification method provided by the present application, can be applied in application environment as shown in Figure 1.Wherein, terminal 110 It is communicated by network with server 120 by network, user 100 passes through input unit operating terminal 110.Wherein, terminal 110 can be, but not limited to be various personal computers, laptop, smart mobile phone, tablet computer and portable wearable set Standby, server 120 can be realized with the server cluster of the either multiple server compositions of independent server.

In one embodiment, as shown in Fig. 2, providing a kind of speech verification method, it is applied in Fig. 1 in this way It is illustrated for terminal, but this method is not limited to only implement in terminal, specifically includes following steps：

S202 obtains voice messaging to be verified and corresponding user identifier.

Wherein, voice messaging to be verified is the voice messaging being verified in speech verification.User identifier is user identity Mark.

In one embodiment, after terminal collects voice messaging to be verified, which is sent to clothes Business device.After server receives voice messaging to be verified, user corresponding with the terminal of the voice messaging to be verified is sent is chosen Mark.

S204 extracts vocal print feature to be verified and text to be verified from the voice messaging to be verified.

Wherein, vocal print feature is the characteristic information of vocal print.Vocal print is the sound wave spectrum of voice messaging.It is characterized in describing object The information of shared characteristic, object can be vocal prints.Feature can be specifically MFCC (MelFrequency Cepstrum Coefficient, mel-frequency cepstrum coefficient) feature, (perceptual linearprediction, perception are linear pre- by PLP At least one of survey) feature and LPC (Linear Predictive Coding, linear prediction analysis) etc., can also be frequency At least one of spectrum, nasal sound, pronunciation and word speed etc..Vocal print feature to be verified is the vocal print spy being verified in speech verification Sign.Text to be verified is the text message being verified in speech verification.Text to be verified is specifically voice messaging to be verified with text The information that this form is recorded.

In one embodiment, server extracts vocal print feature to be verified and text to be verified from voice messaging to be verified This, corresponding terminal is returned by the vocal print feature to be verified extracted and text feedback to be verified.

S206 obtains current scene type.

Wherein, scene type is the type of scene.Scene be specifically place when obtaining voice messaging to be verified, the time, The combination of weather and environment etc..Current scene type is specifically the type of scene when obtaining voice messaging to be verified.

In one embodiment, terminal obtains location information and temporal information when acquiring voice messaging to be verified, will obtain The location information and temporal information taken is sent to server.Server obtains phase according to the location information and temporal information received The Weather information and environmental information answered, and determine that terminal is worked as according to the location information, temporal information, Weather information and environmental information Preceding scene type.

S208 inquires and the current scene type matching and characteristic model corresponding with the user identifier.

Wherein, characteristic model can be specifically the set of the vocal print feature of individual subscriber, and characteristic model can be used for simulating The vocal print feature of user.

In one embodiment, the corresponding current scene type of terminal and user identifier are being fed back to server by terminal Afterwards, server inquires and current scene type matching and characteristic model corresponding with user identifier from database.

The text conversion to be verified is with reference to vocal print feature by this feature model by S210.

Wherein, when being speech verification with reference to vocal print feature vocal print feature to be verified reference object.

In one embodiment, text to be verified is converted into voice messaging by server by characteristic model, and from conversion Extraction refers to vocal print feature in obtained voice messaging.

S212, compares the vocal print feature to be verified and this refers to vocal print feature, obtains speech verification result.

In one embodiment, server vocal print feature more to be verified and with reference to the language that after vocal print feature, will be obtained Sound verification result feeds back to terminal.If speech verification result expression is verified, terminal is according to the speech verification result Corresponding application program is unlocked.If the speech verification result indicate verification not by when, terminal reacquires language to be verified Message ceases.

S214, when speech verification result expression is verified, then according to the vocal print feature to be verified to this feature Model carries out retraining.

Wherein, retraining is carried out to characteristic model according to vocal print feature to be verified, can is specifically that comparison is to be verified The high vocal print feature of the frequency of occurrences in vocal print feature to be verified is added in characteristic model for vocal print feature and characteristic model.

In one embodiment, server is when detecting that the expression of speech verification result is verified, then to be verified The vocal print feature that the frequency of occurrences is higher than predetermined threshold value is chosen in vocal print feature, and the vocal print feature selected and characteristic model are carried out Comparison, if the vocal print feature selected vocal print feature difference corresponding with characteristic model is less than preset value, by what is selected Vocal print feature is added in characteristic model.

S216 is updated and the current scene type and the matched feature of the user identifier using the characteristic model after retraining Model.

In the present embodiment, after obtaining voice messaging to be verified and corresponding user identifier, from voice messaging to be verified Extract vocal print feature and text to be verified.By obtaining current scene type, inquiry with current scene type matching and with Family identifies corresponding characteristic model, since voice messaging to be verified is got under the current corresponding scene of scene type , therefore voice messaging to be verified and current scene type matching, vocal print feature to be verified also with current scene type matching. By characteristic model by text conversion to be verified be with reference to vocal print feature, this with reference to vocal print feature naturally also with current scene type Matching.With reference to vocal print feature and vocal print feature to be verified all with current scene type matching when, it is to be verified by comparing this Vocal print feature and this refer to vocal print feature, obtained speech verification result can accurately reflect that voice messaging to be verified is No is the voice messaging of user, so as to when user voice changes, also can recognize that the sound of user Sound.It is with current scene type matching and corresponding with user identifier using vocal print feature pair to be verified and when being verified Characteristic model retraining and update, the validity of characteristic model corresponding with this scene type, Jin Erti can also be improved The recall rate of high speech verification.

In one embodiment, acquisition voice messaging to be verified and corresponding user identifier, including：Obtain authentication Instruction；It is instructed in response to the authentication, obtains user identifier；Inquiry corresponds to the text that the user identifier is pre-configured；When not When inquiring the text, text is generated at random；Feed back the text generated at random；It acquires and is waited for what the text of feedback matched Verify voice messaging.

Wherein, authentication instruction is to activate the instruction of speech verification.The text of pre-configuration is particularly used for certification user The corresponding text message of voice messaging of identity.It is random to generate text, can be specifically that text is randomly selected in text list Information can also be to generate text message at random according to dictionary.

In one embodiment, terminal obtains user and is instructed by the authentication that touch screen triggers, in response to the identity Verification instruction, obtains corresponding user identifier in the database, and after obtaining user identifier, inquiry is corresponding to the user identifier The text of pre-configuration.When inquiring the text of pre-configuration, the mark of voice messaging is being acquired in the display screen display of terminal Know.When not inquiring the text of pre-configuration, text is generated according to dictionary at random, the text generated at random is shown on a display screen Show, and acquires voice messaging to be verified.

In one embodiment, terminal obtains user and is instructed by the authentication that touch screen triggers, by the authentication Instruction feedback value server.Server obtains corresponding user identifier in the database, and inquires corresponding to the user identifier The text of pre-configuration.When inquiring the text of pre-configuration, start the instruction for acquiring voice messaging to be verified to terminal feedback.When When not inquiring the text of pre-configuration, text is generated according to dictionary at random, the text generated at random is sent to terminal.

In the present embodiment, by obtaining user identifier, text of the inquiry corresponding to the pre-configuration of user identifier.If inquiry To the text of pre-configuration, so that it may directly to acquire voice messaging to be verified so that speech verification is prompt quickly.If do not inquired pre- The text of configuration then generates text, can also improve safety at random.

In one embodiment, vocal print feature to be verified and text to be verified should be extracted from the voice messaging to be verified This, including：The voice messaging to be verified is parsed, corresponding acoustic signals are obtained；By the acoustic signals framing, each frame is obtained Acoustic signals；Fourier transform is carried out to the acoustic signals of each frame, obtains corresponding frequency spectrum；Single frames is extracted from the frequency spectrum Vocal print feature；The vocal print feature of the voice messaging to be verified is generated according to the single frames vocal print feature of each frame；By the vocal print feature It is converted into text to be verified.

Wherein, acoustic signals are the information of the frequency and amplitude variation of sound wave.Acoustic signals are specifically with the frequency of sound Reflect the information that the frequency of sound changes over time using the time as abscissa for ordinate.Framing be by it is continuous several when Between point be set as a frame.Can be specifically by acoustic signals according to preset frame length by acoustic signals framing, by a complete sound Wave signal is divided into the acoustic signals that several abscissa zone sizes are frame length.

Fourier transform is that time-domain function is converted into the formula of frequency-domain function.Frequency spectrum is the letter of the frequency distribution of sound Breath.Frequency spectrum is specifically using the frequency of sound as abscissa, and the amplitude and its phase of frequency component are ordinate, and expression is one The distribution situation of the amplitude size of each frequency sine wave on quiet hour point.Fourier change is carried out to the acoustic signals of each frame It changes, obtains corresponding frequency spectrum, can be specifically that the corresponding trigonometric function of the acoustic signals of each frame is converted into each frame time Interior frequency spectrum.

In one embodiment, terminal parses voice messaging to be verified, corresponding acoustic signals is obtained, by the acoustic signals Framing, and the signal obtained after the acoustic signals after framing are multiplied with window function carries out Fourier transform, obtains corresponding frequency Spectrum.Single frames vocal print feature is extracted from frequency spectrum, and the sound of the voice messaging to be verified is generated according to the single frames vocal print feature of each frame Line feature determines the state of each frame acoustic signals, and will according to the corresponding state number of vocal print feature of each frame acoustic signals Determining state is combined, and obtains corresponding character, and text to be verified is generated according to obtained character.Window function is to sound wave The function that signal is blocked.

In the present embodiment, by the way that acoustic signals are converted into frequency spectrum, it can obtain and more believe in voice messaging to be verified Breath, to obtain more vocal print features so that speech verification is more accurate.

In one embodiment, this method further includes：Acquire current noise information；It is generated according to the noise information of acquisition Anti-interference model；After parsing obtains acoustic signals, after being corrected by the acoustic signals that the anti-interference model obtains parsing, hold Row this by the acoustic signals framing, the step of obtaining the acoustic signals of each frame.

Wherein, noise signal is the voice signal interfered to voice messaging to be verified.Noise signal can be specifically At least one of sound that ambient enviroment is sent out, such as sound of the wind, the patter of rain and reading sound etc..Anti-interference model was particularly used for Filter the model of noise signal in acoustic signals to be verified.The acoustic signals amendment for being obtained parsing by anti-interference model, tool Body can be superimposed anti-interference model with the acoustic signals that parsing obtains, and can also be to be filtered in the acoustic signals analytically obtained Remove anti-interference model.

In the present embodiment, by acquiring current noise signal, anti-interference model is generated, can be repaiied according to anti-interference model Positive acoustic signals improve the accuracy rate of voice print verification so that the acoustic signals that parsing obtains are more accurate.

In one embodiment, which includes：Obtain the time for acquiring the voice messaging to be verified Information and/or geographical location information；The default scene type that inquiry matches with the temporal information and/or geographical location information； Using the default scene type inquired as current scene type.

Wherein, temporal information is to acquire the time of voice messaging to be verified.Temporal information specifically include the date and in a few days when Between point, when time of day point includes, minute and the second.Geographical location information is the geographical location where acquisition voice messaging to be verified. Geographical location information specifically includes urban sign and building identifies, and building mark can be specifically sports ground, house, hospital, public affairs At least one of department, subway station and road etc..

In one embodiment, terminal obtains the time of day point for acquiring voice messaging to be verified, e.g. 6 points of morning It is whole, then obtain the geographical location information that terminal is currently located, e.g. Nanshan District of Shenzhen City Bay in Shenzhen park, according to the biography in terminal Sensor gets terminal in 30 minutes before getting voice messaging to be verified all in movement, and 8 kms that remain a constant speed are every Hour, then it is " jogging open air " to inquire default scene type, then " jogging open air " is used as current scene type by terminal.

In one embodiment, terminal gets the geographical location information being currently located, and e.g. at home, then directly selects The default scene type taken is " in family ", and current scene type will be used as " in family ".

In one embodiment, terminal detects WIFI (the Wireless Fidelity, based on IEEE802.11b of connection The WLAN of standard) be preset safe WIFI, then the default scene type directly chosen is " home ", and will " home " is used as current scene type.

In the present embodiment, the temporal information and/or geographical location information of voice messaging to be verified, inquiry are acquired by obtaining Matched default scene type can choose corresponding spy using the default scene type inquired as current scene type Model is levied, so that the matched scene type of voice messaging to be verified is consistent with the matched scene type of characteristic model, to Reduce image of the scene to voice messaging to be verified as far as possible, and then improves the return rate of speech verification.

In one embodiment, which includes：Obtain the time for acquiring the voice messaging to be verified Information and geographical location information；Search the Weather information to match with the temporal information and the geographical location information；It inquires and is somebody's turn to do The default scene type that Weather information matches；Using the default scene type inquired as current scene type.

Wherein, Weather information is the information of weather phenomenon in an area.Weather information specifically includes temperature, air pressure, wet Degree, wind, cloud, mist, rain, sudden strain of a muscle, snow, frost, thunder, hail, haze etc..

In one embodiment, the date of terminal acquisition acquisition voice messaging to be verified and time of day point, e.g. 12 3 o'clock sharp of 18 afternoon moon, then the geographical location information that terminal is currently located is obtained, for example (,) Enterprises of Futian District safety mansion, root Current Weather information, such as cloudy, current temperature are inquired in weather forecast system according to the date and geographical location information of acquisition 12 degrees Celsius, 5 grades of northeaster are spent, and comparison 3 o'clock sharp of afternoon December 17 cools down 5 degrees Celsius, then the default scene inquired Type is " easy catching a cold "." easy catching a cold " that inquires is used as current scene type.

In the present embodiment, the temporal information and/or geographical location information of voice messaging to be verified, inquiry are acquired by obtaining Matched Weather information, and inquire with the matched default scene type of Weather information, using the default scene type inquired as Current scene type can choose corresponding characteristic model so that the matched scene type of voice messaging to be verified and The matched scene type of characteristic model is consistent, to reduce image of the scene to voice messaging to be verified as far as possible, and then improves The return rate of speech verification.

In one embodiment, this method further includes：Obtain public characteristic model；It obtains and default scene type and the use Family identifies corresponding trained speech samples；The public characteristic model is subjected to retraining according to the training speech samples, is obtained The characteristic model that scene type and the user identifier match is preset with this.

Wherein, public characteristic model is general characteristic model.Public characteristic model is specifically same type of sound General characteristic model, such as male voice, child's voice or female voice etc..Training speech samples are the voices of training characteristics model acquisition Information.Specifically, the period of training speech samples is acquired between choosing public characteristic model the latter moon to three months, specifically Time depends on the frequency of acquisition training speech samples.

In one embodiment, server is chosen and the GMM-UBM (Gaussian of the voice print matching of user in model library Markov Model-Uniform Background Model, gauss hybrid models-universal background model), within training period By the training speech samples of acquisition, GMM-UBM is constantly trained, GMM-UBM is trained to the user identifier phase with user The characteristic model matched.When server trains GMM-UBM, detect that the vocal print feature of trained speech samples is received with other time The vocal print feature collected changes greatly, then obtains the scene informations such as geographical location information, temporal information and the Weather information of terminal, The scene information got is identified as scene type.

In the present embodiment, retraining is carried out to public characteristic model by using training speech samples, can rapidly be instructed Practise characteristic model so that efficiency is got higher.

As shown in figure 3, in one embodiment, additionally providing a kind of speech verification method, this method specifically includes following The step of：

S302, terminal obtain authentication instruction.

S304, terminal are instructed in response to the authentication, obtain user identifier.

S306, terminal inquiry correspond to the text that the user identifier is pre-configured.

S308 generates text at random when terminal does not inquire the text.

S310, terminal feed back the text generated at random.

S312, terminal acquire current noise information.

The voice messaging to be verified that the text of S314, terminal acquisition and feedback matches.

The noise information of acquisition and voice messaging to be verified are fed back to server by S316, terminal.

S318, server generate anti-interference model according to noise information.

S320, server parse the voice messaging to be verified, obtain corresponding acoustic signals.

S322, server will parse obtained acoustic signals after parsing obtains acoustic signals, by the anti-interference model After amendment.

The acoustic signals framing is obtained the acoustic signals of each frame by S324, server.

S326, server carry out Fourier transform to the acoustic signals of each frame, obtain corresponding frequency spectrum.

S328, server extract single frames vocal print feature from the frequency spectrum.

S330, server generate the vocal print feature of the voice messaging to be verified according to the single frames vocal print feature of each frame.

S332, server convert the vocal print feature to text to be verified.

S334, terminal obtain the temporal information and geographical location information for acquiring the voice messaging to be verified.

S336, after temporal information and geographical location information are fed back to server by terminal, whois lookup is believed with the time The Weather information that breath and the geographical location information match.

S338, the default scene type that server inquiry matches with the Weather information.

S340, server is using the default scene type inquired as current scene type.

S342, server inquire and the current scene type matching and characteristic model corresponding with the user identifier.

The text conversion to be verified is with reference to vocal print feature by this feature model by S344, server.

S346, server compares the vocal print feature to be verified and this refers to vocal print feature, obtains speech verification result.

Above-mentioned speech verification method, after obtaining voice messaging to be verified and corresponding user identifier, from voice to be verified Vocal print feature and text to be verified are extracted in information.By obtaining current scene type, inquiry and current scene type matching, And characteristic model corresponding with user identifier, since voice messaging to be verified is obtained under the current corresponding scene of scene type Get, therefore voice messaging to be verified and current scene type matching, vocal print feature to be verified also with current scene type Matching.By characteristic model by text conversion to be verified be with reference to vocal print feature, this with reference to vocal print feature naturally also with work as front court Scape type matching.With reference to vocal print feature and vocal print feature to be verified all with current scene type matching when, by comparing should Vocal print feature to be verified refers to vocal print feature with this, and obtained speech verification result can accurately reflect voice to be verified Information whether be user voice messaging, so as to when user voice changes, also can recognize that user's sheet The sound of people.And when being verified, marked using vocal print feature pair to be verified and current scene type matching and with user Know corresponding characteristic model retraining and update, the validity of characteristic model corresponding with this scene type can also be improved, And then improve the recall rate of speech verification.

It should be understood that although each step in the flow chart of Fig. 3 is shown successively according to the instruction of arrow, this A little steps are not that the inevitable sequence indicated according to arrow executes successively.Unless expressly state otherwise herein, these steps It executes there is no the limitation of stringent sequence, these steps can execute in other order.Moreover, at least part in Fig. 3 Step may include that either these sub-steps of multiple stages or stage are executed in synchronization to multiple sub-steps It completes, but can execute at different times, the execution sequence in these sub-steps or stage is also not necessarily to be carried out successively, But it can either the sub-step of other steps or at least part in stage execute in turn or alternately with other steps.

In one embodiment, as shown in figure 4, providing a kind of speech verification device 400, including：Data obtaining module 402, information extraction modules 404, type acquisition module 406, pattern query module 408, Feature Conversion module 410, feature compare Module 412, retraining module 413 and model modification module 415, wherein：Data obtaining module 402, for obtaining language to be verified Message ceases and corresponding user identifier；Information extraction modules 404, for extracting sound to be verified from the voice messaging to be verified Line feature and text to be verified；Type acquisition module 406, for obtaining current scene type；Pattern query module 408, is used for Inquire and the current scene type matching and characteristic model corresponding with the user identifier；Feature Conversion module 410, for leading to This feature model is crossed, is with reference to vocal print feature by the text conversion to be verified；Feature comparison module 412, it is to be tested for comparing this The vocal print feature of card refers to vocal print feature with this, obtains speech verification result；Retraining module 413, for working as the speech verification When as a result indicating to be verified, then retraining is carried out to this feature model according to the vocal print feature to be verified；Model modification mould Block 415, for using after retraining characteristic model update and the current scene type matching and corresponding with the user identifier Characteristic model.

Above-mentioned speech verification device 400, after obtaining voice messaging to be verified and corresponding user identifier, from language to be verified Extraction vocal print feature and text to be verified in message breath.By obtaining current scene type, inquiry and current scene type Match and characteristic model corresponding with user identifier, since voice messaging to be verified is in the current corresponding scene of scene type Under get, therefore voice messaging to be verified and current scene type matching, vocal print feature to be verified also with current scene Type matching.By characteristic model by text conversion to be verified be with reference to vocal print feature, this with reference to vocal print feature naturally also with work as Preceding scene type matching.With reference to vocal print feature and vocal print feature to be verified all with current scene type matching when, pass through ratio Vocal print feature is referred to compared with the vocal print feature to be verified and this, obtained speech verification result can accurately reflect to be verified Voice messaging whether be user voice messaging, so as to when user voice changes, also can recognize that use The sound at family.And when being verified, using vocal print feature pair to be verified and current scene type matching and with Family identifies corresponding characteristic model retraining and updates, and can also improve the effective of characteristic model corresponding with this scene type Property, and then improve the recall rate of speech verification.

As shown in figure 5, in one embodiment, data obtaining module 402, including：Instruction acquisition module 402a obtains body Part verification instruction；Identifier acquisition module 402b is instructed in response to the authentication, obtains user identifier；Text query module 402c, for inquiring the text being pre-configured corresponding to the user identifier；Text generation module 402d, for this article ought not inquired This when, generates text at random；Text feedback module 402e, for feeding back the text generated at random；Information acquisition module 402f, The voice messaging to be verified that the text for acquiring and feeding back matches.

As shown in fig. 6, in one embodiment, information extraction modules 404, including：Information analysis module 404a, for solving The voice messaging to be verified is analysed, corresponding acoustic signals are obtained；Signal framing module 404b is used for the acoustic signals framing, Obtain the acoustic signals of each frame；Signal conversion module 404c carries out Fourier transform for the acoustic signals to each frame, Obtain corresponding frequency spectrum；Characteristic extracting module 404d, for extracting single frames vocal print feature from the frequency spectrum；Feature generation module 404e, the vocal print feature for generating the voice messaging to be verified according to the single frames vocal print feature of each frame；Text conversion module 404f, for converting the vocal print feature to text to be verified.

In one embodiment, data obtaining module 402 are additionally operable to acquire current noise information；Information extraction modules 404, it is additionally operable to generate anti-interference model according to the noise information of acquisition；It is anti-interference by this after parsing obtains acoustic signals After the acoustic signals that model obtains parsing are corrected, this is executed by the acoustic signals framing, obtains the acoustic signals of each frame Step.

As shown in fig. 7, in one embodiment, type acquisition module 406, including：Scene acquisition module 406a, for obtaining Take the temporal information and/or geographical location information for acquiring the voice messaging to be verified；Type queries module 406b, for inquire with The default scene type that the temporal information and/or geographical location information match；Determination type module 406c, for that will inquire Default scene type as current scene type.

In one embodiment, scene acquisition module 406a is additionally operable to obtain the time for acquiring the voice messaging to be verified Information and geographical location information；The above-mentioned type acquisition module 406 further includes：Weather acquisition module 406d, when for searching with this Between the Weather information that matches of information and the geographical location information；Type queries module 406b is additionally operable to inquiry and believes with the weather The matched default scene type of manner of breathing；Determination type module 406c is additionally operable to the default scene type that will be inquired as current Scene type.

As shown in figure 8, in one embodiment, above-mentioned speech verification device 400 further includes：Model acquisition module 414, For obtaining public characteristic model；Sample acquisition module 416, it is corresponding with default scene type and the user identifier for obtaining Training speech samples；Model training module 418, for being instructed the public characteristic model again according to the training speech samples Practice, obtains presetting the characteristic model that scene type and the user identifier match with this.

Specific about speech verification device limits the restriction that may refer to above for speech verification method, herein not It repeats again.Modules in above-mentioned speech verification device can be realized fully or partially through software, hardware and combinations thereof.On Stating each module can be embedded in or independently of in the processor in computer equipment, can also store in a software form in the form of hardware In memory in computer equipment, the corresponding operation of the above modules is executed in order to which processor calls.

In one embodiment, a kind of computer equipment is provided, which can be terminal, internal structure Figure can be as shown in Figure 9.The computer equipment includes the processor connected by system bus, memory, network interface, display Screen and input unit.Wherein, the processor of the computer equipment is for providing calculating and control ability.The computer equipment is deposited Reservoir includes non-volatile memory medium, built-in storage.The non-volatile memory medium is stored with operating system and computer journey Sequence.The built-in storage provides environment for the operation of operating system and computer program in non-volatile memory medium.The calculating The network interface of machine equipment is used to communicate by network connection with external terminal.When the computer program is executed by processor with Realize a kind of speech verification method.The display screen of the computer equipment can be liquid crystal display or electric ink display screen, The input unit of the computer equipment can be the touch layer covered on display screen, can also be to be arranged on computer equipment shell Button, trace ball or Trackpad, can also be external keyboard, Trackpad or mouse etc..

It will be understood by those skilled in the art that structure shown in Fig. 9, is only tied with the relevant part of application scheme The block diagram of structure does not constitute the restriction for the computer equipment being applied thereon to application scheme, specific computer equipment May include either combining certain components than more or fewer components as shown in the figure or being arranged with different components.

In one embodiment, a kind of computer equipment, including memory and processor are provided, which is stored with Computer program, processor realize following steps when executing computer program：Obtain voice messaging to be verified and corresponding user Mark；Vocal print feature to be verified and text to be verified are extracted from the voice messaging to be verified；Obtain current scene type；It looks into Ask and the current scene type matching and characteristic model corresponding with the user identifier；It is by this feature model, this is to be verified Text conversion is with reference to vocal print feature；Compare the vocal print feature to be verified and this refers to vocal print feature, obtains speech verification knot Fruit；When speech verification result expression is verified, then this feature model is carried out again according to the vocal print feature to be verified Training；And the current scene type matching and feature corresponding with the user identifier is updated using the characteristic model after retraining Model.

Above computer equipment is believed after obtaining voice messaging to be verified and corresponding user identifier from voice to be verified Vocal print feature and text to be verified are extracted in breath.By obtaining current scene type, inquiry and current scene type matching and Characteristic model corresponding with user identifier, since voice messaging to be verified is obtained under the current corresponding scene of scene type Arrive, therefore voice messaging to be verified and current scene type matching, vocal print feature to be verified also with current scene type Match.By characteristic model by text conversion to be verified be with reference to vocal print feature, this with reference to vocal print feature naturally also with current scene Type matching.With reference to vocal print feature and vocal print feature to be verified all with current scene type matching when, waited for by comparing this The vocal print feature of verification refers to vocal print feature with this, and obtained speech verification result can accurately reflect voice letter to be verified Breath whether be user voice messaging, so as to when user voice changes, also can recognize that user Sound.And when being verified, vocal print feature pair to be verified and current scene type matching and and user identifier are used Corresponding characteristic model retraining simultaneously updates, and can also improve the validity of characteristic model corresponding with this scene type, into And improve the recall rate of speech verification.

In one embodiment, following steps are also realized when processor executes computer program：Obtain authentication instruction； It is instructed in response to the authentication, obtains user identifier；Inquiry corresponds to the text that the user identifier is pre-configured；When not inquiring When the text, text is generated at random；Feed back the text generated at random；The language to be verified that the text of acquisition and feedback matches Message ceases.

In one embodiment, following steps are also realized when processor executes computer program：Parse the voice to be verified Information obtains corresponding acoustic signals；By the acoustic signals framing, the acoustic signals of each frame are obtained；To the sound of each frame Wave signal carries out Fourier transform, obtains corresponding frequency spectrum；Single frames vocal print feature is extracted from the frequency spectrum；According to the list of each frame Frame vocal print feature generates the vocal print feature of the voice messaging to be verified；Convert the vocal print feature to text to be verified.

In one embodiment, following steps are also realized when processor executes computer program：Acquire current noise letter Breath；Anti-interference model is generated according to the noise information of acquisition；After parsing obtains acoustic signals, it will be solved by the anti-interference model The step of analysing after obtained acoustic signals correct, executing this by the acoustic signals framing, obtain the acoustic signals of each frame.

In one embodiment, following steps are also realized when processor executes computer program：Obtaining acquisition, this is to be verified The temporal information and/or geographical location information of voice messaging；Inquiry matches with the temporal information and/or geographical location information Default scene type；Using the default scene type inquired as current scene type.

In one embodiment, following steps are also realized when processor executes computer program：Obtaining acquisition, this is to be verified The temporal information and geographical location information of voice messaging；Search the weather to match with the temporal information and the geographical location information Information；The default scene type that inquiry matches with the Weather information；Using the default scene type inquired as current scene Type.

In one embodiment, following steps are also realized when processor executes computer program：Obtain public characteristic model； Obtain trained speech samples corresponding with default scene type and the user identifier；It is according to the training speech samples that this is public Characteristic model carries out retraining, obtains presetting the characteristic model that scene type and the user identifier match with this.

In one embodiment, a kind of computer readable storage medium is provided, computer program is stored thereon with, is calculated Machine program realizes following steps when being executed by processor：Obtain voice messaging to be verified and corresponding user identifier；It is to be tested from this Vocal print feature to be verified and text to be verified are extracted in card voice messaging；Obtain current scene type；Inquire and deserve front court Scape type matching and characteristic model corresponding with the user identifier；It is ginseng by the text conversion to be verified by this feature model Examine vocal print feature；Compare the vocal print feature to be verified and this refers to vocal print feature, obtains speech verification result；When the voice is tested When card result expression is verified, then retraining is carried out to this feature model according to the vocal print feature to be verified；Using instructing again Characteristic model after white silk updates and the current scene type matching and characteristic model corresponding with the user identifier.

Above computer readable storage medium storing program for executing, after obtaining voice messaging to be verified and corresponding user identifier, to be tested Demonstrate,prove extraction vocal print feature and text to be verified in voice messaging.By obtaining current scene type, inquiry and current scene class Type matches and characteristic model corresponding with user identifier, since voice messaging to be verified is corresponding in current scene type Got under scene, therefore voice messaging to be verified and current scene type matching, vocal print feature to be verified also with currently Scene type matches.By characteristic model by text conversion to be verified for reference to vocal print feature, this refers to vocal print feature naturally also With current scene type matching.With reference to vocal print feature and vocal print feature to be verified all with current scene type matching when, lead to Cross compare the vocal print feature to be verified and this refer to vocal print feature, obtained speech verification result, which can accurately reflect, to be waited for Verification voice messaging whether be user voice messaging, so as to when user voice changes, can also identify Go out the sound of user.And when being verified, using vocal print feature pair to be verified and current scene type matching and Characteristic model retraining corresponding with user identifier simultaneously updates, and can also improve characteristic model corresponding with this scene type Validity, and then improve the recall rate of speech verification.

One of ordinary skill in the art will appreciate that realizing all or part of flow in above-described embodiment method, being can be with Relevant hardware is instructed to complete by computer program, the computer program can be stored in a non-volatile computer In read/write memory medium, the computer program is when being executed, it may include such as the flow of the embodiment of above-mentioned each method.Wherein, Any reference to memory, storage, database or other media used in each embodiment provided herein, Including nonvolatile memory.Nonvolatile memory may include that read-only memory (ROM), programming ROM (PROM), electricity can be compiled Journey ROM (EPROM), electrically erasable ROM (EEPROM) or flash memory.

Each technical characteristic of above example can be combined arbitrarily, to keep description succinct, not to above-described embodiment In each technical characteristic it is all possible combination be all described, as long as however, the combination of these technical characteristics be not present lance Shield is all considered to be the range of this specification record.

The several embodiments of the application above described embodiment only expresses, the description thereof is more specific and detailed, but simultaneously It cannot therefore be construed as limiting the scope of the patent.It should be pointed out that coming for those of ordinary skill in the art It says, under the premise of not departing from the application design, various modifications and improvements can be made, these belong to the protection of the application Range.Therefore, the protection domain of the application patent should be determined by the appended claims.

Claims

1. a kind of speech verification method, the method includes：

Obtain voice messaging to be verified and corresponding user identifier；

Obtain current scene type；

It is with reference to vocal print feature by the text conversion to be verified by the characteristic model；

Compare the vocal print feature to be verified and described with reference to vocal print feature, obtains speech verification result；

When speech verification result expression is verified, then according to the vocal print feature to be verified to the characteristic model Carry out retraining；

2. according to the method described in claim 1, it is characterized in that, acquisition voice messaging to be verified and corresponding user mark Know, including：

Obtain authentication instruction；

It is instructed in response to the authentication, obtains user identifier；

Inquiry corresponds to the text that the user identifier is pre-configured；

When not inquiring the text, text is generated at random；

Feed back the text generated at random；

3. according to the method described in claim 1, it is characterized in that, it is described extracted from the voice messaging to be verified it is to be verified Vocal print feature and text to be verified, including：

Single frames vocal print feature is extracted from the frequency spectrum；

Convert the vocal print feature to text to be verified.

4. according to the method described in claim 3, it is characterized in that, the method further includes：

Acquire current noise information；

After parsing obtains acoustic signals, after being corrected by the acoustic signals that the anti-interference model obtains parsing, institute is executed The step of stating the acoustic signals framing, obtaining the acoustic signals of each frame.

5. according to the method described in claim 1, it is characterized in that, the acquisition current scene type includes：

Using the default scene type inquired as current scene type.

6. according to the method described in claim 1, it is characterized in that, the acquisition current scene type includes：

The default scene type that inquiry matches with the Weather information；

Using the default scene type inquired as current scene type.

7. method according to any one of claim 1 to 6, which is characterized in that further include：

Obtain public characteristic model；

According to the trained speech samples by the public characteristic model carry out retraining, obtain with the default scene type and The characteristic model that the user identifier matches.

8. a kind of speech verification device, which is characterized in that described device includes：

Information extraction modules, for extracting vocal print feature to be verified and text to be verified from the voice messaging to be verified；

Type acquisition module, for obtaining current scene type；

Feature Conversion module, for being with reference to vocal print feature by the text conversion to be verified by the characteristic model；

Feature comparison module is used for the vocal print feature to be verified and described with reference to vocal print feature, obtains speech verification As a result

Retraining module, it is for when speech verification result expression is verified, then special according to the vocal print to be verified Sign carries out retraining to the characteristic model；

Model modification module, for using the characteristic model update after retraining with the current scene type matching and with institute State the corresponding characteristic model of user identifier.

9. a kind of computer equipment, including memory and processor, the memory are stored with computer program, feature exists In when the processor executes the computer program the step of any one of realization claim 1 to 7 the method.

10. a kind of computer readable storage medium, is stored thereon with computer program, which is characterized in that the computer program The step of method described in any one of claim 1 to 7 is realized when being executed by processor.