CN108305633A - Speech verification method, apparatus, computer equipment and computer readable storage medium - Google Patents
Speech verification method, apparatus, computer equipment and computer readable storage medium
- Publication number
- CN108305633A (application CN201810041764.3A)
- Authority
- CN
- China
- Prior art keywords
- verified
- voiceprint
- scene type
- voiceprint feature
- voice information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
-
- G10L21/0202—
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/12—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02087—Noise filtering the noise being separate speech, e.g. cocktail party
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Abstract
This application relates to a speech verification method, system, computer equipment and storage medium. The method includes: obtaining voice information to be verified and a corresponding user identifier; extracting a voiceprint feature to be verified and a text to be verified from the voice information to be verified; obtaining a current scene type; querying a feature model that matches the current scene type and corresponds to the user identifier; converting, by the feature model, the text to be verified into a reference voiceprint feature; comparing the voiceprint feature to be verified with the reference voiceprint feature to obtain a speech verification result; when the speech verification result indicates that the verification has passed, retraining the feature model according to the voiceprint feature to be verified; and updating, with the retrained feature model, the feature model that matches the current scene type and corresponds to the user identifier. With this method, the user's voice can still be recognized even when it changes, thereby improving the recall rate of speech verification.
Description
Technical field
This application relates to the technical field of speech recognition, and in particular to a speech verification method, apparatus, computer equipment and computer readable storage medium.
Background technology
Since no two people share the same biological characteristics, the identity of a user can be confirmed accurately by identifying the user's biological characteristics. Identifying the biological characteristics of the human body requires high-precision sensors, and such sensors are generally large in volume.
At present, as sensor technology has advanced rapidly, the precision, volume and price of sensor elements have all improved significantly, so methods of verifying user identity by identifying biological characteristics can also be implemented on mobile terminals. Identifying the user's voiceprint is a relatively common verification method in the conventional art.
However, the speech verification methods in the conventional art can only succeed when the user's voice remains unchanged; any change in the user's voice causes traditional speech verification methods to fail, so the recall rate of verification is very low.
Summary of the invention
Accordingly, in view of the above technical problems, it is necessary to provide a speech verification method, apparatus, computer equipment and storage medium capable of successful verification under different scenes.
A speech verification method, including:
obtaining voice information to be verified and a corresponding user identifier;
extracting a voiceprint feature to be verified and a text to be verified from the voice information to be verified;
obtaining a current scene type;
querying a feature model that matches the current scene type and corresponds to the user identifier;
converting, by the feature model, the text to be verified into a reference voiceprint feature;
comparing the voiceprint feature to be verified with the reference voiceprint feature to obtain a speech verification result;
when the speech verification result indicates that the verification has passed, retraining the feature model according to the voiceprint feature to be verified;
and updating, with the retrained feature model, the feature model that matches the current scene type and corresponds to the user identifier.
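The claimed steps can be sketched as a minimal end-to-end flow. The `FeatureModel` class, the text-to-voiceprint conversion and the distance threshold below are all illustrative stand-ins, not components specified by the patent:

```python
import numpy as np

class FeatureModel:
    """Toy per-(scene, user) feature model holding a mean voiceprint vector."""
    def __init__(self, mean_vector):
        self.mean = np.asarray(mean_vector, dtype=float)

    def reference_for(self, text):
        # Stand-in for "convert the text to be verified into a
        # reference voiceprint feature by the feature model".
        return self.mean.copy()

    def retrain(self, probe):
        # Fold a verified probe back into the model (running average).
        self.mean = 0.9 * self.mean + 0.1 * probe

def verify(probe, text, scene_type, user_id, models, max_dist=0.5):
    model = models[(scene_type, user_id)]   # model matching scene + user
    reference = model.reference_for(text)   # reference voiceprint feature
    passed = np.linalg.norm(probe - reference) <= max_dist
    if passed:
        model.retrain(probe)                # retrain and update on success
    return passed
```

The key structural point of the claim — that the model is keyed by both scene type and user identifier, and is updated only after a successful verification — is what the lookup and the conditional `retrain` call mirror here.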
In one embodiment, obtaining the voice information to be verified and the corresponding user identifier includes:
obtaining an identity verification instruction;
obtaining a user identifier in response to the identity verification instruction;
querying a text pre-configured for the user identifier;
when the text is not found, randomly generating a text;
feeding back the randomly generated text;
and collecting the voice information to be verified that matches the fed-back text.
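The "randomly generate a text" step could be realized, for example, by drawing a short challenge phrase from a word list; the word list and phrase length here are illustrative, not taken from the patent:

```python
import secrets

# Illustrative word list for random challenge phrases.
WORDS = ["blue", "river", "seven", "tiger", "maple", "stone", "cloud", "amber"]

def random_challenge(n_words=4):
    # secrets.choice gives a cryptographically strong random pick,
    # which suits a verification challenge better than random.choice.
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))
```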
In one embodiment, extracting the voiceprint feature to be verified and the text to be verified from the voice information to be verified includes:
parsing the voice information to be verified to obtain a corresponding acoustic wave signal;
dividing the acoustic wave signal into frames to obtain the acoustic wave signal of each frame;
performing a Fourier transform on the acoustic wave signal of each frame to obtain a corresponding frequency spectrum;
extracting a single-frame voiceprint feature from the frequency spectrum;
generating the voiceprint feature of the voice information to be verified from the single-frame voiceprint features of the frames;
and converting the voiceprint feature into the text to be verified.
In one embodiment, the method further includes:
collecting current noise information;
generating an anti-interference model from the collected noise information;
and after the acoustic wave signal is obtained by parsing, correcting the parsed acoustic wave signal by the anti-interference model, and then performing the step of dividing the acoustic wave signal into frames to obtain the acoustic wave signal of each frame.
In one embodiment, obtaining the current scene type includes:
obtaining time information and/or geographical location information of collecting the voice information to be verified;
querying a preset scene type that matches the time information and/or the geographical location information;
and taking the queried preset scene type as the current scene type.
In one embodiment, obtaining the current scene type includes:
obtaining time information and geographical location information of collecting the voice information to be verified;
searching for weather information that matches the time information and the geographical location information;
querying a preset scene type that matches the weather information;
and taking the queried preset scene type as the current scene type.
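The mapping from (time, location, weather) to a preset scene type could be held, for instance, as an ordered rule table; the rules and scene names below are illustrative examples in the spirit of the embodiments, not values fixed by the patent:

```python
# Ordered rule table: the first matching rule wins.
# Each rule maps (hour of day, location tag, weather tag) to a scene type.
SCENE_RULES = [
    (lambda t, loc, w: loc == "home", "at home"),
    (lambda t, loc, w: loc == "park" and w == "clear" and 5 <= t < 9,
     "jogging outdoors"),
    (lambda t, loc, w: w == "rain", "rainy outdoors"),
]

def current_scene_type(hour, location, weather, default="general"):
    for rule, scene in SCENE_RULES:
        if rule(hour, location, weather):
            return scene
    return default  # fall back when no preset scene matches
```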
In one embodiment, the method further includes:
obtaining a public feature model;
obtaining training speech samples corresponding to a preset scene type and the user identifier;
and retraining the public feature model with the training speech samples to obtain a feature model that matches the preset scene type and the user identifier.
A speech verification device, the device including:
an information obtaining module, configured to obtain voice information to be verified and a corresponding user identifier;
an information extraction module, configured to extract a voiceprint feature to be verified and a text to be verified from the voice information to be verified;
a type obtaining module, configured to obtain a current scene type;
a model query module, configured to query a feature model that matches the current scene type and corresponds to the user identifier;
a feature conversion module, configured to convert, by the feature model, the text to be verified into a reference voiceprint feature;
a feature comparison module, configured to compare the voiceprint feature to be verified with the reference voiceprint feature to obtain a speech verification result;
a retraining module, configured to retrain the feature model according to the voiceprint feature to be verified when the verification result indicates that the verification has passed;
and a model update module, configured to update, with the retrained feature model, the feature model that matches the current scene type and corresponds to the user identifier.
Computer equipment, including a memory and a processor, the memory storing a computer program, where the steps of any one of the above methods are implemented when the processor executes the computer program.
A computer readable storage medium on which a computer program is stored, where the steps of any one of the above methods are implemented when the computer program is executed by a processor.
In the above speech verification method, apparatus, computer equipment and computer readable storage medium, after the voice information to be verified and the corresponding user identifier are obtained, the voiceprint feature and the text to be verified are extracted from the voice information to be verified. The current scene type is obtained, and a feature model that matches the current scene type and corresponds to the user identifier is queried. Since the voice information to be verified is collected in the scene corresponding to the current scene type, the voice information to be verified matches the current scene type, and the voiceprint feature to be verified also matches the current scene type. The text to be verified is converted by the feature model into a reference voiceprint feature, which therefore also matches the current scene type. When both the reference voiceprint feature and the voiceprint feature to be verified match the current scene type, the speech verification result obtained by comparing them can accurately reflect whether the voice information to be verified is the user's voice information, so the user's voice can still be recognized even when it changes. Moreover, when the verification passes, the feature model that matches the current scene type and corresponds to the user identifier is retrained with the voiceprint feature to be verified and then updated, which also improves the validity of the feature model corresponding to this scene type, thereby improving the recall rate of speech verification.
Description of the drawings
Fig. 1 is the application scenario diagram of speech verification method in one embodiment;
Fig. 2 is the flow diagram of speech verification method in one embodiment;
Fig. 3 is the flow diagram of speech verification method in another embodiment;
Fig. 4 is the structure diagram of speech verification device in one embodiment;
Fig. 5 is the structure diagram of speech verification device in another embodiment;
Fig. 6 is the structure diagram of speech verification device in one embodiment;
Fig. 7 is the structure diagram of speech verification device in another embodiment;
Fig. 8 is the structure diagram of speech verification device in one embodiment;
Fig. 9 is a diagram of the internal structure of the computer equipment in one embodiment.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of the application clearer, the application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the application and are not intended to limit it.
The speech verification method provided by the application can be applied in the application environment shown in Fig. 1, in which a terminal 110 communicates with a server 120 over a network, and a user 100 operates the terminal 110 through an input unit. The terminal 110 may be, but is not limited to, a personal computer, a laptop, a smartphone, a tablet computer or a portable wearable device, and the server 120 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a speech verification method is provided. The method is described by taking the terminal in Fig. 1 as an example, but it is not limited to being implemented only in a terminal. The method specifically includes the following steps:
S202, obtain voice information to be verified and a corresponding user identifier.
The voice information to be verified is the voice information to be checked in speech verification, and the user identifier is the identifier of the user's identity.
In one embodiment, after the terminal collects the voice information to be verified, it sends the voice information to the server. After receiving the voice information to be verified, the server selects the user identifier corresponding to the terminal that sent the voice information to be verified.
S204, extract a voiceprint feature to be verified and a text to be verified from the voice information to be verified.
A voiceprint feature is the characteristic information of a voiceprint, and a voiceprint is the acoustic spectrum of voice information. A feature is information describing a characteristic shared by objects, and the objects here may be voiceprints. The feature may specifically be at least one of an MFCC (Mel-Frequency Cepstral Coefficient) feature, a PLP (Perceptual Linear Prediction) feature and an LPC (Linear Predictive Coding) feature, and may also be at least one of a frequency spectrum, nasality, pronunciation, speaking rate and the like. The voiceprint feature to be verified is the voiceprint feature to be checked in speech verification. The text to be verified is the text information to be checked in speech verification; specifically, it is the voice information to be verified recorded in text form.
In one embodiment, the server extracts the voiceprint feature to be verified and the text to be verified from the voice information to be verified, and feeds the extracted voiceprint feature and text back to the corresponding terminal.
S206, obtain a current scene type.
A scene type is the type of a scene. A scene is specifically the combination of the place, time, weather, environment and so on when the voice information to be verified is obtained, and the current scene type is specifically the type of the scene when the voice information to be verified is obtained.
In one embodiment, the terminal obtains the location information and time information of collecting the voice information to be verified, and sends them to the server. The server obtains the corresponding weather information and environment information according to the received location information and time information, and determines the terminal's current scene type according to the location information, time information, weather information and environment information.
S208, query a feature model that matches the current scene type and corresponds to the user identifier.
A feature model may specifically be a set of voiceprint features of an individual user, and can be used to simulate the user's voiceprint features.
In one embodiment, after the terminal feeds its current scene type and user identifier back to the server, the server queries the database for a feature model that matches the current scene type and corresponds to the user identifier.
S210, convert, by the feature model, the text to be verified into a reference voiceprint feature.
The reference voiceprint feature is the reference object against which the voiceprint feature to be verified is compared in speech verification.
In one embodiment, the server converts the text to be verified into voice information by the feature model, and extracts the reference voiceprint feature from the voice information obtained by the conversion.
S212, compare the voiceprint feature to be verified with the reference voiceprint feature to obtain a speech verification result.
In one embodiment, after comparing the voiceprint feature to be verified with the reference voiceprint feature, the server feeds the resulting speech verification result back to the terminal. If the speech verification result indicates that the verification has passed, the terminal unlocks the corresponding application program according to the speech verification result. If the speech verification result indicates that the verification has failed, the terminal collects the voice information to be verified again.
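The patent does not fix a particular distance measure for the comparison in S212; cosine similarity between fixed-length voiceprint vectors is a common choice and is assumed here for illustration, as is the pass threshold:

```python
import numpy as np

def compare_voiceprints(probe, reference, threshold=0.85):
    # Cosine similarity between the probe and reference voiceprint
    # vectors; values near 1.0 mean the two voiceprints are close.
    probe = np.asarray(probe, dtype=float)
    reference = np.asarray(reference, dtype=float)
    similarity = np.dot(probe, reference) / (
        np.linalg.norm(probe) * np.linalg.norm(reference))
    return similarity >= threshold  # True means "verification passed"
```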
S214, when the speech verification result indicates that the verification has passed, retrain the feature model according to the voiceprint feature to be verified.
Retraining the feature model according to the voiceprint feature to be verified may specifically mean comparing the voiceprint feature to be verified with the feature model and adding the voiceprint features that occur frequently in the voiceprint feature to be verified to the feature model.
In one embodiment, when the server detects that the speech verification result indicates that the verification has passed, it selects, from the voiceprint features to be verified, the voiceprint features whose frequency of occurrence is higher than a preset threshold, and compares the selected voiceprint features with the feature model. If the difference between a selected voiceprint feature and the corresponding voiceprint feature in the feature model is smaller than a preset value, the selected voiceprint feature is added to the feature model.
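The retraining rule just described can be sketched on toy scalar features: keep only probe features that occur more often than a count threshold and differ from the model's closest stored feature by less than a preset value. Both thresholds are illustrative, not values from the patent:

```python
from collections import Counter

def retrain(model_features, probe_features, min_count=2, max_diff=0.5):
    counts = Counter(probe_features)
    for feat, count in counts.items():
        if count <= min_count:
            continue  # frequency of occurrence not above the threshold
        # Compare the selected feature against the closest model feature.
        closest = min(model_features, key=lambda m: abs(m - feat))
        if abs(closest - feat) < max_diff:
            model_features.append(feat)  # fold the feature into the model
    return model_features
```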
S216, update, with the retrained feature model, the feature model that matches the current scene type and the user identifier.
In this embodiment, after the voice information to be verified and the corresponding user identifier are obtained, the voiceprint feature and the text to be verified are extracted from the voice information to be verified. The current scene type is obtained, and a feature model that matches the current scene type and corresponds to the user identifier is queried. Since the voice information to be verified is collected in the scene corresponding to the current scene type, it matches the current scene type, and so does the voiceprint feature to be verified. The text to be verified is converted by the feature model into a reference voiceprint feature, which therefore also matches the current scene type. When both the reference voiceprint feature and the voiceprint feature to be verified match the current scene type, the speech verification result obtained by comparing them can accurately reflect whether the voice information to be verified is the user's voice information, so the user's voice can still be recognized even when it changes. Moreover, when the verification passes, the feature model that matches the current scene type and corresponds to the user identifier is retrained with the voiceprint feature to be verified and then updated, which also improves the validity of the feature model corresponding to this scene type, thereby improving the recall rate of speech verification.
In one embodiment, obtaining the voice information to be verified and the corresponding user identifier includes: obtaining an identity verification instruction; obtaining a user identifier in response to the identity verification instruction; querying a text pre-configured for the user identifier; when the text is not found, randomly generating a text; feeding back the randomly generated text; and collecting the voice information to be verified that matches the fed-back text.
The identity verification instruction is an instruction that activates speech verification. The pre-configured text is specifically the text information corresponding to the voice information used to authenticate the user's identity. Randomly generating a text may specifically mean randomly selecting text information from a text list, or randomly generating text information according to a dictionary.
In one embodiment, the terminal obtains an identity verification instruction triggered by the user through the touch screen, obtains the corresponding user identifier from the database in response to the instruction, and, after obtaining the user identifier, queries the pre-configured text corresponding to the user identifier. When the pre-configured text is found, an indicator that voice information is being collected is shown on the display screen of the terminal. When the pre-configured text is not found, a text is randomly generated according to a dictionary and shown on the display screen, and the voice information to be verified is collected.
In one embodiment, the terminal obtains an identity verification instruction triggered by the user through the touch screen and feeds the instruction back to the server. The server obtains the corresponding user identifier from the database and queries the pre-configured text corresponding to the user identifier. When the pre-configured text is found, the server feeds back to the terminal an instruction to start collecting the voice information to be verified. When the pre-configured text is not found, a text is randomly generated according to a dictionary and sent to the terminal.
In this embodiment, the user identifier is obtained and the text pre-configured for the user identifier is queried. If the pre-configured text is found, the voice information to be verified can be collected directly, making speech verification fast. If the pre-configured text is not found, a text is generated at random, which also improves security.
In one embodiment, extracting the voiceprint feature to be verified and the text to be verified from the voice information to be verified includes: parsing the voice information to be verified to obtain a corresponding acoustic wave signal; dividing the acoustic wave signal into frames to obtain the acoustic wave signal of each frame; performing a Fourier transform on the acoustic wave signal of each frame to obtain a corresponding frequency spectrum; extracting a single-frame voiceprint feature from the frequency spectrum; generating the voiceprint feature of the voice information to be verified from the single-frame voiceprint features of the frames; and converting the voiceprint feature into the text to be verified.
An acoustic wave signal is information about the frequency and amplitude variation of a sound wave; specifically, it reflects how the frequency of the sound changes over time, with the frequency of the sound as the ordinate and time as the abscissa. Framing means grouping several consecutive time points into one frame. Dividing the acoustic wave signal into frames may specifically mean dividing a complete acoustic wave signal, according to a preset frame length, into several acoustic wave signals whose abscissa intervals equal the frame length.
The Fourier transform is a formula that converts a time-domain function into a frequency-domain function. A frequency spectrum is information about the frequency distribution of a sound; specifically, with the frequency of the sound as the abscissa and the amplitude and phase of each frequency component as the ordinate, it expresses the distribution of the amplitudes of the sine waves of each frequency at a given time point. Performing a Fourier transform on the acoustic wave signal of each frame to obtain the corresponding frequency spectrum may specifically mean converting the trigonometric function corresponding to the acoustic wave signal of each frame into the frequency spectrum within each frame period.
In one embodiment, the terminal parses the voice information to be verified to obtain the corresponding acoustic wave signal, divides the acoustic wave signal into frames, multiplies the framed acoustic wave signal by a window function, and performs a Fourier transform on the resulting signal to obtain the corresponding frequency spectrum. A window function is a function that truncates the acoustic wave signal. A single-frame voiceprint feature is extracted from the frequency spectrum, and the voiceprint feature of the voice information to be verified is generated from the single-frame voiceprint features of the frames. The state of each frame of the acoustic wave signal is determined according to the state number corresponding to its voiceprint feature, the determined states are combined to obtain corresponding characters, and the text to be verified is generated from the obtained characters.
In this embodiment, converting the acoustic wave signal into the frequency spectrum makes it possible to obtain more information from the voice information to be verified and thereby more voiceprint features, making speech verification more accurate.
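The framing, windowing and Fourier-transform steps described in this embodiment can be sketched as follows. The frame length, hop size and choice of a Hamming window are illustrative; the magnitude spectrum stands in for whatever per-frame spectrum a downstream feature extractor (e.g. MFCC) would consume:

```python
import numpy as np

def frame_spectra(signal, frame_len=256, hop=128):
    # Split the signal into overlapping fixed-length frames, apply a
    # window function to each frame, and take the FFT magnitude as the
    # per-frame frequency spectrum.
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectra.append(np.abs(np.fft.rfft(frame)))  # magnitude spectrum
    return np.array(spectra)  # shape: (n_frames, frame_len // 2 + 1)
```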
In one embodiment, the method further includes: collecting current noise information; generating an anti-interference model from the collected noise information; and after the acoustic wave signal is obtained by parsing, correcting the parsed acoustic wave signal by the anti-interference model, and then performing the step of dividing the acoustic wave signal into frames to obtain the acoustic wave signal of each frame.
A noise signal is a sound signal that interferes with the voice information to be verified; it may specifically be at least one of the sounds emitted by the surrounding environment, such as wind, rain and reading aloud. The anti-interference model is specifically a model for filtering the noise signal out of the acoustic wave signal to be verified. Correcting the parsed acoustic wave signal by the anti-interference model may specifically mean superimposing the anti-interference model on the parsed acoustic wave signal, or filtering the anti-interference model out of the parsed acoustic wave signal.
In this embodiment, by collecting the current noise signal and generating the anti-interference model, the acoustic wave signal can be corrected according to the anti-interference model, so that the parsed acoustic wave signal is more accurate, improving the accuracy of voiceprint verification.
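One common way to realize this noise-correction step is simple spectral subtraction: estimate a noise magnitude spectrum from a noise-only recording (the "anti-interference model"), subtract it from the signal's spectrum, and reconstruct. The patent does not fix a particular algorithm; this is an illustrative sketch:

```python
import numpy as np

def denoise(signal, noise, n_fft=256):
    # Noise magnitude spectrum acts as the anti-interference model.
    noise_mag = np.abs(np.fft.rfft(noise, n_fft))
    spec = np.fft.rfft(signal, n_fft)
    # Subtract the noise magnitude, clamping at zero, and keep the
    # original phase of the noisy signal.
    cleaned_mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    cleaned = cleaned_mag * np.exp(1j * np.angle(spec))
    return np.fft.irfft(cleaned, n_fft)
```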
In one embodiment, which includes:Obtain the time for acquiring the voice messaging to be verified
Information and/or geographical location information;The default scene type that inquiry matches with the temporal information and/or geographical location information;
Using the default scene type inquired as current scene type.
Wherein, temporal information is to acquire the time of voice messaging to be verified.Temporal information specifically include the date and in a few days when
Between point, when time of day point includes, minute and the second.Geographical location information is the geographical location where acquisition voice messaging to be verified.
Geographical location information specifically includes urban sign and building identifies, and building mark can be specifically sports ground, house, hospital, public affairs
At least one of department, subway station and road etc..
In one embodiment, the terminal acquires the time of day at which the voice information to be verified was collected, for example 6:00 a.m., and then acquires the geographic location where the terminal is currently located, for example Shenzhen Bay Park in Nanshan District, Shenzhen. Based on the sensors in the terminal, it is determined that the terminal was in motion for the 30 minutes before the voice information to be verified was acquired, moving at a constant speed of 8 kilometers per hour. The queried preset scene type is then "outdoor jogging", and the terminal takes "outdoor jogging" as the current scene type.
In one embodiment, the terminal acquires the geographic location where it is currently located, for example at home; the preset scene type chosen directly is then "at home", and "at home" is taken as the current scene type.
In one embodiment, the terminal detects that the connected WIFI (Wireless Fidelity, a wireless local area network based on the IEEE 802.11b standard) is a preset safe WIFI; the preset scene type chosen directly is then "at home", and "at home" is taken as the current scene type.
In this embodiment, by acquiring the time information and/or geographic location information at which the voice information to be verified was collected, querying the matching preset scene type, and taking the queried preset scene type as the current scene type, a corresponding feature model can be chosen, so that the scene type matched by the voice information to be verified is consistent with the scene type matched by the feature model. This reduces the influence of the scene on the voice information to be verified as much as possible, and in turn improves the recall rate of speech verification.
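The query of a preset scene type from time and/or location can be sketched as a simple rule table. The rules and the type names ("outdoor jogging", "at home") follow the examples in these embodiments, but the function itself and its thresholds are hypothetical illustrations, not the patented lookup.

```python
from datetime import time

def query_scene_type(clock=None, place=None, moving_kmh=None, safe_wifi=False):
    """Return a preset scene type from time/location cues (rule table is illustrative)."""
    if safe_wifi or place == "home":
        return "at home"  # preset safe WIFI, or home location, maps to "at home"
    # Constant-speed movement around jogging pace in a park maps to "outdoor jogging"
    if (place == "park" and moving_kmh is not None and 6 <= moving_kmh <= 12
            and (clock is None or clock < time(9, 0))):
        return "outdoor jogging"
    return "default"
```

For example, a 6:00 a.m. collection in a park while moving at 8 km/h yields "outdoor jogging", matching the scenario above.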
In one embodiment, acquiring the current scene type includes: acquiring the time information and geographic location information at which the voice information to be verified was collected; looking up the weather information that matches the time information and the geographic location information; querying the preset scene type that matches the weather information; and taking the queried preset scene type as the current scene type.
Here, weather information is information about the weather phenomena in an area. The weather information specifically includes temperature, air pressure, humidity, wind, cloud, fog, rain, lightning, snow, frost, thunder, hail, haze, and so on.
In one embodiment, the terminal acquires the date and time of day at which the voice information to be verified was collected, for example 3:00 p.m. on December 18, and then acquires the geographic location where the terminal is currently located, for example the Ping An Building in Futian District. Based on the acquired date and geographic location, the current weather information is queried in a weather forecast system, for example overcast, a current temperature of 12 degrees Celsius, and a force 5 northeasterly wind, a drop of 5 degrees Celsius compared with 3:00 p.m. on December 17. The queried preset scene type is then "prone to catching a cold", and "prone to catching a cold" is taken as the current scene type.
In this embodiment, by acquiring the time information and geographic location information at which the voice information to be verified was collected, looking up the matching weather information, querying the preset scene type matched by the weather information, and taking the queried preset scene type as the current scene type, a corresponding feature model can be chosen, so that the scene type matched by the voice information to be verified is consistent with the scene type matched by the feature model. This reduces the influence of the scene on the voice information to be verified as much as possible, and in turn improves the recall rate of speech verification.
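The mapping from a weather reading to a preset scene type can likewise be sketched as a threshold rule. The temperature-drop and wind thresholds below mirror the worked example ("prone to catching a cold" after a 5 °C drop with a force 5 wind), but are illustrative assumptions rather than values fixed by the embodiment.

```python
def weather_scene_type(temp_c, temp_c_yesterday, wind_force):
    """Map a weather reading to a preset scene type (thresholds are illustrative)."""
    # A day-on-day drop of >= 5 degrees C together with a strong wind (force >= 4)
    # marks the scene in which the user's voice is likely affected by a cold.
    if temp_c_yesterday - temp_c >= 5 and wind_force >= 4:
        return "prone to catching a cold"
    return "default"
```

With the example's readings (12 °C today, 17 °C yesterday, force 5 wind), the function returns "prone to catching a cold".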
In one embodiment, the method further includes: acquiring a public feature model; acquiring training speech samples corresponding to the preset scene type and the user identifier; and retraining the public feature model according to the training speech samples to obtain the feature model that matches the preset scene type and the user identifier.
Here, the public feature model is a general-purpose feature model. Specifically, the public feature model is a general feature model for a given type of voice, such as a male voice, a child's voice, or a female voice. The training speech samples are the voice information collected for training the feature model. Specifically, the period during which training speech samples are collected is between one and three months after the public feature model is chosen, the exact length depending on the frequency at which training speech samples are collected.
In one embodiment, the server chooses, from a model library, a GMM-UBM (Gaussian Mixture Model-Universal Background Model) that matches the user's voiceprint, and during the training period continually trains the GMM-UBM with the collected training speech samples until the GMM-UBM becomes the feature model matching the user's user identifier. When the server, while training the GMM-UBM, detects that the voiceprint features of the training speech samples differ greatly from the voiceprint features collected at other times, it acquires scene information of the terminal such as geographic location information, time information, and weather information, and identifies the acquired scene information as a scene type.
In this embodiment, retraining the public feature model with the training speech samples allows the feature model to be trained quickly, improving efficiency.
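Retraining a public GMM-UBM into a user-specific feature model is conventionally done by MAP adaptation, which shifts the universal model's component means toward the user's samples. The sketch below is a minimal mean-only adaptation under that assumption; the relevance factor `r=16` and the toy dimensions are illustrative, not values from the embodiment.

```python
import numpy as np

def map_adapt_means(ubm_means, samples, responsibilities, r=16.0):
    """Classic mean-only MAP adaptation of UBM component means toward user data.

    ubm_means:        (n_components, n_dims) means of the public model
    samples:          (n_samples, n_dims) user training speech features
    responsibilities: (n_samples, n_components) posterior of each component per sample
    """
    n_k = responsibilities.sum(axis=0)                          # soft counts per component
    x_bar = responsibilities.T @ samples / np.maximum(n_k, 1e-9)[:, None]
    alpha = (n_k / (n_k + r))[:, None]                          # adaptation coefficient
    # Components with much user data move toward x_bar; unseen ones keep the UBM mean.
    return alpha * x_bar + (1 - alpha) * ubm_means
```

Repeating this over the one-to-three-month collection period gradually specializes the public model into the user- and scene-specific feature model described above.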
As shown in Fig. 3, in one embodiment, a speech verification method is provided, which specifically includes the following steps:
S302, the terminal acquires an identity verification instruction.
S304, in response to the identity verification instruction, the terminal acquires a user identifier.
S306, the terminal queries the text preconfigured for the user identifier.
S308, when the terminal does not find the text, a text is generated at random.
S310, the terminal feeds back the randomly generated text.
S312, the terminal collects current noise information.
S314, the terminal collects the voice information to be verified that matches the fed-back text.
S316, the terminal feeds the collected noise information and the voice information to be verified back to the server.
S318, the server generates an anti-interference model according to the noise information.
S320, the server parses the voice information to be verified to obtain a corresponding acoustic signal.
S322, after parsing obtains the acoustic signal, the server corrects the parsed acoustic signal by the anti-interference model.
S324, the server frames the acoustic signal to obtain the acoustic signal of each frame.
S326, the server performs a Fourier transform on the acoustic signal of each frame to obtain a corresponding spectrum.
S328, the server extracts a single-frame voiceprint feature from the spectrum.
S330, the server generates the voiceprint feature of the voice information to be verified according to the single-frame voiceprint feature of each frame.
S332, the server converts the voiceprint feature into text to be verified.
S334, the terminal acquires the time information and geographic location information at which the voice information to be verified was collected.
S336, after the terminal feeds the time information and geographic location information back to the server, the server looks up the weather information that matches the time information and the geographic location information.
S338, the server queries the preset scene type that matches the weather information.
S340, the server takes the queried preset scene type as the current scene type.
S342, the server queries the feature model that matches the current scene type and corresponds to the user identifier.
S344, the server converts the text to be verified into a reference voiceprint feature by the feature model.
S346, the server compares the voiceprint feature to be verified with the reference voiceprint feature to obtain a speech verification result.
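Steps S324–S330 and S346 amount to a frame/Fourier-transform/feature/compare pipeline. A minimal sketch follows, using the log-magnitude spectrum as the single-frame voiceprint feature and cosine similarity with a threshold as the comparison; both of those concrete choices are assumptions, since the embodiment names neither a specific feature nor a specific distance measure.

```python
import numpy as np

def extract_voiceprint(signal, frame_len=512, hop=256):
    """S324 framing, S326 Fourier transform, S328 single-frame features,
    S330 aggregation into one voiceprint vector."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = np.fft.rfft(signal[start:start + frame_len])  # per-frame spectrum
        feats.append(np.log1p(np.abs(spec)))                 # log-magnitude feature
    return np.mean(feats, axis=0)

def verify(voiceprint, reference, threshold=0.8):
    """S346: compare the voiceprint to be verified with the reference voiceprint."""
    cos = np.dot(voiceprint, reference) / (
        np.linalg.norm(voiceprint) * np.linalg.norm(reference))
    return cos >= threshold
```

In this sketch the reference voiceprint would be the one produced from the text to be verified by the scene-matched feature model (S344).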
With the above speech verification method, after the voice information to be verified and the corresponding user identifier are acquired, the voiceprint feature to be verified and the text to be verified are extracted from the voice information to be verified. The current scene type is acquired, and the feature model that matches the current scene type and corresponds to the user identifier is queried. Since the voice information to be verified is acquired in the scene corresponding to the current scene type, the voice information to be verified matches the current scene type, and the voiceprint feature to be verified also matches the current scene type. The text to be verified is converted by the feature model into a reference voiceprint feature, which naturally also matches the current scene type. When the reference voiceprint feature and the voiceprint feature to be verified both match the current scene type, comparing the voiceprint feature to be verified with the reference voiceprint feature yields a speech verification result that accurately reflects whether the voice information to be verified is the user's own voice information, so that even when the user's voice changes, the user's own voice can still be recognized. Moreover, when the verification passes, the feature model that matches the current scene type and corresponds to the user identifier is retrained and updated with the voiceprint feature to be verified, which also improves the validity of the feature model corresponding to this scene type and thus improves the recall rate of speech verification.
It should be understood that although the steps in the flowchart of Fig. 3 are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless expressly stated otherwise herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor is the execution order of these sub-steps or stages necessarily sequential, as they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 4, a speech verification device 400 is provided, including: an information acquisition module 402, an information extraction module 404, a type acquisition module 406, a model query module 408, a feature conversion module 410, a feature comparison module 412, a retraining module 413, and a model update module 415, wherein: the information acquisition module 402 is used to acquire voice information to be verified and a corresponding user identifier; the information extraction module 404 is used to extract a voiceprint feature to be verified and text to be verified from the voice information to be verified; the type acquisition module 406 is used to acquire the current scene type; the model query module 408 is used to query the feature model that matches the current scene type and corresponds to the user identifier; the feature conversion module 410 is used to convert the text to be verified into a reference voiceprint feature by the feature model; the feature comparison module 412 is used to compare the voiceprint feature to be verified with the reference voiceprint feature to obtain a speech verification result; the retraining module 413 is used to retrain the feature model according to the voiceprint feature to be verified when the speech verification result indicates that the verification passes; and the model update module 415 is used to update, with the retrained feature model, the feature model that matches the current scene type and corresponds to the user identifier.
With the above speech verification device 400, after the voice information to be verified and the corresponding user identifier are acquired, the voiceprint feature to be verified and the text to be verified are extracted from the voice information to be verified. The current scene type is acquired, and the feature model that matches the current scene type and corresponds to the user identifier is queried. Since the voice information to be verified is acquired in the scene corresponding to the current scene type, the voice information to be verified matches the current scene type, and the voiceprint feature to be verified also matches the current scene type. The text to be verified is converted by the feature model into a reference voiceprint feature, which naturally also matches the current scene type. When the reference voiceprint feature and the voiceprint feature to be verified both match the current scene type, comparing the voiceprint feature to be verified with the reference voiceprint feature yields a speech verification result that accurately reflects whether the voice information to be verified is the user's own voice information, so that even when the user's voice changes, the user's own voice can still be recognized. Moreover, when the verification passes, the feature model that matches the current scene type and corresponds to the user identifier is retrained and updated with the voiceprint feature to be verified, which also improves the validity of the feature model corresponding to this scene type and thus improves the recall rate of speech verification.
As shown in Fig. 5, in one embodiment, the information acquisition module 402 includes: an instruction acquisition module 402a for acquiring an identity verification instruction; an identifier acquisition module 402b for acquiring a user identifier in response to the identity verification instruction; a text query module 402c for querying the text preconfigured for the user identifier; a text generation module 402d for generating a text at random when the text is not found; a text feedback module 402e for feeding back the randomly generated text; and an information collection module 402f for collecting the voice information to be verified that matches the fed-back text.
As shown in Fig. 6, in one embodiment, the information extraction module 404 includes: an information parsing module 404a for parsing the voice information to be verified to obtain a corresponding acoustic signal; a signal framing module 404b for framing the acoustic signal to obtain the acoustic signal of each frame; a signal conversion module 404c for performing a Fourier transform on the acoustic signal of each frame to obtain a corresponding spectrum; a feature extraction module 404d for extracting a single-frame voiceprint feature from the spectrum; a feature generation module 404e for generating the voiceprint feature of the voice information to be verified according to the single-frame voiceprint feature of each frame; and a text conversion module 404f for converting the voiceprint feature into text to be verified.
In one embodiment, the information acquisition module 402 is further used to collect current noise information; the information extraction module 404 is further used to generate an anti-interference model according to the collected noise information and, after parsing obtains the acoustic signal, to correct the parsed acoustic signal by the anti-interference model before performing the step of framing the acoustic signal to obtain the acoustic signal of each frame.
As shown in Fig. 7, in one embodiment, the type acquisition module 406 includes: a scene acquisition module 406a for acquiring the time information and/or geographic location information at which the voice information to be verified was collected; a type query module 406b for querying the preset scene type that matches the time information and/or geographic location information; and a type determination module 406c for taking the queried preset scene type as the current scene type.
In one embodiment, the scene acquisition module 406a is further used to acquire the time information and geographic location information at which the voice information to be verified was collected; the above type acquisition module 406 further includes a weather acquisition module 406d for looking up the weather information that matches the time information and the geographic location information; the type query module 406b is further used to query the preset scene type that matches the weather information; and the type determination module 406c is further used to take the queried preset scene type as the current scene type.
As shown in Fig. 8, in one embodiment, the above speech verification device 400 further includes: a model acquisition module 414 for acquiring a public feature model; a sample acquisition module 416 for acquiring training speech samples corresponding to the preset scene type and the user identifier; and a model training module 418 for retraining the public feature model according to the training speech samples to obtain the feature model that matches the preset scene type and the user identifier.
For specific limitations of the speech verification device, reference may be made to the limitations of the speech verification method above, which are not repeated here. Each module in the above speech verification device may be implemented in whole or in part by software, hardware, or a combination thereof. Each of the above modules may be embedded in hardware form in, or independent of, the processor in the computer equipment, or may be stored in software form in the memory of the computer equipment, so that the processor can invoke and execute the operations corresponding to each of the above modules.
In one embodiment, computer equipment is provided; the computer equipment may be a terminal, and its internal structure may be as shown in Fig. 9. The computer equipment includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer equipment is used to provide computing and control capability. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer equipment is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a speech verification method. The display screen of the computer equipment may be a liquid crystal display or an electronic ink display; the input device of the computer equipment may be a touch layer covering the display screen, or a button, trackball, or trackpad arranged on the housing of the computer equipment, or an external keyboard, trackpad, or mouse.
Those skilled in the art will understand that the structure shown in Fig. 9 is merely a block diagram of the portion of the structure relevant to the solution of the present application and does not constitute a limitation on the computer equipment to which the solution of the present application is applied; a specific computer equipment may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
In one embodiment, computer equipment is provided, including a memory and a processor, the memory storing a computer program. When executing the computer program, the processor implements the following steps: acquiring voice information to be verified and a corresponding user identifier; extracting a voiceprint feature to be verified and text to be verified from the voice information to be verified; acquiring the current scene type; querying the feature model that matches the current scene type and corresponds to the user identifier; converting the text to be verified into a reference voiceprint feature by the feature model; comparing the voiceprint feature to be verified with the reference voiceprint feature to obtain a speech verification result; when the speech verification result indicates that the verification passes, retraining the feature model according to the voiceprint feature to be verified; and updating, with the retrained feature model, the feature model that matches the current scene type and corresponds to the user identifier.
With the above computer equipment, after the voice information to be verified and the corresponding user identifier are acquired, the voiceprint feature to be verified and the text to be verified are extracted from the voice information to be verified. The current scene type is acquired, and the feature model that matches the current scene type and corresponds to the user identifier is queried. Since the voice information to be verified is acquired in the scene corresponding to the current scene type, the voice information to be verified matches the current scene type, and the voiceprint feature to be verified also matches the current scene type. The text to be verified is converted by the feature model into a reference voiceprint feature, which naturally also matches the current scene type. When the reference voiceprint feature and the voiceprint feature to be verified both match the current scene type, comparing the voiceprint feature to be verified with the reference voiceprint feature yields a speech verification result that accurately reflects whether the voice information to be verified is the user's own voice information, so that even when the user's voice changes, the user's own voice can still be recognized. Moreover, when the verification passes, the feature model that matches the current scene type and corresponds to the user identifier is retrained and updated with the voiceprint feature to be verified, which also improves the validity of the feature model corresponding to this scene type and thus improves the recall rate of speech verification.
In one embodiment, the processor, when executing the computer program, further implements the following steps: acquiring an identity verification instruction; acquiring a user identifier in response to the identity verification instruction; querying the text preconfigured for the user identifier; generating a text at random when the text is not found; feeding back the randomly generated text; and collecting the voice information to be verified that matches the fed-back text.
In one embodiment, the processor, when executing the computer program, further implements the following steps: parsing the voice information to be verified to obtain a corresponding acoustic signal; framing the acoustic signal to obtain the acoustic signal of each frame; performing a Fourier transform on the acoustic signal of each frame to obtain a corresponding spectrum; extracting a single-frame voiceprint feature from the spectrum; generating the voiceprint feature of the voice information to be verified according to the single-frame voiceprint feature of each frame; and converting the voiceprint feature into text to be verified.
In one embodiment, the processor, when executing the computer program, further implements the following steps: collecting current noise information; generating an anti-interference model according to the collected noise information; and, after parsing obtains the acoustic signal, correcting the parsed acoustic signal by the anti-interference model before performing the step of framing the acoustic signal to obtain the acoustic signal of each frame.
In one embodiment, the processor, when executing the computer program, further implements the following steps: acquiring the time information and/or geographic location information at which the voice information to be verified was collected; querying the preset scene type that matches the time information and/or geographic location information; and taking the queried preset scene type as the current scene type.
In one embodiment, the processor, when executing the computer program, further implements the following steps: acquiring the time information and geographic location information at which the voice information to be verified was collected; looking up the weather information that matches the time information and the geographic location information; querying the preset scene type that matches the weather information; and taking the queried preset scene type as the current scene type.
In one embodiment, the processor, when executing the computer program, further implements the following steps: acquiring a public feature model; acquiring training speech samples corresponding to the preset scene type and the user identifier; and retraining the public feature model according to the training speech samples to obtain the feature model that matches the preset scene type and the user identifier.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: acquiring voice information to be verified and a corresponding user identifier; extracting a voiceprint feature to be verified and text to be verified from the voice information to be verified; acquiring the current scene type; querying the feature model that matches the current scene type and corresponds to the user identifier; converting the text to be verified into a reference voiceprint feature by the feature model; comparing the voiceprint feature to be verified with the reference voiceprint feature to obtain a speech verification result; when the speech verification result indicates that the verification passes, retraining the feature model according to the voiceprint feature to be verified; and updating, with the retrained feature model, the feature model that matches the current scene type and corresponds to the user identifier.
With the above computer-readable storage medium, after the voice information to be verified and the corresponding user identifier are acquired, the voiceprint feature to be verified and the text to be verified are extracted from the voice information to be verified. The current scene type is acquired, and the feature model that matches the current scene type and corresponds to the user identifier is queried. Since the voice information to be verified is acquired in the scene corresponding to the current scene type, the voice information to be verified matches the current scene type, and the voiceprint feature to be verified also matches the current scene type. The text to be verified is converted by the feature model into a reference voiceprint feature, which naturally also matches the current scene type. When the reference voiceprint feature and the voiceprint feature to be verified both match the current scene type, comparing the voiceprint feature to be verified with the reference voiceprint feature yields a speech verification result that accurately reflects whether the voice information to be verified is the user's own voice information, so that even when the user's voice changes, the user's own voice can still be recognized. Moreover, when the verification passes, the feature model that matches the current scene type and corresponds to the user identifier is retrained and updated with the voiceprint feature to be verified, which also improves the validity of the feature model corresponding to this scene type and thus improves the recall rate of speech verification.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: acquiring an identity verification instruction; acquiring a user identifier in response to the identity verification instruction; querying the text preconfigured for the user identifier; generating a text at random when the text is not found; feeding back the randomly generated text; and collecting the voice information to be verified that matches the fed-back text.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: parsing the voice information to be verified to obtain a corresponding acoustic signal; framing the acoustic signal to obtain the acoustic signal of each frame; performing a Fourier transform on the acoustic signal of each frame to obtain a corresponding spectrum; extracting a single-frame voiceprint feature from the spectrum; generating the voiceprint feature of the voice information to be verified according to the single-frame voiceprint feature of each frame; and converting the voiceprint feature into text to be verified.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: collecting current noise information; generating an anti-interference model according to the collected noise information; and, after parsing obtains the acoustic signal, correcting the parsed acoustic signal by the anti-interference model before performing the step of framing the acoustic signal to obtain the acoustic signal of each frame.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: acquiring the time information and/or geographic location information at which the voice information to be verified was collected; querying the preset scene type that matches the time information and/or geographic location information; and taking the queried preset scene type as the current scene type.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: acquiring the time information and geographic location information at which the voice information to be verified was collected; looking up the weather information that matches the time information and the geographic location information; querying the preset scene type that matches the weather information; and taking the queried preset scene type as the current scene type.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: acquiring a public feature model; acquiring training speech samples corresponding to the preset scene type and the user identifier; and retraining the public feature model according to the training speech samples to obtain the feature model that matches the preset scene type and the user identifier.
One of ordinary skill in the art will appreciate that all or part of the flow of the methods in the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the flows of the embodiments of each of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in the present application may include non-volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
The technical features of the above embodiments may be combined arbitrarily. For conciseness of description, not all possible combinations of the technical features of the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, the combination should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present application, and these all belong to the scope of protection of the present application. Therefore, the scope of protection of this patent application shall be determined by the appended claims.
Claims (10)
1. A speech verification method, the method comprising:
obtaining voice information to be verified and a corresponding user identifier;
extracting a voiceprint feature to be verified and a text to be verified from the voice information to be verified;
obtaining a current scene type;
querying a feature model that matches the current scene type and corresponds to the user identifier;
converting the text to be verified into a reference voiceprint feature by means of the feature model;
comparing the voiceprint feature to be verified with the reference voiceprint feature to obtain a speech verification result;
when the speech verification result indicates that verification is passed, retraining the feature model according to the voiceprint feature to be verified; and
updating, with the retrained feature model, the feature model that matches the current scene type and corresponds to the user identifier.
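The comparison step of claim 1 could, for instance, score the two voiceprint features with cosine similarity against a threshold; both the metric and the threshold value here are assumptions, since the claim does not fix a scoring method:

```python
# Illustrative comparison of the voiceprint feature to be verified against the
# reference voiceprint feature: cosine similarity with a hypothetical threshold.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(candidate, reference, threshold=0.8):
    """Return True when the two voiceprint vectors are similar enough to pass."""
    return cosine_similarity(candidate, reference) >= threshold

print(verify([1.0, 0.9, 1.1], [1.0, 1.0, 1.0]))  # similar vectors -> True
print(verify([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal vectors -> False
```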
2. The method according to claim 1, wherein obtaining the voice information to be verified and the corresponding user identifier comprises:
obtaining an identity verification instruction;
obtaining a user identifier in response to the identity verification instruction;
querying for a text pre-configured for the user identifier;
when no such text is found, generating a text at random;
feeding back the randomly generated text; and
obtaining the voice information to be verified that matches the fed-back text.
3. The method according to claim 1, wherein extracting the voiceprint feature to be verified and the text to be verified from the voice information to be verified comprises:
parsing the voice information to be verified to obtain a corresponding acoustic signal;
dividing the acoustic signal into frames to obtain an acoustic signal of each frame;
performing a Fourier transform on the acoustic signal of each frame to obtain a corresponding frequency spectrum;
extracting a single-frame voiceprint feature from the frequency spectrum;
generating the voiceprint feature of the voice information to be verified from the single-frame voiceprint features of the frames; and
converting the voiceprint feature into the text to be verified.
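Claim 3's framing, per-frame Fourier transform, and pooling of single-frame features into one voiceprint can be sketched with NumPy; the frame length, hop size, window, and mean pooling are illustrative choices not specified by the claim:

```python
# Sketch of claim 3's pipeline: frame the acoustic signal, take an FFT per frame,
# and pool the per-frame magnitude spectra into a single voiceprint vector.
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of frame_len samples."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def voiceprint(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    frames = frame_signal(signal, frame_len, hop)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return spectra.mean(axis=0)  # pool single-frame features into one vector

sig = np.sin(2 * np.pi * 440 * np.arange(4096) / 16000)  # 440 Hz test tone
vp = voiceprint(sig)
print(vp.shape)  # (129,) since rfft of a 256-sample frame yields 129 bins
```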
4. The method according to claim 3, further comprising:
collecting current noise information;
generating an anti-interference model according to the collected noise information; and
after the acoustic signal is obtained by parsing, correcting the parsed acoustic signal with the anti-interference model, and then performing the step of dividing the acoustic signal into frames to obtain the acoustic signal of each frame.
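One plausible reading of claim 4's anti-interference correction is spectral subtraction: estimate the noise's average magnitude spectrum and subtract it from the parsed signal before framing continues. This is an assumption made for illustration, not the patent's stated method:

```python
# Illustrative "anti-interference model" as a noise magnitude spectrum, applied
# to the parsed acoustic signal via simple spectral subtraction.
import numpy as np

def noise_model(noise: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of the collected noise (the anti-interference model)."""
    return np.abs(np.fft.rfft(noise))

def correct(signal: np.ndarray, model: np.ndarray) -> np.ndarray:
    """Subtract the noise magnitude spectrum, keep the phase, resynthesize."""
    spec = np.fft.rfft(signal)
    mag = np.maximum(np.abs(spec) - model, 0.0)      # spectral subtraction
    phase = np.angle(spec)
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(signal))

rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal(1024)
clean = np.sin(2 * np.pi * 5 * np.arange(1024) / 1024)
corrected = correct(clean + noise, noise_model(noise))
```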
5. The method according to claim 1, wherein obtaining the current scene type comprises:
obtaining the temporal information and/or geographical location information at which the voice information to be verified was collected;
querying the preset scene type that matches the temporal information and/or geographical location information; and
taking the queried preset scene type as the current scene type.
6. The method according to claim 1, wherein obtaining the current scene type comprises:
obtaining the temporal information and geographical location information at which the voice information to be verified was collected;
searching for weather information that matches the temporal information and the geographical location information;
querying the preset scene type that matches the weather information; and
taking the queried preset scene type as the current scene type.
7. The method according to any one of claims 1 to 6, further comprising:
obtaining a common feature model;
obtaining training speech samples corresponding to a preset scene type and the user identifier; and
retraining the common feature model according to the training speech samples to obtain a feature model that matches the preset scene type and the user identifier.
8. A speech verification apparatus, the apparatus comprising:
an information obtaining module, configured to obtain voice information to be verified and a corresponding user identifier;
an information extraction module, configured to extract a voiceprint feature to be verified and a text to be verified from the voice information to be verified;
a type obtaining module, configured to obtain a current scene type;
a model query module, configured to query a feature model that matches the current scene type and corresponds to the user identifier;
a feature conversion module, configured to convert the text to be verified into a reference voiceprint feature by means of the feature model;
a feature comparison module, configured to compare the voiceprint feature to be verified with the reference voiceprint feature to obtain a speech verification result;
a retraining module, configured to, when the speech verification result indicates that verification is passed, retrain the feature model according to the voiceprint feature to be verified; and
a model updating module, configured to update, with the retrained feature model, the feature model that matches the current scene type and corresponds to the user identifier.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810041764.3A CN108305633B (en) | 2018-01-16 | 2018-01-16 | Speech verification method, apparatus, computer equipment and computer readable storage medium |
PCT/CN2018/088696 WO2019140823A1 (en) | 2018-01-16 | 2018-05-28 | Voice verification method, apparatus, computer device and computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108305633A true CN108305633A (en) | 2018-07-20 |
CN108305633B CN108305633B (en) | 2019-03-29 |
Family
ID=62869165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810041764.3A Active CN108305633B (en) | 2018-01-16 | 2018-01-16 | Speech verification method, apparatus, computer equipment and computer readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108305633B (en) |
WO (1) | WO2019140823A1 (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108989349A (en) * | 2018-08-31 | 2018-12-11 | 平安科技(深圳)有限公司 | User account number unlocking method, device, computer equipment and storage medium |
CN109147797A (en) * | 2018-10-18 | 2019-01-04 | 平安科技(深圳)有限公司 | Client service method, device, computer equipment and storage medium based on Application on Voiceprint Recognition |
CN109273009A (en) * | 2018-08-02 | 2019-01-25 | 平安科技(深圳)有限公司 | Access control method, device, computer equipment and storage medium |
CN109410938A (en) * | 2018-11-28 | 2019-03-01 | 途客电力科技(天津)有限公司 | Control method for vehicle, device and car-mounted terminal |
CN109410956A (en) * | 2018-12-24 | 2019-03-01 | 科大讯飞股份有限公司 | A kind of object identifying method of audio data, device, equipment and storage medium |
CN109450850A (en) * | 2018-09-26 | 2019-03-08 | 深圳壹账通智能科技有限公司 | Auth method, device, computer equipment and storage medium |
CN109446774A (en) * | 2018-09-30 | 2019-03-08 | 山东知味行网络科技有限公司 | A kind of identification application method and system |
CN110827799A (en) * | 2019-11-21 | 2020-02-21 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device and medium for processing voice signal |
CN111292739A (en) * | 2018-12-10 | 2020-06-16 | 珠海格力电器股份有限公司 | Voice control method and device, storage medium and air conditioner |
WO2020119541A1 (en) * | 2018-12-11 | 2020-06-18 | 阿里巴巴集团控股有限公司 | Voice data identification method, apparatus and system |
CN111415669A (en) * | 2020-04-15 | 2020-07-14 | 厦门快商通科技股份有限公司 | Voiceprint model construction method, device and equipment |
CN111445904A (en) * | 2018-12-27 | 2020-07-24 | 北京奇虎科技有限公司 | Cloud-based voice control method and device and electronic equipment |
CN111653283A (en) * | 2020-06-28 | 2020-09-11 | 讯飞智元信息科技有限公司 | Cross-scene voiceprint comparison method, device, equipment and storage medium |
CN111795707A (en) * | 2020-07-21 | 2020-10-20 | 高超群 | New energy automobile charging pile route planning method |
CN111916053A (en) * | 2020-08-17 | 2020-11-10 | 北京字节跳动网络技术有限公司 | Voice generation method, device, equipment and computer readable medium |
CN112289325A (en) * | 2019-07-24 | 2021-01-29 | 华为技术有限公司 | Voiceprint recognition method and device |
CN112447167A (en) * | 2020-11-17 | 2021-03-05 | 康键信息技术(深圳)有限公司 | Voice recognition model verification method and device, computer equipment and storage medium |
CN112599137A (en) * | 2020-12-16 | 2021-04-02 | 康键信息技术(深圳)有限公司 | Method and device for verifying voiceprint model recognition effect and computer equipment |
CN112669820A (en) * | 2020-12-16 | 2021-04-16 | 平安科技(深圳)有限公司 | Examination cheating recognition method and device based on voice recognition and computer equipment |
CN112992153A (en) * | 2021-04-27 | 2021-06-18 | 太平金融科技服务(上海)有限公司 | Audio processing method, voiceprint recognition device and computer equipment |
CN112992174A (en) * | 2021-02-03 | 2021-06-18 | 深圳壹秘科技有限公司 | Voice analysis method and voice recording device thereof |
CN113066501A (en) * | 2021-03-15 | 2021-07-02 | Oppo广东移动通信有限公司 | Method and device for starting terminal by voice, medium and electronic equipment |
CN113254897A (en) * | 2021-05-13 | 2021-08-13 | 北京达佳互联信息技术有限公司 | Information verification method, device, server and storage medium |
WO2023004561A1 (en) * | 2021-07-27 | 2023-02-02 | Qualcomm Incorporated | Voice or speech recognition using contextual information and user emotion |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1280137A1 (en) * | 2001-07-24 | 2003-01-29 | Sony International (Europe) GmbH | Method for speaker identification |
CN102708867A (en) * | 2012-05-30 | 2012-10-03 | 北京正鹰科技有限责任公司 | Method and system for identifying faked identity by preventing faked recordings based on voiceprint and voice |
CN105635087A (en) * | 2014-11-20 | 2016-06-01 | 阿里巴巴集团控股有限公司 | Method and apparatus for verifying user identity through voiceprint |
CN106356057A (en) * | 2016-08-24 | 2017-01-25 | 安徽咪鼠科技有限公司 | Speech recognition system based on semantic understanding of computer application scenario |
CN107481720A (en) * | 2017-06-30 | 2017-12-15 | 百度在线网络技术(北京)有限公司 | A kind of explicit method for recognizing sound-groove and device |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1905445B (en) * | 2005-07-27 | 2012-02-15 | 国际商业机器公司 | System and method of speech identification using mobile speech identification card |
CN104104664A (en) * | 2013-04-11 | 2014-10-15 | 腾讯科技(深圳)有限公司 | Method, server, client and system for verifying verification code |
CN104331652A (en) * | 2014-10-08 | 2015-02-04 | 无锡指网生物识别科技有限公司 | Dynamic cipher generation method for electronic equipment for fingerprint and voice recognition |
US9940934B2 (en) * | 2015-11-18 | 2018-04-10 | Uniphone Software Systems | Adaptive voice authentication system and method |
CN106782569A (en) * | 2016-12-06 | 2017-05-31 | 深圳增强现实技术有限公司 | A kind of augmented reality method and device based on voiceprint registration |
CN107424613A (en) * | 2017-05-16 | 2017-12-01 | 鄂尔多斯市普渡科技有限公司 | The Phonetically door-opening Verification System and its method of a kind of unmanned taxi |
CN107516526B (en) * | 2017-08-25 | 2022-09-06 | 百度在线网络技术(北京)有限公司 | Sound source tracking and positioning method, device, equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019140823A1 (en) | 2019-07-25 |
CN108305633B (en) | 2019-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108305633B (en) | Speech verification method, apparatus, computer equipment and computer readable storage medium | |
CN110797016B (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN110310623B (en) | Sample generation method, model training method, device, medium, and electronic apparatus | |
CN110534099B (en) | Voice wake-up processing method and device, storage medium and electronic equipment | |
CN103021409B (en) | A kind of vice activation camera system | |
CN109473123A (en) | Voice activity detection method and device | |
CN110265040A (en) | Training method, device, storage medium and the electronic equipment of sound-groove model | |
CN110570873B (en) | Voiceprint wake-up method and device, computer equipment and storage medium | |
EP3255631A1 (en) | Dynamic password voice based identity authentication system and method having self-learning function | |
CN107943896A (en) | Information processing method and device | |
CN110556126B (en) | Speech recognition method and device and computer equipment | |
CN109326305B (en) | Method and system for batch testing of speech recognition and text synthesis | |
CN110972112B (en) | Subway running direction determining method, device, terminal and storage medium | |
CN110428854A (en) | Sound end detecting method, device and the computer equipment of vehicle-mounted end | |
CN105989842A (en) | Method and device for voiceprint similarity comparison and application thereof in digital entertainment on-demand system | |
CN109872713A (en) | A kind of voice awakening method and device | |
CN111292752A (en) | User intention identification method and device, electronic equipment and storage medium | |
CN109947971A (en) | Image search method, device, electronic equipment and storage medium | |
CN108806686B (en) | Starting control method of voice question searching application and family education equipment | |
CN113255556A (en) | Multi-mode voice endpoint detection method and device, vehicle-mounted terminal and storage medium | |
CN111048068B (en) | Voice wake-up method, device and system and electronic equipment | |
CN110147728A (en) | Customer information analysis method, system, equipment and readable storage medium storing program for executing | |
CN115171660A (en) | Voiceprint information processing method and device, electronic equipment and storage medium | |
CN113808577A (en) | Intelligent extraction method and device of voice abstract, electronic equipment and storage medium | |
CN113053352A (en) | Voice synthesis method, device, equipment and storage medium based on big data platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||