CN107886957A - Voice wake-up method and device combined with voiceprint recognition - Google Patents

Voice wake-up method and device combined with voiceprint recognition

Info

Publication number
CN107886957A
Authority
CN
China
Prior art keywords
voice
verified
preset
vector
permission value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711145883.5A
Other languages
Chinese (zh)
Inventor
陈东鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Speakin Network Technology Co Ltd
Original Assignee
Guangzhou Speakin Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Speakin Network Technology Co Ltd
Priority to CN201711145883.5A
Publication of CN107886957A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces

Abstract

Embodiments of the invention disclose a voice wake-up method and device combined with voiceprint recognition. The invention first uses the MFCC features of the voice to be verified to judge whether its content is a preset wake-up word. If so, an i-vector is extracted through a preset deep neural network model, voiceprint recognition is performed on the i-vector to confirm the speaker's identity, and the permission value of the voice to be verified is obtained. The speaker's permission value is then compared with the permission value required by the preset wake-up word that the voice corresponds to, in order to judge whether the speaker has sufficient permission. If so, the operation corresponding to that preset wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.

Description

Voice wake-up method and device combined with voiceprint recognition
Technical field
The present invention relates to the field of voiceprint applications, and in particular to a voice wake-up method and device combined with voiceprint recognition.
Background
Voice wake-up means that a user speaks a preset wake-up word to restore an electronic device from the standby state to the normal working state. With voice wake-up, a user who cannot conveniently tap the screen can still operate the electronic device by voice.
However, current electronic products with a voice wake-up function cannot identify the speaker. Because the speaker's identity cannot be determined, no further permissions can be granted, and only simple device operations that need no user permission can be performed.
As a result, the voice wake-up function of current electronic products lacks user authentication and cannot support more complex device operations that require user permission.
Summary of the invention
The invention provides a voice wake-up method and device combined with voiceprint recognition, which solve the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.
The invention provides a voice wake-up method combined with voiceprint recognition, including:
S1: receiving the voice to be verified and performing feature extraction to obtain the MFCC features of the voice to be verified;
S2: caching the MFCC features of the voice to be verified within a preset period;
S3: judging, from the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word, and if so, performing step S4;
S4: inputting the cached MFCC features of the voice to be verified into a preset deep neural network model to obtain the i-vector of the voice to be verified;
S5: comparing the i-vector of the voice to be verified with the preset i-vectors, obtaining the permission value of the voice to be verified from the resulting matching score, and judging whether that permission value is greater than or equal to the permission value of the preset wake-up word that the voice to be verified corresponds to; if so, performing the operation corresponding to that preset wake-up word.
Preferably, step S4 specifically includes:
S41: concatenating the cached MFCC features of the voice to be verified;
S42: inputting the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and using the concatenated MFCC features to update the preset deep neural network model into a new preset deep neural network model.
Preferably, step S5 specifically includes:
S51: normalizing the i-vector of the voice to be verified and the preset i-vectors, comparing the normalized i-vector of the voice to be verified with the normalized preset i-vectors through a probabilistic linear discriminant analysis (PLDA) model, and obtaining the resulting matching score;
S52: adding an offset score to the matching score to obtain a new matching score;
S53: obtaining the permission value of the voice to be verified from the new matching score, and judging whether that permission value is greater than or equal to the permission value of the preset wake-up word that the voice to be verified corresponds to; if so, performing step S54;
S54: performing the operation corresponding to the preset wake-up word that the voice to be verified corresponds to.
Preferably, step S5 further includes step S55;
step S53 then specifically includes: obtaining the permission value of the voice to be verified from the new matching score, and judging whether that permission value is greater than or equal to the permission value of the preset wake-up word that the voice to be verified corresponds to; if so, performing step S54, and if not, performing step S55;
S55: issuing an insufficient-permission prompt.
The invention provides a voice wake-up device combined with voiceprint recognition, including:
a feature unit, for receiving the voice to be verified and performing feature extraction to obtain the MFCC features of the voice to be verified;
a cache unit, for caching the MFCC features of the voice to be verified within a preset period;
a wake-up unit, for judging, from the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word, and if so, triggering the vector unit (step S4 of the method);
a vector unit, for inputting the cached MFCC features of the voice to be verified into a preset deep neural network model to obtain the i-vector of the voice to be verified;
a comparison unit, for comparing the i-vector of the voice to be verified with the preset i-vectors, obtaining the permission value of the voice to be verified from the resulting matching score, and judging whether that permission value is greater than or equal to the permission value of the preset wake-up word that the voice to be verified corresponds to; if so, performing the operation corresponding to that preset wake-up word.
Preferably, the vector unit specifically includes:
a concatenation subunit, for concatenating the cached MFCC features of the voice to be verified;
an acquisition subunit, for inputting the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and for using the concatenated MFCC features to update the preset deep neural network model into a new preset deep neural network model.
Preferably, the comparison unit specifically includes:
a comparison subunit, for normalizing the i-vector of the voice to be verified and the preset i-vectors, comparing the normalized i-vector of the voice to be verified with the normalized preset i-vectors through a probabilistic linear discriminant analysis (PLDA) model, and obtaining the resulting matching score;
a compensation subunit, for adding an offset score to the matching score to obtain a new matching score;
a judgment subunit, for obtaining the permission value of the voice to be verified from the new matching score, and judging whether that permission value is greater than or equal to the permission value of the preset wake-up word that the voice to be verified corresponds to; if so, triggering the execution subunit;
an execution subunit, for performing the operation corresponding to the preset wake-up word that the voice to be verified corresponds to.
Preferably, the comparison unit further includes a prompt subunit;
the judgment subunit is then specifically used for obtaining the permission value of the voice to be verified from the new matching score, and judging whether that permission value is greater than or equal to the permission value of the preset wake-up word that the voice to be verified corresponds to; if so, triggering the execution subunit, and if not, triggering the prompt subunit;
a prompt subunit, for issuing an insufficient-permission prompt.
As can be seen from the above technical solutions, the present invention has the following advantages:
The invention provides a voice wake-up method combined with voiceprint recognition, including: S1: receiving the voice to be verified and performing feature extraction to obtain the MFCC features of the voice to be verified; S2: caching the MFCC features of the voice to be verified within a preset period; S3: judging, from the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word, and if so, performing step S4; S4: inputting the cached MFCC features of the voice to be verified into a preset deep neural network model to obtain the i-vector of the voice to be verified; S5: comparing the i-vector of the voice to be verified with the preset i-vectors, obtaining the permission value of the voice to be verified from the resulting matching score, and judging whether that permission value is greater than or equal to the permission value of the preset wake-up word that the voice to be verified corresponds to; if so, performing the operation corresponding to that preset wake-up word.
The invention first uses the MFCC features of the voice to be verified to judge whether its content is a preset wake-up word. If so, an i-vector is extracted through the preset deep neural network model, voiceprint recognition is performed on the i-vector to confirm the speaker's identity, and the permission value of the voice to be verified is obtained. The speaker's permission value is compared with the permission value of the preset wake-up word that the voice to be verified corresponds to, in order to judge whether the speaker has sufficient permission. If so, the operation corresponding to that preset wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of one embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of one embodiment of the voice wake-up device combined with voiceprint recognition provided by an embodiment of the present invention.
Detailed description
The embodiments of the invention provide a voice wake-up method and device combined with voiceprint recognition, which solve the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.
To make the purpose, features, and advantages of the invention clearer and easier to understand, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The embodiments disclosed below are only some of the embodiments of the invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the invention.
Referring to Fig. 1, an embodiment of the invention provides one embodiment of a voice wake-up method combined with voiceprint recognition, including:
Step 101: receiving the voice to be verified and performing feature extraction to obtain the MFCC features of the voice to be verified;
It should be noted that waking up the device and performing voiceprint recognition both require receiving the voice to be verified and extracting its MFCC features.
Step 102: caching the MFCC features of the voice to be verified within a preset period;
It should be noted that, to save storage space on the electronic device, only the MFCC features of the voice to be verified within the preset period are stored. The preset period can be set to the most recent three seconds, in which case only the MFCC features of the voice to be verified from the most recent three seconds are stored.
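The bounded cache of step 102 can be sketched as a ring buffer. This is a minimal illustration, not the patent's implementation: the class name `MfccCache`, the 10 ms frame shift, and the use of NumPy arrays for frames are all assumptions; only the three-second window comes from the text.

```python
from collections import deque

import numpy as np


class MfccCache:
    """Keeps only the MFCC frames that fall inside the most recent window."""

    def __init__(self, window_seconds=3.0, frame_shift_seconds=0.010):
        # Number of frames covering the window, e.g. 3 s / 10 ms = 300 frames.
        # The 10 ms frame shift is an assumed value, not taken from the patent.
        self.max_frames = int(round(window_seconds / frame_shift_seconds))
        # A deque with maxlen silently drops the oldest frame when it is full.
        self._frames = deque(maxlen=self.max_frames)

    def push(self, frame):
        """Append one MFCC frame (a 1-D array of coefficients)."""
        self._frames.append(np.asarray(frame, dtype=np.float64))

    def snapshot(self):
        """Return the cached frames as a (num_frames, num_coeffs) array."""
        if not self._frames:
            return np.empty((0, 0))
        return np.stack(self._frames)
```

Because the deque has a fixed `maxlen`, memory use stays constant however long the device keeps listening.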
Step 103: judging, from the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word, and if so, performing step 104;
It should be noted that the subsequent steps are performed only when the voice to be verified is a preset wake-up word. If it is not, the electronic device remains on standby.
Step 104: inputting the cached MFCC features of the voice to be verified into a preset deep neural network model to obtain the i-vector of the voice to be verified;
It should be noted that when the voice to be verified is a preset wake-up word, the i-vector of the voice to be verified is extracted from its MFCC features;
A voiceprint recognition system identifies a speaker automatically from the characteristics of the speaker's voice. Voiceprint recognition is a biometric authentication technology that verifies the speaker's identity by voice. It is convenient, stable, measurable, and secure. As a contactless collection and recognition technology, a voiceprint is cheap and easy to acquire and simple to use, and it has great application prospects in fields such as banking, social security, public security, smart home, and mobile payment.
Step 105: comparing the i-vector of the voice to be verified with the preset i-vectors, obtaining the permission value of the voice to be verified from the resulting matching score, and judging whether that permission value is greater than or equal to the permission value of the preset wake-up word that the voice to be verified corresponds to; if so, performing step 106;
It should be noted that comparing the i-vector of the voice to be verified with the preset i-vectors yields a matching score, and the matching score confirms the identity of the speaker. Once the speaker is identified, the speaker's permission value, which is the permission value of the voice to be verified, can be obtained. Comparing it with the permission value of the preset wake-up word that the voice to be verified corresponds to decides whether the next step is performed.
Step 106: performing the operation corresponding to the preset wake-up word that the voice to be verified corresponds to.
It should be noted that this operation is performed when the permission value of the voice to be verified is greater than or equal to the permission value of the preset wake-up word that the voice to be verified corresponds to.
This embodiment first uses the MFCC features of the voice to be verified to judge whether its content is a preset wake-up word. If so, an i-vector is extracted through the preset deep neural network model, voiceprint recognition is performed on the i-vector to confirm the speaker's identity, and the permission value of the voice to be verified is obtained. The speaker's permission value is compared with the permission value of the preset wake-up word that the voice to be verified corresponds to, in order to judge whether the speaker has sufficient permission. If so, the operation corresponding to that preset wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.
The above is one embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the invention; the following is another embodiment of the method.
Referring to Fig. 2, an embodiment of the invention provides another embodiment of a voice wake-up method combined with voiceprint recognition, including:
Step 201: receiving the voice to be verified and performing feature extraction to obtain the MFCC features of the voice to be verified;
It should be noted that if higher-dimensional MFCC features are needed, they can be obtained by windowing the discrete signal, applying a Fourier transform to each window, increasing the number of filters in the filter bank, and then computing the mel cepstral coefficients.
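The windowing, Fourier transform, filter bank, and cepstrum stages described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the patent's extractor: the function names, the 16 kHz sample rate, the 25 ms / 10 ms framing, the Hamming window, and the default filter and coefficient counts are all hypothetical choices. The input is assumed to contain at least one full frame.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank


def mfcc(signal, sample_rate=16000, frame_len=400, frame_shift=160,
         n_fft=512, num_filters=40, num_ceps=20):
    """Return a (num_frames, num_ceps) array of mel cepstral coefficients."""
    signal = np.asarray(signal, dtype=np.float64)
    # 1. Windowing: slice the discrete signal into overlapping frames.
    num_frames = 1 + (len(signal) - frame_len) // frame_shift
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(num_frames)[:, None])
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Fourier transform of each windowed frame, then the power spectrum.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 3. Mel filter bank energies; more filters allow more coefficients.
    energies = power @ mel_filterbank(num_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 4. DCT-II of the log energies gives the mel cepstral coefficients.
    n = np.arange(num_filters)
    basis = np.cos(np.pi * np.outer(np.arange(num_ceps), n + 0.5) / num_filters)
    return log_energies @ basis.T
```

Raising `num_filters` together with `num_ceps` (for example 64 filters and 40 coefficients) is the knob that corresponds to the "higher-dimensional" features mentioned in the text.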
Step 202: caching the MFCC features of the voice to be verified within a preset period;
It should be noted that, to save storage space, the electronic device listens continuously but stores only the MFCC features of the voice to be verified within the preset period. The preset period can be set to the most recent three seconds, in which case only the MFCC features of the voice to be verified from the most recent three seconds are stored.
Step 203: judging, from the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word, and if so, performing step 204;
It should be noted that when the voice to be verified is a preset wake-up word, the device is woken up, switches from the standby state to the normal working state, and performs the subsequent steps. If the voice to be verified is not a preset wake-up word, the electronic device remains in the standby state.
Step 204: concatenating the cached MFCC features of the voice to be verified;
It should be noted that concatenating the cached MFCC features of the voice to be verified means splicing the coefficient vectors of temporally adjacent speech frames into one longer vector.
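Frame concatenation as described in step 204 can be sketched as below. The context width (`left` and `right` neighbouring frames) and the edge handling by repeating the first and last frame are assumptions; the text only states that adjacent frame vectors are spliced into a longer vector.

```python
import numpy as np


def splice_frames(features, left=5, right=5):
    """Concatenate each frame with its temporal neighbours.

    features: (num_frames, dim) array.
    Returns a (num_frames, dim * (left + 1 + right)) array.
    Edge frames are padded by repeating the first or last frame.
    """
    features = np.asarray(features, dtype=np.float64)
    padded = np.concatenate([
        np.repeat(features[:1], left, axis=0),
        features,
        np.repeat(features[-1:], right, axis=0),
    ])
    num_frames = features.shape[0]
    context = left + 1 + right
    # Column block j holds the frame at offset (j - left) from the current one.
    return np.hstack([padded[j:j + num_frames] for j in range(context)])
```

For example, with 20 coefficients per frame and five frames of context on each side, every spliced vector has 20 × 11 = 220 entries.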
Step 205: inputting the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and using the concatenated MFCC features to update the preset deep neural network model into a new preset deep neural network model;
It should be noted that traditional i-vectors are extracted with a Gaussian mixture model (GMM). A large amount of speaker-independent, channel-independent speech data is preprocessed, and MFCC features are extracted from it to train the universal background model (UBM) and the total variability matrix T. The UBM is trained with the expectation maximization (EM) algorithm;
After training, the universal background model and the total variability matrix are saved. In the enrollment and test phases, the i-vector of each speech segment of a speaker is extracted using formula (1):
$$M_s = m_u + T\omega_s \qquad (1)$$
where M_s is the GMM mean supervector of speech segment s; m_u is the Gaussian mean supervector of the UBM; T is the low-rank total variability matrix, which characterizes the total variability space; and ω_s is the total variability factor, that is, the i-vector.
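Formula (1) defines the model but not how the factor is computed. For orientation, the standard point estimate from the i-vector literature is given below; it is not stated in this patent and is included only as background. The symbols Σ (the UBM covariance supervector arranged as a block-diagonal matrix), γ_t(c) (the posterior of component c for frame x_t), and the statistics N_s and F̃_s are notation introduced here.

```latex
\hat{\omega}_s = \left(I + T^{\top}\Sigma^{-1}N_s T\right)^{-1} T^{\top}\Sigma^{-1}\tilde{F}_s,
\qquad
N_c = \sum_t \gamma_t(c),
\qquad
\tilde{F}_c = \sum_t \gamma_t(c)\,\left(x_t - m_c\right)
```

Here N_s is the block-diagonal matrix whose c-th block is N_c times the identity, F̃_s stacks the centered first-order statistics F̃_c of all components, and m_c is the mean of component c, that is, the c-th block of m_u.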
This embodiment uses an i-vector extraction method based on a deep neural network (DNN). A DNN of the kind used for speech recognition is employed: the concatenated MFCC features are fed, with triphone (Tri-phone) units as the phoneme model, into a previously trained DNN, and the short-time speech frames are classified according to their posterior probabilities. Each frame and its posterior probabilities can then be used to train a new UBM. In this way the UBM is trained by supervised learning, replacing the unsupervised EM algorithm of traditional UBM training;
Extracting i-vectors with a deep neural network offers high accuracy, robustness to noise and channel, and adaptability to varied text.
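How frame-level posteriors can stand in for EM responsibilities may be easier to see in code. The sketch below is an assumption-laden illustration, not the patent's training procedure: it takes the DNN output as a ready-made posterior matrix (no network is implemented), and the function names are hypothetical. It accumulates the usual zeroth- and first-order statistics and derives per-class means from them in a single pass.

```python
import numpy as np


def baum_welch_stats(features, posteriors):
    """Zeroth- and first-order statistics from frame-level posteriors.

    features:   (num_frames, dim) acoustic features.
    posteriors: (num_frames, num_classes), rows summing to 1,
                e.g. the softmax outputs of a speech-recognition DNN.
    """
    zeroth = posteriors.sum(axis=0)    # N_c = sum_t gamma_t(c)
    first = posteriors.T @ features    # F_c = sum_t gamma_t(c) * x_t
    return zeroth, first


def supervised_ubm_means(features, posteriors, floor=1e-8):
    """Per-class means weighted by the posteriors, without EM iterations."""
    zeroth, first = baum_welch_stats(features, posteriors)
    return first / np.maximum(zeroth, floor)[:, None]
```

The design point is that the class assignment comes from a supervised model, so the means are computed directly instead of being refined by repeated unsupervised E and M steps.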
Step 206: normalizing the i-vector of the voice to be verified and the preset i-vectors, comparing the normalized i-vector of the voice to be verified with the normalized preset i-vectors through a probabilistic linear discriminant analysis (PLDA) model, and obtaining the resulting matching score;
It should be noted that normalization means scaling the i-vector of the voice to be verified and the preset i-vectors to the same length;
There can be multiple preset i-vectors, each representing a different user. For example, the electronic device belongs to A, but A has stored the i-vectors of A, B, C, D, and E on the device and assigned A, B, C, D, and E different permission values;
Comparing the i-vector of the voice to be verified with each preset i-vector yields a different matching score for each.
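The normalize-then-score pattern of step 206 can be sketched as follows. Note the substitution: the patent scores with a PLDA model, whereas this sketch uses cosine similarity as a much simpler stand-in so that it stays self-contained. The function names and the dictionary layout of enrolled users are assumptions.

```python
import numpy as np


def length_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    vec = np.asarray(vec, dtype=np.float64)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def score_against_enrolled(test_ivector, enrolled):
    """Return {user: score} for every enrolled i-vector.

    Cosine similarity is used here as a simple stand-in for the PLDA
    scorer named in the text; it is NOT the patent's scoring model.
    """
    test = length_normalize(test_ivector)
    return {user: float(test @ length_normalize(vec))
            for user, vec in enrolled.items()}
```

One score per enrolled user comes back, mirroring the statement that each preset i-vector yields its own matching score.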
Step 207: adding an offset score to the matching score to obtain a new matching score;
It should be noted that external factors such as noise can shift the matching score, so the score is compensated with a preset offset score chosen according to the external conditions, which yields the new matching score.
Step 208: obtaining the permission value of the voice to be verified from the new matching score, and judging whether that permission value is greater than or equal to the permission value of the preset wake-up word that the voice to be verified corresponds to; if so, performing step 209, and if not, performing step 210;
It should be noted that the identity of the speaker can be confirmed from the new matching scores. For example, if the matching score between the i-vector of the voice to be verified and E's preset i-vector is the highest and exceeds a preset threshold, the speaker is judged to be E. E's permission value is then obtained and compared with the permission value of the preset wake-up word that the voice to be verified corresponds to. Suppose the wake-up word spoken by E is "pay" and the permission value required for payment is set to 5, while E's permission value is only 3; the step to perform is then determined by the result of this comparison;
If the speaker is F, a matching score is still obtained, but it is below the preset threshold, so F has no permission value at all.
Step 209: performing the operation corresponding to the preset wake-up word that the voice to be verified corresponds to;
It should be noted that if E's permission value is 5, it equals the permission value required for payment, so the electronic device performs the payment operation corresponding to the wake-up word "pay".
Step 210: issuing an insufficient-permission prompt.
It should be noted that if E's permission value is 3, it is less than the permission value required for payment, so an insufficient-permission prompt is issued by text, light, loudspeaker, or similar means.
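Steps 207 to 210 amount to a small decision procedure, sketched below. The function name `authorize`, the default offset and threshold values, and the string results are hypothetical; the logic (offset compensation, best score above a threshold, then a permission comparison) follows the description above.

```python
def authorize(scores, user_permissions, required_permission,
              score_offset=0.0, score_threshold=0.5):
    """Return (speaker, decision), where decision is "execute" or "deny".

    scores:              {user: raw matching score}
    user_permissions:    {user: permission value}
    required_permission: permission value needed by the spoken wake-up word
    The default offset and threshold are placeholder values.
    """
    # Step 207: add the preset offset score to every raw matching score.
    compensated = {user: s + score_offset for user, s in scores.items()}
    if not compensated:
        return None, "deny"
    # Step 208: the best-scoring enrolled user must also exceed the threshold.
    speaker = max(compensated, key=compensated.get)
    if compensated[speaker] <= score_threshold:
        return None, "deny"          # unknown speaker: no permission value
    granted = user_permissions.get(speaker, 0)
    if granted >= required_permission:
        return speaker, "execute"    # step 209
    return speaker, "deny"           # step 210: insufficient-permission prompt
```

With the example from the text, a speaker recognized as E with permission value 3 is denied a "pay" request that requires 5, and is allowed once E's permission value is 5.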
As a remote, contactless identity verification technology, voiceprint recognition, combined with cross-media interactive communication and application service platforms, has great application prospects in fields such as banking, social security, public security, smart home, and mobile payment. Combining voiceprint recognition with the voice wake-up function improves the user's voice interaction experience and achieves truly contactless identity authentication;
Replacing the traditional Gaussian mixture model with a deep neural network in voiceprint recognition offers high accuracy, robustness to noise and channel, and adaptability to varied text. It also supports cross-platform and cross-channel deployment, computes quickly, consumes little power, and occupies few system resources, so it can easily be deployed on mobile electronic devices and has broad application prospects;
This embodiment first uses the MFCC features of the voice to be verified to judge whether its content is a preset wake-up word. If so, an i-vector is extracted through the preset deep neural network model, voiceprint recognition is performed on the i-vector to confirm the speaker's identity, and the permission value of the voice to be verified is obtained. The speaker's permission value is compared with the permission value of the preset wake-up word that the voice to be verified corresponds to, in order to judge whether the speaker has sufficient permission. If so, the operation corresponding to that preset wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.
The above is another embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the invention; the following is one embodiment of the voice wake-up device combined with voiceprint recognition provided by an embodiment of the invention.
Referring to Fig. 3, the embodiments of the invention provide a kind of implementation of one of voice Rouser of combination Application on Voiceprint Recognition Example, including:
Feature unit 301, for receiving voice to be verified and carrying out feature extraction, the MFCC for obtaining voice to be verified is special Sign;
Buffer unit 302, for being cached to the MFCC features of the voice to be verified in predetermined period;
Wakeup unit 303, the MFCC features for the voice to be verified according to caching judge that the content of voice to be verified is No is preset wake-up word, if so, then performing step S4;
Vector location 304, for the MFCC features of the voice to be verified of caching to be inputted to preset deep neural network mould In type, the i-vector vectors of voice to be verified are obtained;
Comparing unit 305, the i-vector vector preset for the i-vector vector sums by voice to be verified are compared It is right, according to the authority credentials for comparing the matching fraction acquisition voice to be verified drawn, judge whether the authority credentials of voice to be verified is big In or equal to authority credentials corresponding to wake-up word preset corresponding to voice to be verified, if so, then performing corresponding with voice to be verified Preset wake-up word corresponding to operate.
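As an illustration of the buffer unit 302, caching features "within a preset period" can be sketched as a fixed-length ring buffer. The class name `MfccBuffer`, the 10 ms frame shift default, and the frame representation are assumptions made for this sketch; the source does not specify them.

```python
from collections import deque
from typing import Deque, List, Sequence


class MfccBuffer:
    """Keeps only the MFCC frames that fall within the most recent period."""

    def __init__(self, period_s: float, frame_shift_s: float = 0.01) -> None:
        # Number of frames that span the preset period (at least one).
        self.capacity = max(1, round(period_s / frame_shift_s))
        self._frames: Deque[List[float]] = deque(maxlen=self.capacity)

    def push(self, frame: Sequence[float]) -> None:
        # Appending to a full deque silently discards the oldest frame.
        self._frames.append(list(frame))

    def snapshot(self) -> List[List[float]]:
        # Copy the cached frames, oldest first, for the downstream units.
        return [list(f) for f in self._frames]
```

With `deque(maxlen=...)`, eviction of the oldest frame is handled by the container itself, so the wake-up unit and the vector unit can both read the same snapshot of the cached features.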
Preferably, the vector unit 304 specifically includes:
a concatenation subunit 3041, configured to concatenate the cached MFCC features of the voice to be verified;
an obtaining subunit 3042, configured to input the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and to update the preset deep neural network model with the concatenated MFCC features to obtain a new preset deep neural network model.
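One plausible reading of the concatenation step is that the cached per-frame MFCC vectors are joined end to end into a single input vector for the network. The sketch below assumes that reading; the function name `concatenate_frames` is hypothetical, and the source does not state the exact layout of the concatenated features.

```python
from typing import List, Sequence


def concatenate_frames(frames: Sequence[Sequence[float]]) -> List[float]:
    """Join equally sized MFCC frames, oldest first, into one flat vector."""
    if not frames:
        raise ValueError("no cached MFCC frames")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all MFCC frames must have the same dimension")
    # Flatten frame by frame so that time order is preserved in the output.
    return [value for frame in frames for value in frame]
```

For example, two cached 2-dimensional frames `[[1.0, 2.0], [3.0, 4.0]]` become the 4-dimensional vector `[1.0, 2.0, 3.0, 4.0]`.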
Preferably, the comparing unit 305 specifically includes:
a matching subunit 3051, configured to normalize the i-vector of the voice to be verified and the preset i-vector, and to compare the normalized i-vector of the voice to be verified with the normalized preset i-vector by a probabilistic linear discriminant analysis (PLDA) model to obtain the matching score resulting from the comparison;
a compensation subunit 3052, configured to add an offset score to the matching score to obtain a new matching score;
a judgment subunit 3053, configured to obtain the permission value of the voice to be verified according to the new matching score and to determine whether the permission value of the voice to be verified is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified, and if so, to trigger the execution subunit 3054;
an execution subunit 3054, configured to perform the operation corresponding to the preset wake-up word corresponding to the voice to be verified.
Preferably, the comparing unit 305 further includes a prompt subunit 3055;
the judgment subunit 3053 is specifically configured to obtain the permission value of the voice to be verified according to the new matching score and to determine whether the permission value of the voice to be verified is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified; if so, it triggers the execution subunit 3054, and if not, it triggers the prompt subunit 3055;
the prompt subunit 3055 is configured to issue a prompt that the permission is insufficient.
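The work of subunits 3051 to 3055 can be sketched as below. Two substitutions are made loudly here: cosine similarity on length-normalized vectors stands in for the PLDA model, which requires trained parameters the source does not give, and the rule "the best-matching enrolled speaker's permission value is granted if the compensated score reaches `accept_threshold`" is an assumed way of deriving a permission value from a matching score. All names (`matching_score`, `speaker_permission`, `decide`, the `enrolled` table) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def length_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def matching_score(test_vec: Sequence[float], enrolled_vec: Sequence[float]) -> float:
    # Stand-in for PLDA scoring: cosine similarity of the normalized vectors.
    a, b = length_normalize(test_vec), length_normalize(enrolled_vec)
    return sum(x * y for x, y in zip(a, b))


def speaker_permission(
    test_vec: Sequence[float],
    enrolled: Dict[str, Tuple[Sequence[float], int]],
    offset: float,
    accept_threshold: float,
) -> int:
    """Permission value of the best-matching enrolled speaker, or 0 if none is accepted."""
    best_score, best_permission = float("-inf"), 0
    for vec, permission in enrolled.values():
        # Compensation step: add the offset score to the raw matching score.
        score = matching_score(test_vec, vec) + offset
        if score > best_score:
            best_score, best_permission = score, permission
    return best_permission if best_score >= accept_threshold else 0


def decide(granted: int, required: int) -> str:
    # Execute the operation, or prompt that the permission is insufficient.
    return "execute" if granted >= required else "prompt: insufficient permission"
```

A usage example under these assumptions: with `enrolled = {"owner": ([1.0, 0.0], 3), "guest": ([0.0, 1.0], 1)}`, a test vector close to the owner's is granted permission value 3 and may run an operation that requires 3, while one close to the guest's is granted 1 and receives the insufficient-permission prompt for the same operation.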
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
As described above, the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

  1. A voice wake-up method combining voiceprint recognition, characterized by comprising:
    S1: receiving a voice to be verified and performing feature extraction to obtain MFCC features of the voice to be verified;
    S2: caching the MFCC features of the voice to be verified within a preset period;
    S3: determining, according to the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word, and if so, performing step S4;
    S4: inputting the cached MFCC features of the voice to be verified into a preset deep neural network model to obtain an i-vector of the voice to be verified;
    S5: comparing the i-vector of the voice to be verified with a preset i-vector, obtaining a permission value of the voice to be verified according to the matching score resulting from the comparison, and determining whether the permission value of the voice to be verified is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified, and if so, performing the operation corresponding to that preset wake-up word.
  2. The voice wake-up method combining voiceprint recognition according to claim 1, characterized in that step S4 specifically comprises:
    S41: concatenating the cached MFCC features of the voice to be verified;
    S42: inputting the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and updating the preset deep neural network model with the concatenated MFCC features to obtain a new preset deep neural network model.
  3. The voice wake-up method combining voiceprint recognition according to claim 1, characterized in that step S5 specifically comprises:
    S51: normalizing the i-vector of the voice to be verified and the preset i-vector, and comparing the normalized i-vector of the voice to be verified with the normalized preset i-vector by a probabilistic linear discriminant analysis (PLDA) model to obtain the matching score resulting from the comparison;
    S52: adding an offset score to the matching score to obtain a new matching score;
    S53: obtaining the permission value of the voice to be verified according to the new matching score, and determining whether the permission value of the voice to be verified is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified, and if so, performing step S54;
    S54: performing the operation corresponding to the preset wake-up word corresponding to the voice to be verified.
  4. The voice wake-up method combining voiceprint recognition according to claim 3, characterized in that step S5 further comprises step S55;
    step S53 specifically comprises: obtaining the permission value of the voice to be verified according to the new matching score, and determining whether the permission value of the voice to be verified is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified; if so, performing step S54, and if not, performing step S55;
    S55: issuing a prompt that the permission is insufficient.
  5. A voice wake-up device combining voiceprint recognition, characterized by comprising:
    a feature unit, configured to receive a voice to be verified and perform feature extraction to obtain MFCC features of the voice to be verified;
    a buffer unit, configured to cache the MFCC features of the voice to be verified within a preset period;
    a wake-up unit, configured to determine, according to the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word, and if so, to perform step S4;
    a vector unit, configured to input the cached MFCC features of the voice to be verified into a preset deep neural network model to obtain an i-vector of the voice to be verified;
    a comparing unit, configured to compare the i-vector of the voice to be verified with a preset i-vector, obtain a permission value of the voice to be verified according to the matching score resulting from the comparison, and determine whether the permission value of the voice to be verified is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified, and if so, to perform the operation corresponding to that preset wake-up word.
  6. The voice wake-up device combining voiceprint recognition according to claim 5, characterized in that the vector unit specifically comprises:
    a concatenation subunit, configured to concatenate the cached MFCC features of the voice to be verified;
    an obtaining subunit, configured to input the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and to update the preset deep neural network model with the concatenated MFCC features to obtain a new preset deep neural network model.
  7. The voice wake-up device combining voiceprint recognition according to claim 5, characterized in that the comparing unit specifically comprises:
    a matching subunit, configured to normalize the i-vector of the voice to be verified and the preset i-vector, and to compare the normalized i-vector of the voice to be verified with the normalized preset i-vector by a probabilistic linear discriminant analysis (PLDA) model to obtain the matching score resulting from the comparison;
    a compensation subunit, configured to add an offset score to the matching score to obtain a new matching score;
    a judgment subunit, configured to obtain the permission value of the voice to be verified according to the new matching score and to determine whether the permission value of the voice to be verified is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified, and if so, to trigger the execution subunit;
    an execution subunit, configured to perform the operation corresponding to the preset wake-up word corresponding to the voice to be verified.
  8. The voice wake-up device combining voiceprint recognition according to claim 7, characterized in that the comparing unit further comprises a prompt subunit;
    the judgment subunit is specifically configured to obtain the permission value of the voice to be verified according to the new matching score and to determine whether the permission value of the voice to be verified is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified; if so, it triggers the execution subunit, and if not, it triggers the prompt subunit;
    the prompt subunit is configured to issue a prompt that the permission is insufficient.
CN201711145883.5A 2017-11-17 2017-11-17 Voice wake-up method and device combining voiceprint recognition Pending CN107886957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711145883.5A CN107886957A (en) 2017-11-17 2017-11-17 Voice wake-up method and device combining voiceprint recognition

Publications (1)

Publication Number Publication Date
CN107886957A true CN107886957A (en) 2018-04-06

Family

ID=61777214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711145883.5A Pending CN107886957A (en) 2017-11-17 2017-11-17 Voice wake-up method and device combining voiceprint recognition

Country Status (1)

Country Link
CN (1) CN107886957A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180406

RJ01 Rejection of invention patent application after publication