CN107886957A

CN107886957A - Voice wake-up method and device combined with voiceprint recognition

Info

Publication number: CN107886957A
Application number: CN201711145883.5A
Authority: CN
Inventors: Chen Dongpeng (陈东鹏)
Original assignee: Guangzhou Speakin Network Technology Co Ltd
Current assignee: Guangzhou Speakin Network Technology Co Ltd
Priority date: 2017-11-17
Filing date: 2017-11-17
Publication date: 2018-04-06

Abstract

An embodiment of the invention discloses a voice wake-up method and device combined with voiceprint recognition. The invention first uses the MFCC features of the voice to be verified to judge whether its content is a preset wake-up word. If so, an i-vector is extracted with a preset deep neural network model, voiceprint recognition is performed on the i-vector to confirm the speaker's identity, and the permission value of the voice to be verified is obtained. Whether the speaker has sufficient permission is judged by comparing the speaker's permission value with the permission value of the preset wake-up word matched by the voice to be verified; if so, the operation corresponding to that wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot authenticate users in order to carry out more complex device operations that require user permissions.

Description

Voice wake-up method and device combined with voiceprint recognition

Technical field

The present invention relates to the field of voiceprint applications, and in particular to a voice wake-up method and device combined with voiceprint recognition.

Background art

Voice wake-up means that a user says a preset wake-up word to bring an electronic device from the standby state back to its normal working state. With voice wake-up, a user who cannot conveniently tap the device's screen can still operate the device by voice.

However, current electronic products with a voice wake-up function cannot identify the speaker. Because the speaker's identity cannot be determined, further permissions cannot be granted, and only simple device operations that require no user permission can be performed.

As a result, the voice wake-up function of current electronic products lacks user authentication and cannot authenticate users in order to carry out more complex device operations that require user permissions.

Summary of the invention

The invention provides a voice wake-up method and device combined with voiceprint recognition, which solve the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot carry out more complex device operations that require user permissions.

The invention provides a voice wake-up method combined with voiceprint recognition, comprising:

S1: receiving the voice to be verified and performing feature extraction to obtain its MFCC features;

S2: caching the MFCC features of the voice to be verified within a preset period;

S3: judging, from the cached MFCC features, whether the content of the voice to be verified is a preset wake-up word, and if so, performing step S4;

S4: inputting the cached MFCC features into a preset deep neural network model to obtain the i-vector of the voice to be verified;

S5: comparing the i-vector of the voice to be verified with the preset i-vectors, obtaining the permission value of the voice to be verified from the matching score of the comparison, and judging whether this permission value is greater than or equal to the permission value corresponding to the preset wake-up word matched by the voice to be verified; if so, performing the operation corresponding to that preset wake-up word.
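Steps S1 to S5 can be wired together as in the following sketch. It is a hedged illustration of the control flow, not the patented implementation: the feature extractor, wake-word detector, i-vector extractor and scorer are passed in as plain callables (hypothetical stand-ins for the MFCC front end, the wake-word model, the preset DNN model and the PLDA comparison), and the names `wake_pipeline`, `user_permission` and `required_permission` are illustrative.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Sequence

Frame = List[float]


def wake_pipeline(
    audio_chunks: Sequence[List[float]],
    extract_mfcc: Callable[[List[float]], List[Frame]],        # S1 (hypothetical front end)
    detect_wake_word: Callable[[List[Frame]], Optional[str]],   # S3 (hypothetical detector)
    extract_ivector: Callable[[List[Frame]], List[float]],      # S4 (hypothetical DNN model)
    score: Callable[[List[float], List[float]], float],         # S5 (hypothetical scorer)
    enrolled: Dict[str, List[float]],       # speaker -> preset i-vector
    user_permission: Dict[str, int],        # speaker -> permission value
    required_permission: Dict[str, int],    # wake word -> permission value it needs
    threshold: float,
    max_frames: int,
) -> Optional[str]:
    """Return the wake word whose operation may be performed, or None."""
    cache: Deque[Frame] = deque(maxlen=max_frames)   # S2: bounded feature cache
    for chunk in audio_chunks:
        cache.extend(extract_mfcc(chunk))            # S1, then S2
        word = detect_wake_word(list(cache))         # S3
        if word is None:
            continue                                 # no wake word: stay in standby
        ivec = extract_ivector(list(cache))          # S4
        scores = {name: score(ivec, ref) for name, ref in enrolled.items()}  # S5
        if not scores:
            return None
        best = max(scores, key=scores.get)
        if scores[best] > threshold and user_permission[best] >= required_permission[word]:
            return word                              # permission sufficient: run the operation
        return None                                  # unknown speaker or insufficient permission
    return None
```

Injecting each stage as a callable keeps the gating logic (wake word first, voiceprint second, permission last) separate from the signal-processing models, which can then be swapped independently.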

Preferably, step S4 specifically comprises:

S41: concatenating the cached MFCC features of the voice to be verified;

S42: inputting the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and using the concatenated MFCC features to update the preset deep neural network model into a new preset deep neural network model.

Preferably, step S5 specifically comprises:

S51: length-normalizing the i-vector of the voice to be verified and the preset i-vectors, and comparing the normalized i-vectors with a probabilistic linear discriminant analysis (PLDA) model to obtain the matching score;

S52: adding an offset score to the matching score to obtain a new matching score;

S53: obtaining the permission value of the voice to be verified from the new matching score, and judging whether it is greater than or equal to the permission value corresponding to the preset wake-up word matched by the voice to be verified; if so, performing step S54;

S54: performing the operation corresponding to the preset wake-up word matched by the voice to be verified.

Preferably, step S5 further comprises step S55;

Step S53 specifically comprises: obtaining the permission value of the voice to be verified from the new matching score, and judging whether it is greater than or equal to the permission value corresponding to the preset wake-up word matched by the voice to be verified; if so, performing step S54, and if not, performing step S55;

S55: issuing an insufficient-permission prompt.

The invention further provides a voice wake-up device combined with voiceprint recognition, comprising:

a feature unit, configured to receive the voice to be verified and perform feature extraction to obtain its MFCC features;

a buffer unit, configured to cache the MFCC features of the voice to be verified within a preset period;

a wake-up unit, configured to judge, from the cached MFCC features, whether the content of the voice to be verified is a preset wake-up word, and if so, to trigger the vector unit;

a vector unit, configured to input the cached MFCC features into a preset deep neural network model to obtain the i-vector of the voice to be verified;

a comparing unit, configured to compare the i-vector of the voice to be verified with the preset i-vectors, obtain the permission value of the voice to be verified from the matching score of the comparison, and judge whether it is greater than or equal to the permission value corresponding to the preset wake-up word matched by the voice to be verified; if so, to perform the operation corresponding to that preset wake-up word.

Preferably, the vector unit specifically comprises:

a concatenation subunit, configured to concatenate the cached MFCC features of the voice to be verified;

an acquisition subunit, configured to input the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and to use the concatenated MFCC features to update the preset deep neural network model into a new preset deep neural network model.

Preferably, the comparing unit specifically comprises:

a matching subunit, configured to length-normalize the i-vector of the voice to be verified and the preset i-vectors, and to compare the normalized i-vectors with a probabilistic linear discriminant analysis (PLDA) model to obtain the matching score;

a compensation subunit, configured to add an offset score to the matching score to obtain a new matching score;

a judgment subunit, configured to obtain the permission value of the voice to be verified from the new matching score and judge whether it is greater than or equal to the permission value corresponding to the preset wake-up word matched by the voice to be verified; if so, to trigger the execution subunit;

an execution subunit, configured to perform the operation corresponding to the preset wake-up word matched by the voice to be verified.

Preferably, the comparing unit further comprises a prompt subunit;

the judgment subunit is specifically configured to obtain the permission value of the voice to be verified from the new matching score and judge whether it is greater than or equal to the permission value corresponding to the preset wake-up word matched by the voice to be verified; if so, to trigger the execution subunit, and if not, to trigger the prompt subunit;

the prompt subunit is configured to issue an insufficient-permission prompt.

As can be seen from the above technical solutions, the present invention has the following advantages:

The invention provides a voice wake-up method combined with voiceprint recognition, comprising: S1: receiving the voice to be verified and performing feature extraction to obtain its MFCC features; S2: caching the MFCC features of the voice to be verified within a preset period; S3: judging, from the cached MFCC features, whether the content of the voice to be verified is a preset wake-up word, and if so, performing step S4; S4: inputting the cached MFCC features into a preset deep neural network model to obtain the i-vector of the voice to be verified; S5: comparing the i-vector of the voice to be verified with the preset i-vectors, obtaining the permission value of the voice to be verified from the matching score, judging whether it is greater than or equal to the permission value corresponding to the preset wake-up word matched by the voice to be verified, and if so, performing the operation corresponding to that wake-up word.

The invention first uses the MFCC features of the voice to be verified to judge whether its content is a preset wake-up word. If it is, an i-vector is extracted with the preset deep neural network model, voiceprint recognition is performed on the i-vector to confirm the speaker's identity, and the permission value of the voice to be verified is obtained. Whether the speaker has sufficient permission is judged by comparing the speaker's permission value with the permission value of the preset wake-up word matched by the voice to be verified; if so, the operation corresponding to that wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot carry out more complex device operations that require user permissions.

Brief description of the drawings

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of one embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the invention;

Fig. 2 is a schematic flowchart of another embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the invention;

Fig. 3 is a schematic structural diagram of one embodiment of the voice wake-up device combined with voiceprint recognition provided by an embodiment of the invention.

Detailed description of the embodiments

The embodiments of the invention provide a voice wake-up method and device combined with voiceprint recognition, which solve the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot carry out more complex device operations that require user permissions.

To make the objectives, features and advantages of the invention clearer and easier to understand, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the embodiments described below are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.

Referring to Fig. 1, an embodiment of the invention provides one embodiment of a voice wake-up method combined with voiceprint recognition, comprising:

Step 101: receiving the voice to be verified and performing feature extraction to obtain its MFCC features;

It should be noted that, when the device is to be woken up and voiceprint recognition performed, the voice to be verified must be received and its features extracted to obtain its MFCC features.

Step 102: caching the MFCC features of the voice to be verified within a preset period;

It should be noted that, to save storage space on the electronic device, only the MFCC features of the voice to be verified within the preset period are stored. The preset period can be set to the most recent three seconds, i.e., only the MFCC features of the voice within the last three seconds are stored.
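A fixed-length ring buffer is one straightforward way to keep only the most recent three seconds of MFCC frames. The sketch below assumes a 10 ms frame shift, so three seconds corresponds to 300 frames; that shift is a common choice, not a value stated in the patent, and the class name `MfccCache` is hypothetical.

```python
from collections import deque
from typing import Deque, List


class MfccCache:
    """Keeps only the MFCC frames of the most recent `seconds` of audio."""

    def __init__(self, seconds: float = 3.0, frame_shift_s: float = 0.010) -> None:
        # Number of frames that fit in the window; round() absorbs float error
        # (3.0 / 0.010 is not exactly 300 in binary floating point).
        self.capacity = int(round(seconds / frame_shift_s))
        # A deque with maxlen silently discards the oldest frame once full.
        self._frames: Deque[List[float]] = deque(maxlen=self.capacity)

    def push(self, frame: List[float]) -> None:
        self._frames.append(frame)

    def snapshot(self) -> List[List[float]]:
        """Return the cached frames, oldest first."""
        return list(self._frames)
```

Memory use is bounded by `capacity` frames no matter how long the device keeps listening, which is the storage saving the step aims at.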

Step 103: judging, from the cached MFCC features, whether the content of the voice to be verified is a preset wake-up word, and if so, performing step 104;

It should be noted that, when the voice to be verified is the preset wake-up word, the subsequent steps are performed; if it is not, the electronic device remains in standby.

Step 104: inputting the cached MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified;

It should be noted that, when the voice to be verified is the preset wake-up word, its i-vector is extracted from its MFCC features;

A voiceprint recognition system identifies a speaker automatically from the characteristics of the voice. Voiceprint recognition is a biometric authentication technology that verifies a speaker's identity from speech. It offers convenience, stability, measurability and security; as a contactless acquisition and recognition technology, voiceprints are cheap to collect, easy to obtain and simple to use, and have broad application prospects in banking, social security, public security, smart homes and mobile payment.

Step 105: comparing the i-vector of the voice to be verified with the preset i-vectors, obtaining the permission value of the voice to be verified from the matching score of the comparison, and judging whether it is greater than or equal to the permission value corresponding to the preset wake-up word matched by the voice to be verified; if so, performing step 106;

It should be noted that comparing the i-vector of the voice to be verified with the preset i-vectors yields a matching score, from which the speaker's identity can be confirmed. Once the identity is confirmed, the speaker's permission value, i.e., the permission value of the voice to be verified, can be obtained. Comparing this value with the permission value of the preset wake-up word matched by the voice to be verified determines whether the next step is performed.

Step 106: performing the operation corresponding to the preset wake-up word matched by the voice to be verified.

It should be noted that, when the permission value of the voice to be verified is greater than or equal to the permission value of the preset wake-up word matched by the voice to be verified, the operation corresponding to that wake-up word is performed.

In this embodiment, the MFCC features of the voice to be verified are first used to judge whether its content is a preset wake-up word. If it is, an i-vector is extracted with the preset deep neural network model, voiceprint recognition is performed on the i-vector to confirm the speaker's identity, and the permission value of the voice to be verified is obtained. Whether the speaker has sufficient permission is judged by comparing the speaker's permission value with the permission value of the preset wake-up word matched by the voice to be verified; if so, the operation corresponding to that wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot carry out more complex device operations that require user permissions.

The above is one embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the invention; another embodiment of the method is described below.

Referring to Fig. 2, an embodiment of the invention provides another embodiment of a voice wake-up method combined with voiceprint recognition, comprising:

Step 201: receiving the voice to be verified and performing feature extraction to obtain its MFCC features;

It should be noted that, if higher-dimensional MFCC features are needed, the discrete signal can be divided into windowed frames, each windowed frame Fourier-transformed, and the number of filters in the filter bank increased; computing the mel cepstral coefficients then yields higher-dimensional MFCCs.
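The MFCC pipeline described in step 201 (windowed framing, Fourier transform, a mel filter bank, then the cepstrum) can be sketched in pure Python as follows. It is an illustrative, unoptimized sketch: it uses a naive DFT instead of an FFT, a Hamming window, triangular mel filters and a DCT-II, and the defaults (25 ms frames with a 10 ms shift at 16 kHz, 26 filters, 13 coefficients) are common conventions rather than values from the patent. Raising `n_filters` and `n_ceps` corresponds to the higher-dimensional MFCCs mentioned above.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal: List[float], sample_rate: int = 16000, frame_len: int = 400,
         frame_shift: int = 160, n_filters: int = 26, n_ceps: int = 13) -> List[List[float]]:
    """Return one MFCC vector (length n_ceps) per frame."""
    n_bins = frame_len // 2 + 1
    # Filter edges spaced evenly on the mel scale from 0 Hz to the Nyquist frequency.
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((frame_len + 1) * f / sample_rate)) for f in edges]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Power spectrum via a naive DFT (an FFT would be used in practice).
        power = []
        for k in range(n_bins):
            re = sum(x * math.cos(2 * math.pi * k * n / frame_len) for n, x in enumerate(frame))
            im = -sum(x * math.sin(2 * math.pi * k * n / frame_len) for n, x in enumerate(frame))
            power.append((re * re + im * im) / frame_len)
        # Log energy collected by each triangular mel filter.
        log_energies = []
        for m in range(1, n_filters + 1):
            lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
            e = 0.0
            for k in range(lo, min(hi, n_bins - 1) + 1):
                if lo < mid and k <= mid:
                    w = (k - lo) / (mid - lo)   # rising edge
                elif mid < hi and k > mid:
                    w = (hi - k) / (hi - mid)   # falling edge
                else:
                    w = 0.0
                e += w * power[k]
            log_energies.append(math.log(e + 1e-10))
        # DCT-II of the log filter-bank energies gives the cepstral coefficients.
        feats.append([sum(le * math.cos(math.pi * c * (m + 0.5) / n_filters)
                          for m, le in enumerate(log_energies)) for c in range(n_ceps)])
    return feats
```

Each output row is the MFCC vector of one frame, which is the unit that the buffering and concatenation steps operate on.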

Step 202: caching the MFCC features of the voice to be verified within a preset period;

It should be noted that, to save storage space on the electronic device, only the MFCC features of the voice to be verified within the preset period are stored even though the device listens continuously. The preset period can be set to the most recent three seconds, i.e., only the MFCC features of the voice within the last three seconds are stored.

Step 203: judging, from the cached MFCC features, whether the content of the voice to be verified is a preset wake-up word, and if so, performing step 204;

It should be noted that, when the voice to be verified is the preset wake-up word, the device switches from the standby state to the normal working state and performs the subsequent steps; if it is not, the electronic device remains in standby.
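The patent does not say how the wake word is detected from the cached MFCCs. One classic option, shown here purely as an assumed stand-in, is template matching with dynamic time warping (DTW) against an enrolled recording of the wake word: the device leaves standby only when the length-normalized DTW distance falls below a threshold. For simplicity the sketch compares the whole cache with the template, whereas a practical detector would search over sub-sequences; `dtw_distance`, `WakeGate` and the threshold are hypothetical.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Length-normalized DTW distance with a Euclidean frame cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


class WakeGate:
    """Standby/awake switch driven by a wake-word template."""

    def __init__(self, template: Sequence[Frame], threshold: float) -> None:
        self.template = template
        self.threshold = threshold
        self.state = "standby"

    def update(self, cached: Sequence[Frame]) -> bool:
        """Return True and switch to 'awake' if the cache matches the wake word."""
        if cached and dtw_distance(cached, self.template) < self.threshold:
            self.state = "awake"
            return True
        return False  # no match: the state is left unchanged
```

DTW tolerates the wake word being spoken faster or slower than the template, which is why it is a common baseline for small-vocabulary keyword matching.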

Step 204: concatenating the cached MFCC features of the voice to be verified;

It should be noted that concatenating the cached MFCC features means splicing the coefficient vectors of temporally adjacent speech frames into one longer vector.
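Splicing temporally adjacent frames, as in step 204, can be sketched as below. The context width (here 5 frames on each side, giving 11 × 13 = 143 dimensions for 13-dimensional MFCCs) is an assumed, commonly used value that the patent does not specify, and padding edge frames by repeating the first or last frame is likewise an assumption.

```python
from typing import List


def splice(frames: List[List[float]], left: int = 5, right: int = 5) -> List[List[float]]:
    """Concatenate each frame with its `left` predecessors and `right` successors."""
    n = len(frames)
    out = []
    for t in range(n):
        row: List[float] = []
        for offset in range(-left, right + 1):
            # Clamp the index so edge frames reuse the first or last frame.
            idx = min(max(t + offset, 0), n - 1)
            row.extend(frames[idx])
        out.append(row)
    return out
```

Each spliced vector gives the network a short window of acoustic context around the centre frame, which helps it classify phonetic units.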

Step 205: inputting the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and using the concatenated MFCC features to update the preset deep neural network model into a new preset deep neural network model;

It should be noted that a traditional i-vector is extracted with a Gaussian mixture model (GMM): a large amount of speaker-independent and channel-independent speech data is preprocessed and its MFCC features are extracted to train a universal background model (UBM) and a total variability matrix T. The UBM is trained with the expectation-maximization (EM) algorithm;

After training, the UBM and the total variability matrix are saved. In the enrollment and test phases, the i-vector of each speech segment of a speaker is extracted using formula (1):

$$M_s = m_u + T\,\omega_s \qquad (1)$$

where $M_s$ is the GMM mean supervector of speech segment $s$; $m_u$ is the Gaussian mean supervector of the UBM; $T$ is the low-rank total variability matrix, which spans the total variability space; and $\omega_s$ is the total variability factor, i.e., the i-vector.
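Under formula (1), the i-vector of a segment is usually taken as the posterior mean of ω_s given the segment's Baum-Welch statistics: ω_s = (I + Σ_c N_c T_cᵀ Σ_c⁻¹ T_c)⁻¹ Σ_c T_cᵀ Σ_c⁻¹ F̃_c, where N_c and F̃_c are the zeroth-order and centered first-order statistics of Gaussian component c, T_c is the block of T belonging to component c, and Σ_c is its covariance (assumed diagonal here). The pure-Python sketch below implements that textbook estimate; it is not necessarily the exact procedure of the patent, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def extract_ivector(T: List[Matrix], sigma: List[List[float]],
                    N: List[float], F_centered: List[List[float]]) -> List[float]:
    """Posterior mean of w in M = m + T w.

    T[c]          : D x R block of the total variability matrix for component c
    sigma[c]      : diagonal covariance of component c (length D)
    N[c]          : zeroth-order statistic of component c
    F_centered[c] : first-order statistic of component c minus N[c] * mean_c
    """
    R = len(T[0][0])
    precision = [[1.0 if i == j else 0.0 for j in range(R)] for i in range(R)]  # starts at I
    linear = [0.0] * R
    for Tc, sc, Nc, Fc in zip(T, sigma, N, F_centered):
        D = len(Tc)
        for i in range(R):
            linear[i] += sum(Tc[d][i] * Fc[d] / sc[d] for d in range(D))
            for j in range(R):
                precision[i][j] += Nc * sum(Tc[d][i] * Tc[d][j] / sc[d] for d in range(D))
    return solve(precision, linear)
```

With no speech (all statistics zero) the estimate falls back to the prior mean of zero; the more frames a segment contributes, the more the statistics dominate the identity prior.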

This embodiment uses DNN-based i-vector extraction. A deep neural network (DNN) trained for speech recognition is used: the concatenated MFCC features are fed into the previously trained DNN, whose output classes are triphone (Tri-phone) units, and the DNN classifies each short-time speech frame by posterior probability. Each frame and its posterior probabilities can then be used to train a new UBM, so the UBM is trained in a supervised manner instead of with the unsupervised EM algorithm of traditional UBM training;
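The supervised alternative to EM described above can be sketched as a single pass over the data: the DNN's per-frame class posteriors (here simply given as input, standing in for the output of the pre-trained triphone DNN) weight the acoustic frames, and the UBM's weights, means and diagonal variances are read off directly, with no EM iterations. The function name and the variance floor are illustrative assumptions.

```python
from typing import List, Tuple


def ubm_from_posteriors(frames: List[List[float]], posteriors: List[List[float]],
                        var_floor: float = 1e-3
                        ) -> Tuple[List[float], List[List[float]], List[List[float]]]:
    """Estimate diagonal-covariance UBM parameters from fixed frame posteriors.

    frames[t]     : acoustic feature vector of frame t (length D)
    posteriors[t] : DNN posterior over C classes for frame t (sums to 1)
    Returns (weights, means, variances).
    """
    C, D, n = len(posteriors[0]), len(frames[0]), len(frames)
    occ = [0.0] * C                          # zeroth-order statistics N_c
    first = [[0.0] * D for _ in range(C)]    # first-order statistics F_c
    second = [[0.0] * D for _ in range(C)]   # second-order statistics S_c
    for x, gamma in zip(frames, posteriors):
        for c in range(C):
            g = gamma[c]
            occ[c] += g
            for d in range(D):
                first[c][d] += g * x[d]
                second[c][d] += g * x[d] * x[d]
    weights = [o / n for o in occ]
    means, variances = [], []
    for c in range(C):
        denom = max(occ[c], 1e-10)           # guard against classes with no frames
        mu = [f / denom for f in first[c]]
        var = [max(s / denom - m * m, var_floor) for s, m in zip(second[c], mu)]
        means.append(mu)
        variances.append(var)
    return weights, means, variances
```

Because the posteriors come from a network trained on phonetic targets, each UBM component is tied to a phonetic class rather than to an arbitrary acoustic cluster found by EM.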

Extracting i-vectors with a deep neural network gives high accuracy, robustness to noise and channel variation, and adaptability to varied texts.

Step 206: length-normalizing the i-vector of the voice to be verified and the preset i-vectors, and comparing the normalized i-vectors with a probabilistic linear discriminant analysis (PLDA) model to obtain the matching score;

It should be noted that length normalization scales the i-vector of the voice to be verified and the preset i-vectors to the same length;
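The normalization of step 206, which brings all i-vectors to the same length, is commonly implemented by dividing each vector by its Euclidean norm so that every i-vector lies on the unit sphere. The short sketch below does exactly that; practical systems often center and whiten the vectors first, which is omitted here.

```python
import math
from typing import List


def length_normalize(v: List[float]) -> List[float]:
    """Scale v to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)
```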

There can be multiple preset i-vectors, each representing a different user. For example, user A's electronic device may store the i-vectors of A, B, C, D and E, each assigned a different permission value;

Comparing the i-vector of the voice to be verified with each preset i-vector yields a separate matching score for each.
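The comparison with the probabilistic linear discriminant analysis (PLDA) model in step 206 returns, for each enrolled user, a log-likelihood ratio between the same-speaker and different-speaker hypotheses. The sketch below uses the two-covariance form of PLDA and assumes, for brevity, that the i-vectors have already been centered and transformed so that the between-speaker covariance `b` and within-speaker covariance `w` are diagonal; each dimension then contributes an independent closed-form term. The variances and user names are hypothetical inputs, and a full implementation would use general covariance matrices.

```python
import math
from typing import Dict, List


def plda_llr(x: List[float], y: List[float], b: List[float], w: List[float]) -> float:
    """Two-covariance PLDA log-likelihood ratio for a diagonalized model.

    Same speaker:      (x_i, y_i) ~ N(0, [[b+w, b], [b, b+w]])
    Different speaker: x_i, y_i independent, each ~ N(0, b+w)
    """
    llr = 0.0
    for xi, yi, bi, wi in zip(x, y, b, w):
        s = bi + wi
        det = s * s - bi * bi
        same = -0.5 * math.log(det) - 0.5 * (s * xi * xi - 2 * bi * xi * yi + s * yi * yi) / det
        diff = -math.log(s) - 0.5 * (xi * xi + yi * yi) / s
        llr += same - diff  # the -log(2*pi) terms of both hypotheses cancel
    return llr


def score_all(test: List[float], enrolled: Dict[str, List[float]],
              b: List[float], w: List[float]) -> Dict[str, float]:
    """One matching score per preset i-vector, as in the example of users A to E."""
    return {name: plda_llr(test, ref, b, w) for name, ref in enrolled.items()}
```

When the between-speaker variance is zero the score is exactly zero, since the i-vectors then carry no speaker information; larger between-speaker variance makes agreement between the two vectors count for more.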

Step 207: adding an offset score to the matching score to obtain a new matching score;

It should be noted that external factors such as noise can shift the matching score, so the score is compensated with a preset offset score chosen according to the external environment, yielding a new matching score.

Step 208: obtaining the permission value of the voice to be verified from the new matching score, and judging whether it is greater than or equal to the permission value corresponding to the preset wake-up word matched by the voice to be verified; if so, performing step 209, and if not, performing step 210;

It should be noted that the speaker's identity can be confirmed from the new matching scores. For example, if the i-vector of the voice to be verified has the highest matching score with E's preset i-vector and that score exceeds a preset threshold, the speaker is judged to be E and E's permission value is obtained; it is then judged whether E's permission value is greater than or equal to the permission value of the preset wake-up word matched by the voice to be verified. For instance, if E says the wake-up word "pay", the permission value required for payment is set to 5, and E's permission value is only 3, the step to perform is determined from this result;

If the speaker is F, a matching score is still obtained, but it is below the preset threshold, so F has no permission value.
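Steps 207 to 210 and the E/F example can be condensed into one decision function. The offset of 0.5, the threshold of 0.0 and the raw score values in the usage below are hypothetical illustration values; only the permission values 3 and 5 for the wake-up word "pay" come from the example above. The function name `authorize` is likewise illustrative.

```python
from typing import Dict, Optional, Tuple


def authorize(scores: Dict[str, float], offset: float, threshold: float,
              user_permission: Dict[str, int], required: int) -> Tuple[str, Optional[str]]:
    """Decide the action for one wake-word utterance.

    scores          : raw matching score per enrolled speaker
    offset          : preset environment offset added to every score (step 207)
    threshold       : a speaker is accepted only if the compensated score exceeds it
    user_permission : permission value of each enrolled speaker
    required        : permission value needed by the spoken wake-up word
    Returns (action, speaker) with action "execute" or "insufficient".
    """
    adjusted = {name: s + offset for name, s in scores.items()}
    speaker = max(adjusted, key=adjusted.get) if adjusted else None
    if speaker is None or adjusted[speaker] <= threshold:
        return "insufficient", None      # like F: no identity, hence no permission value
    if user_permission[speaker] >= required:
        return "execute", speaker        # step 209
    return "insufficient", speaker       # step 210
```

For example, `authorize({"D": -2.0, "E": 4.0}, 0.5, 0.0, {"D": 1, "E": 3}, 5)` identifies E but returns `("insufficient", "E")`, matching the case where E's permission value of 3 is below the 5 required for payment.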

Step 209: performing the operation corresponding to the preset wake-up word matched by the voice to be verified;

It should be noted that, if E's permission value is 5, which equals the value required for payment, the electronic device performs the payment operation corresponding to the wake-up word "pay".

Step 210: issuing an insufficient-permission prompt.

It should be noted that, if E's permission value is 3, which is below the value required for payment, an insufficient-permission prompt is issued by text, light, a loudspeaker or similar means.

As a contactless, remote identity verification technology, voiceprint recognition, combined with cross-media interactive communication and application service platforms, has broad application prospects in banking, social security, public security, smart homes and mobile payment. Combining voiceprint recognition with voice wake-up improves the user's voice interaction experience and enables truly contactless identity authentication;

Moreover, replacing the traditional Gaussian mixture model with a deep neural network in voiceprint recognition gives high accuracy, robustness to noise and channel variation, and adaptability to varied texts. It supports cross-platform and cross-channel deployment, computes quickly with low power consumption and few system resources, and can easily be deployed on mobile electronic devices, so it has broad application prospects;

In this embodiment, the MFCC features of the voice to be verified are first used to judge whether its content is a preset wake-up word. If it is, an i-vector is extracted with the preset deep neural network model, voiceprint recognition is performed on the i-vector to confirm the speaker's identity, and the permission value of the voice to be verified is obtained. Whether the speaker has sufficient permission is judged by comparing the speaker's permission value with the permission value of the preset wake-up word matched by the voice to be verified; if so, the operation corresponding to that wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot carry out more complex device operations that require user permissions.

The above is another embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the invention; one embodiment of the voice wake-up device combined with voiceprint recognition is described below.

Referring to Fig. 3, the embodiments of the invention provide a kind of implementation of one of voice Rouser of combination Application on Voiceprint Recognition Example, including：

Feature unit 301, for receiving voice to be verified and carrying out feature extraction, the MFCC for obtaining voice to be verified is special Sign；

Buffer unit 302, for being cached to the MFCC features of the voice to be verified in predetermined period；

Wakeup unit 303, the MFCC features for the voice to be verified according to caching judge that the content of voice to be verified is No is preset wake-up word, if so, then performing step S4；

Vector location 304, for the MFCC features of the voice to be verified of caching to be inputted to preset deep neural network mould In type, the i-vector vectors of voice to be verified are obtained；

Comparing unit 305, the i-vector vector preset for the i-vector vector sums by voice to be verified are compared It is right, according to the authority credentials for comparing the matching fraction acquisition voice to be verified drawn, judge whether the authority credentials of voice to be verified is big In or equal to authority credentials corresponding to wake-up word preset corresponding to voice to be verified, if so, then performing corresponding with voice to be verified Preset wake-up word corresponding to operate.

Preferably, vector location 304 specifically includes：

Subelement 3041 is cascaded, for the MFCC features of the voice to be verified of caching to be cascaded；

Subelement 3042 is obtained, for the MFCC features after cascade to be inputted in preset deep neural network model, is obtained The i-vector vectors of voice to be verified are taken, and are updated preset deep neural network model by the MFCC features after cascade For new preset deep neural network model.

Preferably, comparing unit 305 specifically includes：

Coupling subelement 3051, the i-vector preset for the i-vector vector sums by voice to be verified vectors are carried out Positive naturalization processing, the preset i-vector vectors of the i-vector vector sums of the voice to be verified after positive naturalization processing are passed through general It is compared by linear distinction analysis model, gets the matching fraction for comparing and drawing；

Subelement 3052 is compensated, for matching fraction to be added into migration fraction, obtains new matching fraction；

Judgment sub-unit 3053, for obtaining the authority credentials of voice to be verified according to new matching fraction, judge to be verified Whether the authority credentials of voice is more than or equal to authority credentials corresponding to wake-up word preset corresponding to voice to be verified, if so, then touching Hair performs subelement 3054；

Subelement 3054 is performed, for performing operation corresponding to preset wake-up word corresponding with voice to be verified.

Preferably, comparing unit 305 also includes：Prompt subelement 3055；

The judgment subunit 3053 is specifically configured to obtain the permission value of the voice to be verified from the new matching score. It then judges whether that permission value is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified. If so, it triggers the execution subunit 3054; if not, it triggers the prompt subunit 3055;

Prompt subunit 3055, configured to issue a prompt indicating insufficient permission.
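The execution and prompt subunits can be pictured as two branches of one lookup over the preset wake-up words. The minimal sketch below is only that picture. The `WakeWordEntry` layout, the example wake words, their actions and the prompt string are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class WakeWordEntry:
    required_permission: int      # permission value attached to this preset wake-up word
    action: Callable[[], str]     # operation performed by the execution subunit


def handle_wake_word(word: str, user_permission: int,
                     table: Dict[str, WakeWordEntry]) -> str:
    """Run the wake word's operation, or return an insufficient-permission prompt."""
    entry = table[word]  # the word was already recognized as a preset wake-up word
    if user_permission >= entry.required_permission:
        return entry.action()               # execution subunit
    return "Insufficient permission"        # prompt subunit
```

For example, a low-privilege wake word such as a hypothetical "hello device" can require 0, while a hypothetical "unlock door" requires 2. A speaker whose voice maps to permission value 1 can then use the first but gets the prompt for the second.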

Those skilled in the art will clearly understand the following: for convenience and brevity, the specific working processes of the system, device and units described above are not repeated here. They can be found in the corresponding processes of the foregoing method embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units. These connections may be electrical, mechanical or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. That is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of these embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product. This applies to the solution in essence, to the part that contributes to the prior art, or to all or part of the technical solution. The computer software product is stored in a storage medium and includes several instructions. These instructions cause a computer device (a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. The present invention has been described in detail with reference to the foregoing embodiments. Nevertheless, those of ordinary skill in the art should understand that they may still modify the technical solutions described in those embodiments or make equivalent substitutions for some of their technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A voice wake-up method combined with voiceprint recognition, characterized by comprising:

S1: Receiving a voice to be verified and performing feature extraction to obtain MFCC features of the voice to be verified;

S2: Caching the MFCC features of the voice to be verified within a predetermined period;

S3: Judging, according to the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word; if so, performing step S4;

S4: Inputting the cached MFCC features of the voice to be verified into a preset deep neural network model to obtain an i-vector of the voice to be verified;

S5: Comparing the i-vector of the voice to be verified with a preset i-vector; obtaining the permission value of the voice to be verified from the matching score produced by the comparison; and judging whether that permission value is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified; if so, executing the operation corresponding to that preset wake-up word.
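Steps S2 and S3 leave the cache length and the wake-word detector unspecified. The sketch below fills them in with two assumptions:

- A fixed frame shift, so the predetermined period maps to a fixed number of frames held in a bounded deque.
- Dynamic time warping (DTW) against enrolled wake-word templates, used as a simple stand-in for whatever keyword spotter a real implementation uses.

`MfccCache`, `dtw_distance`, `is_wake_word`, the 10 ms hop and the threshold are all hypothetical.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


class MfccCache:
    """Keep only the MFCC frames from the most recent `period_s` seconds (step S2)."""

    def __init__(self, period_s: float = 2.0, hop_s: float = 0.01) -> None:
        # Assumed fixed frame shift: the period becomes a fixed frame count.
        max_frames = max(1, round(period_s / hop_s))
        self._frames: Deque[List[float]] = deque(maxlen=max_frames)

    def push(self, frame: Sequence[float]) -> None:
        self._frames.append(list(frame))  # the oldest frame is dropped when full

    def snapshot(self) -> List[List[float]]:
        """Copy of the cached frames, oldest first."""
        return list(self._frames)

    def __len__(self) -> int:
        return len(self._frames)


def dtw_distance(query: Sequence[Sequence[float]],
                 template: Sequence[Sequence[float]]) -> float:
    """DTW distance between two MFCC sequences, normalised by their combined length."""
    n, m = len(query), len(template)
    if n == 0 or m == 0:
        return math.inf
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], template[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def is_wake_word(cached_frames: Sequence[Sequence[float]],
                 templates: Sequence[Sequence[Sequence[float]]],
                 threshold: float) -> bool:
    """Step S3 stand-in: is the cached speech close enough to any enrolled template?"""
    return any(dtw_distance(cached_frames, t) <= threshold for t in templates)
```

In use, each new MFCC frame is pushed into the cache. `is_wake_word(cache.snapshot(), templates, threshold)` then decides whether to continue to step S4.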
2. The voice wake-up method combined with voiceprint recognition according to claim 1, characterized in that step S4 specifically comprises:

S41: Concatenating the cached MFCC features of the voice to be verified;

S42: Inputting the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and updating the preset deep neural network model with the concatenated MFCC features to obtain a new preset deep neural network model.
3. The voice wake-up method combined with voiceprint recognition according to claim 1, characterized in that step S5 specifically comprises:

S51: Normalizing the i-vector of the voice to be verified and the preset i-vector, and comparing the normalized vectors using a probabilistic linear discriminant analysis (PLDA) model to obtain the matching score;

S52: Adding an offset score to the matching score to obtain a new matching score;

S53: Obtaining the permission value of the voice to be verified from the new matching score, and judging whether it is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified; if so, performing step S54;

S54: Executing the operation corresponding to the preset wake-up word corresponding to the voice to be verified.
4. The voice wake-up method combined with voiceprint recognition according to claim 3, characterized in that step S5 further comprises step S55;

Step S53 specifically comprises: obtaining the permission value of the voice to be verified from the new matching score, and judging whether it is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified; if so, performing step S54; if not, performing step S55;

S55: Issuing a prompt indicating insufficient permission.
5. A voice wake-up device combined with voiceprint recognition, characterized by comprising:

Feature unit, configured to receive a voice to be verified and perform feature extraction to obtain MFCC features of the voice to be verified;

Caching unit, configured to cache the MFCC features of the voice to be verified within a predetermined period;

Wake-up unit, configured to judge, according to the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word, and if so, to trigger the vector unit;

Vector unit, configured to input the cached MFCC features of the voice to be verified into a preset deep neural network model to obtain an i-vector of the voice to be verified;

Comparing unit, configured to compare the i-vector of the voice to be verified with a preset i-vector, obtain the permission value of the voice to be verified from the matching score produced by the comparison, and judge whether that permission value is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified; if so, execute the operation corresponding to that preset wake-up word.
6. The voice wake-up device combined with voiceprint recognition according to claim 5, characterized in that the vector unit specifically comprises:

Concatenation subunit, configured to concatenate the cached MFCC features of the voice to be verified;

Obtaining subunit, configured to input the concatenated MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and to update the preset deep neural network model with the concatenated MFCC features to obtain a new preset deep neural network model.
7. The voice wake-up device combined with voiceprint recognition according to claim 5, characterized in that the comparing unit specifically comprises:

Matching subunit, configured to normalize the i-vector of the voice to be verified and the preset i-vector, and to compare the normalized vectors using a probabilistic linear discriminant analysis (PLDA) model to obtain the matching score;

Compensation subunit, configured to add an offset score to the matching score to obtain a new matching score;

Judgment subunit, configured to obtain the permission value of the voice to be verified from the new matching score, and to judge whether that permission value is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified; if so, to trigger the execution subunit;

Execution subunit, configured to execute the operation corresponding to the preset wake-up word corresponding to the voice to be verified.
8. The voice wake-up device combined with voiceprint recognition according to claim 7, characterized in that the comparing unit further comprises a prompt subunit;

The judgment subunit is specifically configured to obtain the permission value of the voice to be verified from the new matching score, and to judge whether that permission value is greater than or equal to the permission value of the preset wake-up word corresponding to the voice to be verified. If so, it triggers the execution subunit; if not, it triggers the prompt subunit;

Prompt subunit, configured to issue a prompt indicating insufficient permission.