CN102647521B

CN102647521B - Method for removing lock of mobile phone screen based on short voice command and voice-print technology

Info

Publication number: CN102647521B
Application number: CN2012100970831A
Authority: CN
Inventors: 刘德建; 关胤; 余志鹏; 吴拥民
Original assignee: FUZHOU BOYUAN WIRELESS NETWORK TECHNOLOGY Co Ltd
Current assignee: Baidu com Times Technology Beijing Co Ltd
Priority date: 2012-04-05
Filing date: 2012-04-05
Publication date: 2013-10-09
Anticipated expiration: 2032-04-05
Also published as: CN102647521A

Abstract

The invention provides a method for unlocking a mobile phone screen based on a short voice command and voiceprint technology. In a preset stage, the user inputs a preset voice password, a fast Fourier transform (FFT) is applied to it, and a pass threshold is determined. In an unlocking stage, the user inputs an unlock voice password, an FFT is applied to it, and a difference value between the frequency-domain signal of the unlock voice password and the frequency-domain signal of the preset voice password is computed; the phone is unlocked if the difference value is smaller than the pass threshold. The method is convenient and fast and ensures that the mobile phone is used securely. On this basis, the computation of the difference value is further specified: framing and windowing, MFCC (Mel-Frequency Cepstral Coefficient) computation and vector quantization are introduced, so that the user's voice characteristics can be extracted and compared more accurately, improving the user experience in both convenience and security.

Description

Method for unlocking a mobile phone screen based on a short voice command and voiceprint technology

[technical field]

The present invention relates to a method for unlocking a mobile phone screen based on a short voice command and voiceprint technology.

[background technology]

Most existing mobile phones release the screen lock by means of a touch gesture, light sensing, password protection or similar techniques. Unlocking by touch gesture or light sensing provides no security, because anyone can unlock the phone. Unlocking by password prevents unauthorized users from using the phone, but it is not convenient or fast enough to operate.

The invention patent with publication number 102148899A, published on 2011-8-10, compares the waveform (that is, the time-domain signal) of the instruction input by the user with the waveform of the unlock voice instruction stored in the phone system, and decides whether to unlock according to whether the two waveforms match completely or match by 80%-100%. This cannot be achieved in practice: when the same person speaks the same word or sentence at different times, the resulting waveforms differ greatly. That invention is therefore not practicable.

[summary of the invention]

The technical problem to be solved by the present invention is to provide a method for unlocking a mobile phone screen based on a short voice command and voiceprint technology that is convenient and fast while ensuring that the phone is used securely. On this basis, the calculation of the difference value is further specified: framing and windowing, MFCC coefficient calculation and vector quantization are introduced, so that the user's voice characteristics can be extracted and compared more accurately, improving the user experience in both convenience and security.

The present invention solves the above technical problem by the following two technical schemes:

Scheme one: a method for unlocking a mobile phone screen based on a short voice command and voiceprint technology, comprising a preset stage and an unlocking stage. The preset stage comprises the following steps:

Step 1: the user inputs a preset voice password, which is stored in the mobile phone as a time-domain signal;

Step 2: a fast Fourier transform is applied to the audio data of the preset voice password stored as a time-domain signal, converting it into the frequency-domain signal of the preset voice password;

Step 3: the mobile phone system provides a default pass threshold, or the user sets a pass threshold;

The unlocking stage comprises the following steps:

Step 4: the user inputs an unlock voice password, which is stored in the mobile phone as a time-domain signal;

Step 5: a fast Fourier transform is applied to the audio data of the unlock voice password stored as a time-domain signal, converting it into the frequency-domain signal (spectrum) of the unlock voice password;

Step 6: the difference value between the frequency-domain signal of the unlock voice password and the frequency-domain signal of the preset voice password is calculated;

Step 7: it is judged whether the difference value is smaller than the pass threshold; if so, the screen lock of the mobile phone is released; if not, an unlock failure is reported.

Further, the difference value is obtained by computing the Euclidean distance.
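Scheme one can be sketched in a few lines. The following is only an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the spectra are compared by magnitude, both recordings are assumed to have the same number of samples, and a direct O(n²) DFT stands in for the fast Fourier transform (it yields the same spectrum, only more slowly).

```python
import cmath
import math


def dft_magnitude(samples):
    """Magnitude spectrum via a direct DFT (stand-in for an FFT; same result)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def try_unlock(preset_samples, unlock_samples, pass_threshold):
    """Steps 5 to 7: transform, measure the difference value, compare to the threshold."""
    preset_spectrum = dft_magnitude(preset_samples)
    unlock_spectrum = dft_magnitude(unlock_samples)
    difference = euclidean_distance(preset_spectrum, unlock_spectrum)
    return difference < pass_threshold
```

In practice the preset spectrum would be computed once in the preset stage and stored, so only the unlock recording needs to be transformed at unlock time.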

Scheme two: a method for unlocking a mobile phone screen based on a short voice command and voiceprint technology, comprising a preset stage and an unlocking stage. The preset stage comprises the following steps:

Step 10: the user inputs a preset voice password, which is stored in the mobile phone as a time-domain signal;

Step 11: the audio data of the preset voice password stored as a time-domain signal is framed and windowed, and the number of frames N of the preset voice password is calculated;

Step 12: a fast Fourier transform is applied to each frame of the preset voice password, so that each frame is converted into a corresponding frequency-domain signal of the preset voice password;

Step 13: the frequency-domain signal of each frame of the preset voice password is filtered in turn with X triangular window filters distributed linearly on the Mel frequency scale; after filtering, each frequency-domain signal of the preset voice password yields X corresponding energy values; X is a natural number, 1≤X≤128;

Step 14: the noise energy means of the preset voice password are computed from the X energy values of each of the first Y₁ frames of the preset voice password, where Y₁ is a natural number, 1≤Y₁≤N. Specifically, the arithmetic mean is taken, separately for each of the X energy values, over the first frame through the Y₁-th frame of the preset voice password, giving X noise energy means of the preset voice password. That is, the first energy values of the first through Y₁-th frames are averaged to give the first noise energy mean of the preset voice password, and the remaining energy values are averaged in the same way, giving X noise energy means of the preset voice password in total;

Step 15: for each of the remaining N-Y₁ frames of the preset voice password, the X noise energy means of the preset voice password are subtracted from the corresponding X energy values of that frame, so that each frame yields X corresponding noise-reduced energy values; the remaining N-Y₁ frames are the N frames of the preset voice password excluding the first through Y₁-th frames used to compute the noise energy means;

Step 16: a discrete cosine transform is applied to the X noise-reduced energy values of each of the remaining N-Y₁ frames of the preset voice password, giving N-Y₁ Z-dimensional MFCC coefficient vectors of the preset voice password in total; Z is a natural number, 1≤Z≤128;

Step 17: vector quantization is applied to the N-Y₁ Z-dimensional MFCC coefficient vectors obtained for the preset voice password, with the length of the quantization codebook set to K, where K is a natural number and 1≤K≤128; this yields a quantization codebook consisting of K Z-dimensional MFCC vectors;

Step 18: the mobile phone system provides a default pass threshold, or the user sets a pass threshold;
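The noise estimation and subtraction of steps 14 and 15 can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes the filter-bank energies are already available as one list of X values per frame.

```python
def noise_energy_means(frame_energies, num_noise_frames):
    """Step 14: per-filter arithmetic mean over the first num_noise_frames frames."""
    if not 1 <= num_noise_frames <= len(frame_energies):
        raise ValueError("num_noise_frames must be between 1 and the number of frames")
    noise_frames = frame_energies[:num_noise_frames]
    num_filters = len(noise_frames[0])
    return [
        sum(frame[j] for frame in noise_frames) / num_noise_frames
        for j in range(num_filters)
    ]


def noise_reduced_energies(frame_energies, num_noise_frames):
    """Step 15: subtract the noise means from each of the remaining frames."""
    means = noise_energy_means(frame_energies, num_noise_frames)
    remaining = frame_energies[num_noise_frames:]
    return [[e - m for e, m in zip(frame, means)] for frame in remaining]
```

The frames used for the noise estimate are dropped from the output, so N input frames produce N-Y₁ noise-reduced frames, matching the count used in step 16.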

The unlocking stage comprises the following steps:

Step 20: the user inputs an unlock voice password, which is stored in the mobile phone as a time-domain signal;

Step 21: the audio data of the unlock voice password stored as a time-domain signal is framed and windowed, and the number of frames M of the unlock voice password is calculated;

Step 22: a fast Fourier transform is applied to each frame of the unlock voice password, so that each frame is converted into a corresponding frequency-domain signal of the unlock voice password;

Step 23: the frequency-domain signal of each frame of the unlock voice password is filtered in turn with the same triangular window filters; after filtering, each frequency-domain signal of the unlock voice password yields X corresponding energy values;

Step 24: the noise energy means of the unlock voice password are computed from the X energy values of each of the first Y₂ frames of the unlock voice password. Specifically, the arithmetic mean is taken, separately for each of the X energy values, over the first frame through the Y₂-th frame of the unlock voice password, giving X noise energy means of the unlock voice password. That is, the first energy values of the first through Y₂-th frames are averaged to give the first noise energy mean of the unlock voice password, and the remaining energy values are averaged in the same way, giving X noise energy means of the unlock voice password in total;

Step 25: for each of the remaining M-Y₂ frames of the unlock voice password, the X noise energy means of the unlock voice password are subtracted from the corresponding X energy values of that frame, so that each frame yields X corresponding noise-reduced energy values; the remaining M-Y₂ frames are the M frames of the unlock voice password excluding the first through Y₂-th frames used to compute the noise energy means;

Step 26: a discrete cosine transform is applied to the X noise-reduced energy values of each of the remaining M-Y₂ frames of the unlock voice password, giving M-Y₂ Z-dimensional MFCC coefficient vectors of the unlock voice password in total;

Step 27: each Z-dimensional MFCC coefficient vector of the unlock voice password is compared in turn with the quantization codebook of the preset voice password. The unlock voice password has M-Y₂ Z-dimensional MFCC coefficient vectors in total, so M-Y₂ rounds of comparison are performed. Because the quantization codebook consists of K Z-dimensional MFCC vectors, each round yields K distance values, of which the minimum is selected, so that each round yields one minimum distance value. After all rounds, M-Y₂ minimum distance values are obtained in total; they are summed and divided by M-Y₂ to give the average minimum distance. The comparison consists of computing the Euclidean distance;

Step 28: it is judged whether the average minimum distance is smaller than the pass threshold; if so, the screen lock of the mobile phone is released; if not, an unlock failure is reported.
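The comparison and decision of steps 27 and 28 can be sketched as below. The function names are hypothetical, and the sketch assumes the MFCC vectors of the unlock voice password and the codebook of the preset voice password have already been computed and share the same dimension Z.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_min_distance(unlock_mfccs, codebook):
    """Step 27: one round per unlock vector, keeping the smallest of K distances."""
    if not unlock_mfccs or not codebook:
        raise ValueError("both the MFCC list and the codebook must be non-empty")
    minima = [min(euclidean(vector, code) for code in codebook) for vector in unlock_mfccs]
    return sum(minima) / len(minima)


def should_unlock(unlock_mfccs, codebook, pass_threshold):
    """Step 28: unlock only if the average minimum distance is below the threshold."""
    return average_min_distance(unlock_mfccs, codebook) < pass_threshold
```

The comparison is strict, as in step 28: an average minimum distance exactly equal to the pass threshold does not unlock the phone.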

Further, in step 11, the number of frames N of the preset voice password is obtained from the formula N = (L₁-20)/10+1, rounded down, where L₁ is the audio duration of the preset voice password in milliseconds, the 20 in the formula indicates a frame length of 20 milliseconds, and the 10 in the formula indicates a frame overlap of 10 milliseconds.

Further, in step 21, the number of frames M of the unlock voice password is obtained from the formula M = (L₂-20)/10+1, rounded down, where L₂ is the audio duration of the unlock voice password in milliseconds, the 20 in the formula indicates a frame length of 20 milliseconds, and the 10 in the formula indicates a frame overlap of 10 milliseconds.
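The frame-count formula used for both N and M can be checked with a short sketch. The function name and its default arguments are illustrative; returning 0 for audio shorter than one frame is an assumption, since the text does not cover that case.

```python
def frame_count(duration_ms, frame_length_ms=20, frame_shift_ms=10):
    """Number of frames: (L - 20) / 10 + 1, rounded down.

    With 20 ms frames that overlap by 10 ms, consecutive frames start 10 ms apart.
    """
    if duration_ms < frame_length_ms:
        return 0  # assumption: too short to hold a single full frame
    return (duration_ms - frame_length_ms) // frame_shift_ms + 1
```

For example, a one-second password (L = 1000 ms) gives (1000 - 20) / 10 + 1 = 99 frames.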

Further, when the preset voice password and the unlock voice password are input, both are sampled at a signal sampling rate of 16000 Hz.

Further, the number X of triangular window filters on the Mel frequency scale satisfies 24≤X≤39.

Further, the triangular window filters are 24 triangular window filters distributed linearly on the Mel frequency scale, i.e. X=24. Their center frequencies are, respectively: 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1149, 1320, 1516, 1741, 2000, 2297, 2639, 3031, 3482, 4000, 4595, 5278, 6063, 6964, and their bandwidths are: 100, 100, 100, 100, 100, 100, 100, 100, 100, 124, 160, 184, 211, 242, 278, 320, 367, 422, 484, 556, 639, 734, 843, 969. All values are in Hz.

Further, the triangular window filters are 39 triangular window filters distributed linearly on the Mel frequency scale, i.e. X=39. Their center frequencies are, respectively: 50, 100, 150, 200, 260, 320, 390, 460, 530, 610, 700, 790, 890, 990, 1100, 1210, 1340, 1480, 1610, 1770, 1930, 2100, 2280, 2480, 2680, 2900, 3140, 3380, 3650, 3930, 4230, 4560, 4900, 5260, 5650, 6060, 6500, 6970, 7470, and their bandwidths are: 100, 100, 100, 120, 127, 127, 148, 148, 148, 169, 190, 190, 233, 233, 254, 254, 296, 296, 275, 339, 339, 360, 381, 424, 424, 466, 508, 508, 572, 593, 636, 699, 720, 763, 826, 869, 932, 996, 1060. All values are in Hz.
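How a frame's spectrum might be reduced to X energy values with the 24-filter bank can be sketched as follows. The text gives only center frequencies and bandwidths, so the triangle shape is an assumption here: each filter is taken to have weight 1 at its center frequency and to fall linearly to 0 at a distance of one bandwidth on either side. The function names and the use of a power spectrum as input are likewise illustrative.

```python
# Center frequencies and bandwidths (Hz) of the 24-filter bank listed in the text.
CENTERS_HZ = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1149, 1320,
              1516, 1741, 2000, 2297, 2639, 3031, 3482, 4000, 4595, 5278, 6063, 6964]
BANDWIDTHS_HZ = [100, 100, 100, 100, 100, 100, 100, 100, 100, 124, 160, 184,
                 211, 242, 278, 320, 367, 422, 484, 556, 639, 734, 843, 969]


def triangular_weight(freq_hz, center_hz, bandwidth_hz):
    """Assumed triangle: 1 at the center, 0 at center +/- bandwidth."""
    return max(0.0, 1.0 - abs(freq_hz - center_hz) / bandwidth_hz)


def filterbank_energies(power_spectrum, sample_rate_hz, fft_size):
    """Weighted sum of spectral power under each triangle; one value per filter.

    power_spectrum[k] is the power at FFT bin k, whose frequency is
    k * sample_rate_hz / fft_size.
    """
    bin_hz = sample_rate_hz / fft_size
    return [
        sum(triangular_weight(k * bin_hz, center, bandwidth) * power
            for k, power in enumerate(power_spectrum))
        for center, bandwidth in zip(CENTERS_HZ, BANDWIDTHS_HZ)
    ]
```

With a 16000 Hz sampling rate and a 320-sample (20 ms) frame, the bins are 50 Hz apart, so a pure 100 Hz component falls entirely in the first filter.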

Further, the discrete cosine transform is computed by the formula:

\sum_{j=1}^{X} En(j) \cos\left[ \frac{\pi (i+1)(j-0.5)}{24} \right],

where En(j) denotes the j-th noise-reduced energy value, 1≤j≤X, 1≤i≤Z, and i and j are natural numbers.
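The formula above can be evaluated directly. The sketch below follows it as printed, including the (i+1) factor and the constant 24 in the denominator, which matches the X=24 filter bank; the function name and the `denominator` parameter are hypothetical and exposed only so that the constant is visible.

```python
import math


def mfcc_from_energies(noise_reduced, z, denominator=24):
    """Z coefficients, one per i = 1..Z, from X noise-reduced energy values.

    Each coefficient is sum over j = 1..X of En(j) * cos(pi * (i + 1) * (j - 0.5) / 24).
    """
    x = len(noise_reduced)
    return [
        sum(
            noise_reduced[j - 1] * math.cos(math.pi * (i + 1) * (j - 0.5) / denominator)
            for j in range(1, x + 1)
        )
        for i in range(1, z + 1)
    ]
```

Calling `mfcc_from_energies(energies, 13)` on the 24 noise-reduced energy values of one frame yields the 13-dimensional vector used in the second embodiment.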

Further, the windowing in step 11 and step 21 uses a Hamming window.
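Framing with a Hamming window can be sketched as follows, using the 16000 Hz sampling rate, 20 ms frame length and 10 ms overlap stated in the text. The function names are hypothetical, the common 0.54/0.46 Hamming coefficients are assumed, and trailing samples that do not fill a whole frame are discarded, which agrees with rounding the frame count down.

```python
import math


def hamming_window(length):
    """Hamming window w[n] = 0.54 - 0.46 * cos(2 * pi * n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_and_window(samples, sample_rate_hz=16000, frame_length_ms=20, frame_shift_ms=10):
    """Split samples into overlapping frames and multiply each by a Hamming window."""
    frame_len = sample_rate_hz * frame_length_ms // 1000   # 320 samples at 16 kHz
    shift = sample_rate_hz * frame_shift_ms // 1000        # 160 samples at 16 kHz
    window = hamming_window(frame_len)
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
        start += shift
    return frames
```

A 50 ms recording (800 samples) produces 4 frames, in agreement with (50-20)/10+1 = 4.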

The present invention has the following advantages: the screen lock of the mobile phone is released by a short voice command and voiceprint authentication, which is convenient and fast while ensuring that the phone is used securely. In addition, framing and windowing, MFCC coefficient calculation and vector quantization are introduced, so that the user's voice characteristics can be extracted and compared more accurately, improving the user experience in both convenience and security.

[embodiment]

A specific embodiment of scheme one of the present invention is as follows:

A method for unlocking a mobile phone screen based on a short voice command and voiceprint technology, comprising a preset stage and an unlocking stage. The preset stage comprises the following steps:

The unlocking stage comprises the following steps:

Step 6: the difference value between the frequency-domain signal of the unlock voice password and the frequency-domain signal of the preset voice password is calculated; the difference value is obtained by computing the Euclidean distance;

The first embodiment of scheme two of the present invention is as follows:

A method for unlocking a mobile phone screen based on a short voice command and voiceprint technology, characterized in that it comprises a preset stage and an unlocking stage. The preset stage comprises the following steps:

Step 10: the user inputs a preset voice password, which is stored in the mobile phone as a time-domain signal; the preset voice password is sampled at a signal sampling rate of 16000 Hz;

Step 11: the audio data of the preset voice password stored as a time-domain signal is framed and windowed, and the number of frames N of the preset voice password is calculated. In this embodiment, N is obtained from the formula N = (L₁-20)/10+1, rounded down, where L₁ is the audio duration of the preset voice password, the 20 in the formula indicates a frame length of 20 milliseconds, and the 10 in the formula indicates a frame overlap of 10 milliseconds; the windowing uses a Hamming window;

Step 13: the frequency-domain signal of each frame of the preset voice password is filtered in turn with X triangular window filters distributed linearly on the Mel frequency scale; after filtering, each frequency-domain signal of the preset voice password yields X corresponding energy values; X is a natural number, 1≤X≤128. More preferably, the number of triangular window filters satisfies 24≤X≤39. Filters in this range give a reasonable compromise between computational efficiency and the ability to describe the voice characteristics: the larger the number of filters, that is, the larger the value of X, the finer the description of the voice characteristics, but the lower the computational efficiency.

Step 16: a discrete cosine transform is applied to the X noise-reduced energy values of each of the remaining N-Y₁ frames of the preset voice password, giving N-Y₁ Z-dimensional MFCC coefficient vectors of the preset voice password in total; Z is a natural number, 1≤Z≤128. The discrete cosine transform is computed by the formula:

\sum_{j=1}^{X} En(j) \cos\left[ \frac{\pi (i+1)(j-0.5)}{24} \right],

where, when the discrete cosine transform is applied to the preset voice password, En(j) denotes the j-th noise-reduced energy value of the preset voice password, 1≤j≤X, 1≤i≤Z, and i and j are natural numbers;

Step 17: vector quantization is applied to the N-Y₁ Z-dimensional MFCC coefficient vectors obtained for the preset voice password, with the length of the quantization codebook set to K, where K is a natural number and 1≤K≤128; this yields a quantization codebook consisting of K Z-dimensional MFCC vectors;

The unlocking stage comprises the following steps:

Step 20: the user inputs an unlock voice password, which is stored in the mobile phone as a time-domain signal; the unlock voice password is sampled at a signal sampling rate of 16000 Hz;

Step 21: the audio data of the unlock voice password stored as a time-domain signal is framed and windowed, and the number of frames M of the unlock voice password is calculated. In this embodiment, M is obtained from the formula M = (L₂-20)/10+1, rounded down, where L₂ is the audio duration of the unlock voice password, the 20 in the formula indicates a frame length of 20 milliseconds, and the 10 in the formula indicates a frame overlap of 10 milliseconds; the windowing uses a Hamming window;

Step 26: a discrete cosine transform is applied to the X noise-reduced energy values of each of the remaining M-Y₂ frames of the unlock voice password, giving M-Y₂ Z-dimensional MFCC coefficient vectors of the unlock voice password in total. The discrete cosine transform is computed by the same formula as that used for the preset voice password, namely

\sum_{j=1}^{X} En(j) \cos\left[ \frac{\pi (i+1)(j-0.5)}{24} \right],

where, when the discrete cosine transform is applied to the unlock voice password, En(j) denotes the j-th noise-reduced energy value of the unlock voice password, 1≤j≤X, 1≤i≤Z, and i and j are natural numbers;

In the present invention, both the preset voice password and the unlock voice password are sampled at 16000 Hz. This reduces the amount of audio data to be processed without affecting speech quality, and it is also a sampling rate supported by most audio input devices.

The second embodiment of scheme two of the present invention is as follows:

In this embodiment, X=24, Y₁=3, Y₂=3, Z=13 and K=5.

Step 13: the frequency-domain signal of each frame of the preset voice password is filtered in turn with 24 triangular window filters distributed linearly on the Mel frequency scale; after filtering, each frequency-domain signal of the preset voice password yields 24 corresponding energy values. The center frequencies of the 24 triangular window filters are, respectively: 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1149, 1320, 1516, 1741, 2000, 2297, 2639, 3031, 3482, 4000, 4595, 5278, 6063, 6964, and their bandwidths are: 100, 100, 100, 100, 100, 100, 100, 100, 100, 124, 160, 184, 211, 242, 278, 320, 367, 422, 484, 556, 639, 734, 843, 969. All values are in Hz;

Step 14: the noise energy means of the preset voice password are computed from the 24 energy values of each of the first 3 frames of the preset voice password. Specifically, the arithmetic mean is taken, separately for each of the 24 energy values, over the first frame through the third frame of the preset voice password, giving 24 noise energy means of the preset voice password. That is, the first energy values of the first through third frames are averaged to give the first noise energy mean of the preset voice password, and the remaining energy values are averaged in the same way, giving 24 noise energy means of the preset voice password in total;

Step 15: for each of the remaining N-3 frames of the preset voice password, the 24 noise energy means of the preset voice password are subtracted from the corresponding 24 energy values of that frame, so that each frame yields 24 corresponding noise-reduced energy values; the remaining N-3 frames are the N frames of the preset voice password excluding the first through third frames used to compute the noise energy means;

Step 16: a discrete cosine transform is applied to the 24 noise-reduced energy values of each of the remaining N-3 frames of the preset voice password, giving N-3 13-dimensional MFCC coefficient vectors of the preset voice password in total. The discrete cosine transform is computed by the formula:

\sum_{j=1}^{24} En(j) \cos\left[ \frac{\pi (i+1)(j-0.5)}{24} \right],

where, when the discrete cosine transform is applied to the preset voice password, En(j) denotes the j-th noise-reduced energy value of the preset voice password, 1≤j≤24, 1≤i≤13, and i and j are natural numbers. In detail:

Take the 24 noise-reduced energy values of any one of the remaining N-3 frames of the preset voice password. Setting i=1 gives the first-dimension MFCC coefficient of that frame, and so on in order up to i=13, which gives the thirteenth-dimension MFCC coefficient of that frame; as i runs from 1 to 13, the 13-dimensional MFCC coefficient vector of that frame is obtained. After every one of the remaining N-3 frames of the preset voice password has been processed with the discrete cosine transform formula, N-3 13-dimensional MFCC coefficient vectors of the preset voice password are obtained;

Step 17: vector quantization is applied to the N-3 13-dimensional MFCC coefficient vectors obtained for the preset voice password, with the length of the quantization codebook set to 5; this yields a quantization codebook consisting of 5 13-dimensional MFCC vectors. A codebook that is too long increases computation time, while a codebook that is too short cannot adequately characterize the voice features of the preset password. With a codebook length of K=5, computation time is short and the voice features of the preset voice password are still characterized effectively;
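The text does not say which vector quantization algorithm builds the codebook. As one possibility, the sketch below uses plain k-means (Lloyd's algorithm) with a deterministic initialization; the algorithm choice, the function names and the iteration count are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, k, iterations=20):
    """Build a codebook of k code vectors from MFCC vectors with k-means (assumed)."""
    if len(vectors) < k:
        raise ValueError("need at least k vectors to build a codebook of length k")
    # Deterministic start: k vectors spaced evenly through the input.
    codebook = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    for _ in range(iterations):
        # Assign every vector to its nearest code vector.
        clusters = [[] for _ in range(k)]
        for vector in vectors:
            nearest = min(range(k), key=lambda c: squared_distance(vector, codebook[c]))
            clusters[nearest].append(vector)
        # Move each code vector to the mean of its cluster; keep it if the cluster is empty.
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                codebook[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook
```

With K=5 and Z=13 as in this embodiment, `train_codebook(mfcc_vectors, 5)` would return five 13-dimensional code vectors, provided the preset voice password yields at least five noise-reduced frames.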

The unlocking stage comprises the following steps:

Step 23: the frequency-domain signal of each frame of the unlock voice password is filtered in turn with the same triangular window filters; after filtering, each frequency-domain signal of the unlock voice password yields 24 corresponding energy values;

Step 24: the noise energy means of the unlock voice password are computed from the 24 energy values of each of the first 3 frames of the unlock voice password. Specifically, the arithmetic mean is taken, separately for each of the 24 energy values, over the first frame through the third frame of the unlock voice password, giving 24 noise energy means of the unlock voice password. That is, the first energy values of the first through third frames are averaged to give the first noise energy mean of the unlock voice password, and the remaining energy values are averaged in the same way, giving 24 noise energy means of the unlock voice password in total;

Step 25: for each of the remaining M-3 frames of the unlock voice password, the 24 noise energy means of the unlock voice password are subtracted from the corresponding 24 energy values of that frame, so that each frame yields 24 corresponding noise-reduced energy values; the remaining M-3 frames are the M frames of the unlock voice password excluding the first through third frames used to compute the noise energy means;

Step 26: a discrete cosine transform is applied to the 24 noise-reduced energy values of each of the remaining M-3 frames of the unlock voice password, giving M-3 13-dimensional MFCC coefficient vectors of the unlock voice password in total. The discrete cosine transform is computed by the same formula as that used for the preset voice password, namely

\sum_{j=1}^{24} En(j) \cos\left[ \frac{\pi (i+1)(j-0.5)}{24} \right],

where, when the discrete cosine transform is applied to the unlock voice password, En(j) denotes the j-th noise-reduced energy value of the unlock voice password, 1≤j≤24, 1≤i≤13, and i and j are natural numbers;

Step 27: each 13-dimensional MFCC coefficient vector of the unlock voice password is compared in turn with the quantization codebook of the preset voice password. The unlock voice password has M-3 13-dimensional MFCC coefficient vectors in total, so M-3 rounds of comparison are performed. Because the quantization codebook consists of 5 13-dimensional MFCC vectors, each round yields 5 distance values, of which the minimum is selected, so that each round yields one minimum distance value. After all rounds, M-3 minimum distance values are obtained in total; they are summed and divided by M-3 to give the average minimum distance. The comparison consists of computing the Euclidean distance;

The comparison process is illustrated as follows. Suppose K=5 and M-3=6. In the first round, one of the six 13-dimensional MFCC coefficient vectors of the unlock voice password is selected, and its Euclidean distance to each of the five 13-dimensional MFCC vectors in the quantization codebook of the preset voice password is computed, producing 5 distance values; the smallest of these is taken as the minimum distance value of the first round. In the second round, another vector is selected from the five remaining 13-dimensional MFCC coefficient vectors of the unlock voice password that have not yet been compared, and its Euclidean distance to each of the five 13-dimensional MFCC vectors in the quantization codebook of the preset voice password is computed, again producing 5 distance values; the smallest of these is taken as the minimum distance value of the second round. The process continues in the same way: since the unlock voice password has six 13-dimensional MFCC coefficient vectors, six rounds of comparison are performed. Each round yields 5 distance values, of which the minimum is selected, so that 6 minimum distance values are obtained in total once all rounds are complete;

In the present invention, 39 triangular window filters distributed linearly on the Mel frequency scale may also be used, i.e. X=39. Their center frequencies are, respectively: 50, 100, 150, 200, 260, 320, 390, 460, 530, 610, 700, 790, 890, 990, 1100, 1210, 1340, 1480, 1610, 1770, 1930, 2100, 2280, 2480, 2680, 2900, 3140, 3380, 3650, 3930, 4230, 4560, 4900, 5260, 5650, 6060, 6500, 6970, 7470, and their bandwidths are: 100, 100, 100, 120, 127, 127, 148, 148, 148, 169, 190, 190, 233, 233, 254, 254, 296, 296, 275, 339, 339, 360, 381, 424, 424, 466, 508, 508, 572, 593, 636, 699, 720, 763, 826, 869, 932, 996, 1060. All values are in Hz. When the 39 triangular window filters are used, the principle is the same as in the first and second embodiments of scheme two of the present invention.

The screen lock of the mobile phone is released by a short voice command and voiceprint authentication, which is convenient and fast while ensuring that the phone is used securely. In addition, framing and windowing, MFCC coefficient calculation and vector quantization are introduced, so that the user's voice characteristics can be extracted and compared more accurately, improving the user experience in both convenience and security.

Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the described embodiments are illustrative and are not intended to limit the scope of the present invention. Equivalent modifications and variations made by those of ordinary skill in the art in accordance with the spirit of the present invention shall all fall within the scope protected by the claims of the present invention.

Claims

1. A method for releasing the screen lock of a mobile phone based on a short voice command and voiceprint technology, characterized in that: the method comprises a preset stage and an unlock stage, the preset stage comprising the following steps:

The unlock stage comprises the following steps:

2. The method for releasing the screen lock of a mobile phone based on a short voice command and voiceprint technology according to claim 1, characterized in that: in step 11, the number of frames N of the preset voice password is obtained by the formula N=(L₁-20)/10+1, rounded down, wherein L₁ denotes the audio duration of the preset voice password in milliseconds, the 20 in the formula indicates a frame length of 20 milliseconds, and the 10 in the formula indicates a frame overlap of 10 milliseconds.

3. The method for releasing the screen lock of a mobile phone based on a short voice command and voiceprint technology according to claim 1, characterized in that: in step 21, the number of frames M of the unlock voice password is obtained by the formula M=(L₂-20)/10+1, rounded down, wherein L₂ denotes the audio duration of the unlock voice password in milliseconds, the 20 in the formula indicates a frame length of 20 milliseconds, and the 10 in the formula indicates a frame overlap of 10 milliseconds.
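By way of illustration only, the frame-count formula of claims 2 and 3 can be sketched in Python as follows. The function name `frame_count` and its parameter names are assumptions made for this example. With a 20 millisecond frame and a 10 millisecond overlap, the start of each frame is also 10 milliseconds after the start of the previous one, which is why the formula divides by 10. Returning 0 for audio shorter than one frame is likewise an assumption, since the claims do not address that case.

```python
import math


def frame_count(duration_ms, frame_ms=20, step_ms=10):
    """Number of frames: floor((L - 20) / 10) + 1 with the default values."""
    if duration_ms < frame_ms:
        # Assumption: audio shorter than one frame yields no frames.
        return 0
    return math.floor((duration_ms - frame_ms) / step_ms) + 1


# Example durations in milliseconds (illustrative values only).
for duration in (20, 30, 1000, 1005):
    print(duration, "ms ->", frame_count(duration), "frames")
```

The same function serves for both N (preset voice password, duration L₁) and M (unlock voice password, duration L₂), since the two formulas differ only in their input.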

4. The method for releasing the screen lock of a mobile phone based on a short voice command and voiceprint technology according to claim 1, characterized in that: when the preset voice password and the unlock voice password are input, the signal sampling rate of the preset voice password and of the unlock voice password is 16000 Hz.

5. The method for releasing the screen lock of a mobile phone based on a short voice command and voiceprint technology according to claim 1, characterized in that: the number X of Mel frequency scale points of the triangular window filters satisfies 24≤X≤39.

6. The method for releasing the screen lock of a mobile phone based on a short voice command and voiceprint technology according to claim 5, characterized in that: the triangular window filters are triangular window filters distributed linearly over 24 Mel frequency scale points, i.e. X=24, with the following center frequencies: 100,200,300,400,500,600,700,800,900,1000,1149,1320,1516,1741,2000,2297,2639,3031,3482,4000,4595,5278,6063,6964, and the following bandwidths: 100,100,100,100,100,100,100,100,100,124,160,184,211,242,278,320,367,422,484,556,639,734,843,969, all values being in Hz.

7. The method for releasing the screen lock of a mobile phone based on a short voice command and voiceprint technology according to claim 5, characterized in that: the triangular window filters are triangular window filters distributed linearly over 39 Mel frequency scale points, i.e. X=39, with the following center frequencies: 50,100,150,200,260,320,390,460,530,610,700,790,890,990,1100,1210,1340,1480,1610,1770,1930,2100,2280,2480,2680,2900,3140,3380,3650,3930,4230,4560,4900,5260,5650,6060,6500,6970,7470, and the following bandwidths: 100,100,100,120,127,127,148,148,148,169,190,190,233,233,254,254,296,296,275,339,339,360,381,424,424,466,508,508,572,593,636,699,720,763,826,869,932,996,1060, all values being in Hz.

8. The method for releasing the screen lock of a mobile phone based on a short voice command and voiceprint technology according to claim 1, characterized in that: the calculation formula of the discrete cosine transform is:

$$\sum_{j=1}^{X} En(j) \cos\left[\frac{\pi (i+1)(j-0.5)}{24}\right],$$

wherein En(j) denotes the j-th noise-reduced energy value, 1≤j≤X, 1≤i≤Z, and i and j are natural numbers.
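By way of illustration only, the expression of claim 8 can be evaluated in Python as sketched below. The function name `dct_terms`, the parameter names, and the sample input are assumptions made for this example. The denominator 24 is taken literally from the formula as written and is exposed as a parameter. The value Z=13 used in the example is chosen to match the 13-dimensional MFCC coefficients mentioned in the description, which is also an assumption.

```python
import math


def dct_terms(en, z, denominator=24):
    """Evaluate sum_{j=1..X} En(j) * cos(pi * (i + 1) * (j - 0.5) / denominator)
    for i = 1..Z, where X = len(en)."""
    x = len(en)
    terms = []
    for i in range(1, z + 1):
        total = 0.0
        for j in range(1, x + 1):
            # en is 0-indexed in Python, so En(j) is en[j - 1].
            total += en[j - 1] * math.cos(math.pi * (i + 1) * (j - 0.5) / denominator)
        terms.append(total)
    return terms


# Placeholder input (assumption): X = 24 identical noise-reduced energy values.
flat_energies = [1.0] * 24
coefficients = dct_terms(flat_energies, z=13)
print(len(coefficients), "values computed")
```

With X=24 identical energy values, every computed value is zero up to rounding error, because the cosine terms cancel over the full range of j; this gives a simple sanity check for an implementation.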

9. The method for releasing the screen lock of a mobile phone based on a short voice command and voiceprint technology according to claim 1, characterized in that: the windowing in both step 11 and step 21 is Hamming windowing.