CN102647521B - Method for removing lock of mobile phone screen based on short voice command and voice-print technology - Google Patents

Publication number
CN102647521B
CN102647521B CN2012100970831A CN201210097083A CN102647521B CN 102647521 B CN102647521 B CN 102647521B CN 2012100970831 A CN2012100970831 A CN 2012100970831A CN 201210097083 A CN201210097083 A CN 201210097083A CN 102647521 B CN102647521 B CN 102647521B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2012100970831A
Other languages
Chinese (zh)
Other versions
CN102647521A (en)
Inventor
刘德建
关胤
余志鹏
吴拥民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu com Times Technology Beijing Co Ltd
Original Assignee
FUZHOU BOYUAN WIRELESS NETWORK TECHNOLOGY Co Ltd
Application filed by FUZHOU BOYUAN WIRELESS NETWORK TECHNOLOGY Co Ltd
Priority to CN2012100970831A
Publication of CN102647521A
Application granted
Publication of CN102647521B

Abstract

The invention provides a method for unlocking a mobile phone screen based on a short voice command and voiceprint technology. In a preset stage, the user inputs a preset voice password, a fast Fourier transform is performed on it, and a pass threshold is determined. In an unlock stage, the user inputs an unlock voice password, a fast Fourier transform is performed on it, and a difference value between the frequency-domain signal of the unlock voice password and that of the preset voice password is computed; the phone is unlocked if the difference value is smaller than the pass threshold. The method is convenient and fast and keeps the phone secure. On this basis, the computation of the difference value is further specified: framing and windowing, MFCC (Mel-Frequency Cepstral Coefficient) computation, and vector quantization are introduced, so that the user's voice characteristics can be extracted and compared more accurately, improving the user experience in both convenience and security.

Description

Method for unlocking a mobile phone screen based on a short voice command and voiceprint technology
[Technical Field]
The present invention relates to a method for unlocking a mobile phone screen based on a short voice command and voiceprint technology.
[Background Art]
Most existing mobile phones release the screen lock by means of a touch gesture, ambient-light detection, or password protection. Unlocking by touch gesture or ambient-light detection offers no security, since anyone can unlock the phone. Unlocking by password prevents unauthorized use, but it is not convenient or fast to operate.
The invention patent with publication number 102148899A, published on 2011-8-10, compares the waveform (that is, the time-domain signal) of the instruction input by the user with the waveform of the unlock voice instruction stored in the phone system, and decides whether to unlock according to whether the two match, requiring either a full match or an 80%-100% match. This cannot be realized in practice: when the same person says the same word or sentence at different times, the waveforms differ greatly. That invention is therefore not practicable.
[Summary of the Invention]
The technical problem to be solved by the present invention is to provide a method for unlocking a mobile phone screen based on a short voice command and voiceprint technology that is both convenient and fast and keeps the phone secure. On this basis, the calculation of the difference value is further specified: framing and windowing, MFCC coefficient calculation, and vector quantization are introduced so that the user's voice characteristics can be extracted and compared more accurately, improving the user experience in both convenience and security.
The present invention solves the above technical problem by the following two technical schemes:
Scheme one: a method for unlocking a mobile phone screen based on a short voice command and voiceprint technology, comprising a preset stage and an unlock stage. The preset stage comprises the following steps:
Step 1: the user inputs a preset voice password, which is stored in the phone as a time-domain signal;
Step 2: a fast Fourier transform is performed on the audio data of the preset voice password, stored as a time-domain signal, transforming it into the frequency-domain signal of the preset voice password;
Step 3: the phone system provides a default pass threshold, or the user sets a pass threshold;
The unlock stage comprises the following steps:
Step 4: the user inputs an unlock voice password, which is stored in the phone as a time-domain signal;
Step 5: a fast Fourier transform is performed on the audio data of the unlock voice password, stored as a time-domain signal, transforming it into the frequency-domain signal of the unlock voice password;
Step 6: the difference value between the frequency-domain signal of the unlock voice password and the frequency-domain signal of the preset voice password is calculated;
Step 7: it is judged whether the difference value is less than the pass threshold; if so, the phone's screen lock is released; if not, an unlock failure is reported.
Further, the difference value is obtained as a Euclidean distance.
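By way of illustration only, scheme one can be sketched in a few lines of Python. The function names are hypothetical. The sketch also makes three assumptions the text does not state: the spectra are compared by magnitude, the two recordings are truncated to a common length before the transform, and a direct DFT stands in for the fast Fourier transform (it produces the same values, only more slowly).

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude spectrum via a direct DFT (an FFT gives the same values, faster)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n)
    ]

def difference_value(preset, attempt):
    """Steps 2, 5 and 6: Euclidean distance between the two magnitude spectra."""
    n = min(len(preset), len(attempt))  # assumption: truncate to a common length
    a = dft_magnitudes(preset[:n])
    b = dft_magnitudes(attempt[:n])
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def try_unlock(preset, attempt, pass_threshold):
    """Step 7: unlock only if the difference value is below the pass threshold."""
    return difference_value(preset, attempt) < pass_threshold
```

A suitable pass threshold depends on the recording conditions and would have to be chosen empirically.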
Scheme two: a method for unlocking a mobile phone screen based on a short voice command and voiceprint technology, comprising a preset stage and an unlock stage. The preset stage comprises the following steps:
Step 10: the user inputs a preset voice password, which is stored in the phone as a time-domain signal;
Step 11: the audio data of the preset voice password, stored as a time-domain signal, is divided into frames and windowed, and the number of frames N of the preset voice password is calculated;
Step 12: a fast Fourier transform is performed on each frame of the preset voice password, transforming each frame into a corresponding frequency-domain signal of the preset voice password;
Step 13: X triangular window filters linearly distributed on the Mel frequency scale are applied in turn to each frequency-domain signal of the preset voice password; after filtering, each frequency-domain signal yields X corresponding energy values. X is a natural number, 1≤X≤128;
Step 14: the noise energy means of the preset voice password are computed from the X energy values of each of the first Y₁ frames, where Y₁ is a natural number, 1≤Y₁≤N. Specifically, the X energy values of frames 1 through Y₁ are arithmetically averaged position by position: the first energy values of frames 1 through Y₁ are averaged to give the first noise energy mean of the preset voice password, and so on, giving X noise energy means in total;
Step 15: for each of the remaining N-Y₁ frames of the preset voice password, the X noise energy means are subtracted from the frame's X energy values, position by position, so that each frame yields X corresponding noise-reduced energy values. The N-Y₁ frames are the N frames of the preset voice password minus frames 1 through Y₁, which were used to compute the noise energy means;
Step 16: a discrete cosine transform is applied to the X noise-reduced energy values of each of the remaining N-Y₁ frames, giving N-Y₁ Z-dimensional MFCC coefficient vectors of the preset voice password in total. Z is a natural number, 1≤Z≤128;
Step 17: the N-Y₁ Z-dimensional MFCC coefficient vectors of the preset voice password are vector-quantized with a codebook length of K, where K is a natural number and 1≤K≤128, giving a quantization codebook made up of K Z-dimensional MFCC vectors;
Step 18: the phone system provides a default pass threshold, or the user sets a pass threshold;
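The text does not say which vector quantization algorithm produces the codebook in step 17. The sketch below assumes plain k-means (Lloyd's algorithm) with a deterministic, evenly spaced initialization; the algorithm choice, the function names, and the iteration count are all assumptions made for illustration. It requires at least K input vectors.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def train_codebook(vectors, k, iterations=20):
    """Build a codebook of k codewords from MFCC vectors (assumed: k-means)."""
    # Assumption: start from k vectors evenly spaced through the input.
    step = len(vectors) / k
    codebook = [list(vectors[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        # Assign every vector to its nearest codeword.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: squared_distance(v, codebook[c]))
            clusters[nearest].append(v)
        # Move each codeword to the mean of its cluster.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook
```

With the values of the second embodiment, `train_codebook(mfcc_vectors, 5)` would return five 13-dimensional codewords.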
The unlock stage comprises the following steps:
Step 20: the user inputs an unlock voice password, which is stored in the phone as a time-domain signal;
Step 21: the audio data of the unlock voice password, stored as a time-domain signal, is divided into frames and windowed, and the number of frames M of the unlock voice password is calculated;
Step 22: a fast Fourier transform is performed on each frame of the unlock voice password, transforming each frame into a corresponding frequency-domain signal of the unlock voice password;
Step 23: the same triangular window filters are applied in turn to each frequency-domain signal of the unlock voice password; after filtering, each frequency-domain signal yields X corresponding energy values;
Step 24: the noise energy means of the unlock voice password are computed from the X energy values of each of the first Y₂ frames. Specifically, the X energy values of frames 1 through Y₂ are arithmetically averaged position by position: the first energy values of frames 1 through Y₂ are averaged to give the first noise energy mean of the unlock voice password, and so on, giving X noise energy means in total;
Step 25: for each of the remaining M-Y₂ frames of the unlock voice password, the X noise energy means are subtracted from the frame's X energy values, position by position, so that each frame yields X corresponding noise-reduced energy values. The M-Y₂ frames are the M frames of the unlock voice password minus frames 1 through Y₂, which were used to compute the noise energy means;
Step 26: a discrete cosine transform is applied to the X noise-reduced energy values of each of the remaining M-Y₂ frames, giving M-Y₂ Z-dimensional MFCC coefficient vectors of the unlock voice password in total;
Step 27: each Z-dimensional MFCC coefficient vector of the unlock voice password is compared with every entry of the quantization codebook of the preset voice password. The unlock voice password has M-Y₂ Z-dimensional MFCC coefficient vectors, so M-Y₂ rounds of comparison are performed. Because the codebook is made up of K Z-dimensional MFCC vectors, each round yields K distance values, of which the minimum is kept. After all rounds, the M-Y₂ minimum distance values are summed and divided by M-Y₂ to give the average minimum distance. Each comparison is a Euclidean distance;
Step 28: it is judged whether the average minimum distance is less than the pass threshold; if so, the phone's screen lock is released; if not, an unlock failure is reported.
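Steps 27 and 28 can be illustrated with the short Python sketch below. The function names are hypothetical; the logic follows the text: one minimum Euclidean distance per unlock frame, averaged over all frames and compared with the pass threshold.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def average_min_distance(unlock_mfccs, codebook):
    """Step 27: for each unlock vector keep the smallest of its K distances, then average."""
    minima = [min(euclidean(v, c) for c in codebook) for v in unlock_mfccs]
    return sum(minima) / len(minima)

def should_unlock(unlock_mfccs, codebook, pass_threshold):
    """Step 28: unlock only if the average minimum distance is below the threshold."""
    return average_min_distance(unlock_mfccs, codebook) < pass_threshold
```

Here `unlock_mfccs` stands for the M-Y₂ vectors from step 26 and `codebook` for the K vectors from step 17.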
Further, in step 11, the number of frames N of the preset voice password is obtained by rounding down the result of the formula N=(L₁-20)/10+1, where L₁ is the audio duration of the preset voice password in milliseconds, the 20 in the formula means that the frame length is 20 milliseconds, and the 10 means that the frame overlap is 10 milliseconds.
Further, in step 21, the number of frames M of the unlock voice password is obtained by rounding down the result of the formula M=(L₂-20)/10+1, where L₂ is the audio duration of the unlock voice password in milliseconds, the 20 in the formula means that the frame length is 20 milliseconds, and the 10 means that the frame overlap is 10 milliseconds.
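As a worked example of the frame-count formula, a one-second recording (1000 ms) gives (1000-20)/10+1 = 99 frames. A minimal Python sketch follows; the function name is hypothetical, and returning 0 for recordings shorter than one frame is an assumption, since the text does not cover that case.

```python
def frame_count(duration_ms, frame_ms=20, overlap_ms=10):
    """Number of frames: floor((L - 20) / 10) + 1, with L in milliseconds."""
    if duration_ms < frame_ms:
        return 0  # assumption: too short to form a single frame
    return (duration_ms - frame_ms) // overlap_ms + 1
```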
Further, when the preset voice password and the unlock voice password are input, both are sampled at 16000 Hz.
Further, the number X of triangular window filters on the Mel frequency scale satisfies 24≤X≤39.
Further, the triangular window filters are 24 triangular window filters linearly distributed on the Mel frequency scale, i.e. X=24. Their centre frequencies are: 100,200,300,400,500,600,700,800,900,1000,1149,1320,1516,1741,2000,2297,2639,3031,3482,4000,4595,5278,6063,6964, and their bandwidths are: 100,100,100,100,100,100,100,100,100,124,160,184,211,242,278,320,367,422,484,556,639,734,843,969. All values are in Hz.
Further, the triangular window filters are 39 triangular window filters linearly distributed on the Mel frequency scale, i.e. X=39. Their centre frequencies are: 50,100,150,200,260,320,390,460,530,610,700,790,890,990,1100,1210,1340,1480,1610,1770,1930,2100,2280,2480,2680,2900,3140,3380,3650,3930,4230,4560,4900,5260,5650,6060,6500,6970,7470, and their bandwidths are: 100,100,100,120,127,127,148,148,148,169,190,190,233,233,254,254,296,296,275,339,339,360,381,424,424,466,508,508,572,593,636,699,720,763,826,869,932,996,1060. All values are in Hz.
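The following Python sketch illustrates how the 24-filter table might be applied to one frame's power spectrum. The text gives only centre frequencies and bandwidths, so the triangle shape is an assumption: each filter is taken to have weight 1 at its centre and to fall linearly to 0 at centre ± bandwidth. The function names and the spectrum layout (bins evenly spaced from 0 Hz to half the sampling rate) are likewise assumptions.

```python
CENTERS_24 = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1149, 1320,
              1516, 1741, 2000, 2297, 2639, 3031, 3482, 4000, 4595, 5278, 6063, 6964]
BANDWIDTHS_24 = [100, 100, 100, 100, 100, 100, 100, 100, 100, 124, 160, 184,
                 211, 242, 278, 320, 367, 422, 484, 556, 639, 734, 843, 969]

def triangle_weight(freq_hz, center_hz, bandwidth_hz):
    """Assumed shape: 1 at the centre, falling linearly to 0 at centre +/- bandwidth."""
    return max(0.0, 1.0 - abs(freq_hz - center_hz) / bandwidth_hz)

def filterbank_energies(power_spectrum, sample_rate, centers, bandwidths):
    """One energy value per filter: the triangle-weighted sum of the power spectrum."""
    n_bins = len(power_spectrum)                    # bins cover 0 .. sample_rate / 2
    hz_per_bin = (sample_rate / 2) / (n_bins - 1)
    return [
        sum(triangle_weight(k * hz_per_bin, c, bw) * p for k, p in enumerate(power_spectrum))
        for c, bw in zip(centers, bandwidths)
    ]
```

For a 20 ms frame at 16000 Hz (320 samples), the one-sided spectrum has 161 bins spaced 50 Hz apart.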
Further, the discrete cosine transform is calculated as:
$$\sum_{j=1}^{X} En(j)\cos\left[\frac{\pi (i+1)(j-0.5)}{24}\right]$$ where En(j) denotes the j-th noise-reduced energy value, 1≤j≤X, 1≤i≤Z, and i and j are natural numbers.
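The discrete cosine transform can be transcribed directly into Python as a sketch. The function name is hypothetical; the factor (i+1) and the denominator 24 are taken as printed in the formula, with the denominator exposed as a parameter only for convenience.

```python
import math

def mfcc_from_energies(energies, z, denominator=24):
    """Return Z coefficients: for i = 1..Z, sum over j of En(j) * cos(pi*(i+1)*(j-0.5)/24)."""
    x = len(energies)
    return [
        sum(energies[j - 1] * math.cos(math.pi * (i + 1) * (j - 0.5) / denominator)
            for j in range(1, x + 1))
        for i in range(1, z + 1)
    ]
```

With X=24 and Z=13, each frame's 24 noise-reduced energy values map to one 13-dimensional MFCC vector.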
Further, the windowing in step 11 and step 21 is Hamming windowing.
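Framing and Hamming windowing can be sketched as follows, assuming the 16000 Hz sampling rate, 20 ms frame length, and 10 ms overlap given in the text (320-sample frames advanced by 160 samples). The function names are hypothetical, and the 0.54/0.46 coefficients are the conventional Hamming window definition rather than values stated in the text.

```python
import math

def hamming(n):
    """Conventional Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def split_and_window(samples, sample_rate=16000, frame_ms=20, overlap_ms=10):
    """Cut the signal into overlapping frames and apply a Hamming window to each."""
    frame_len = sample_rate * frame_ms // 1000              # 320 samples at 16000 Hz
    shift = frame_len - sample_rate * overlap_ms // 1000    # 160 samples at 16000 Hz
    window = hamming(frame_len)
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
        start += shift
    return frames
```

For a one-second recording this yields 99 frames, in agreement with the frame-count formula.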
The present invention has the following advantages: releasing the phone's screen lock by a short voice command and voiceprint authentication is both convenient and fast and keeps the phone secure. In addition, framing and windowing, MFCC coefficient calculation, and vector quantization allow the user's voice characteristics to be extracted and compared more accurately, improving the user experience in both convenience and security.
[Embodiments]
A specific embodiment of scheme one of the present invention is as follows:
A method for unlocking a mobile phone screen based on a short voice command and voiceprint technology, comprising a preset stage and an unlock stage. The preset stage comprises the following steps:
Step 1: the user inputs a preset voice password, which is stored in the phone as a time-domain signal;
Step 2: a fast Fourier transform is performed on the audio data of the preset voice password, stored as a time-domain signal, transforming it into the frequency-domain signal of the preset voice password;
Step 3: the phone system provides a default pass threshold, or the user sets a pass threshold;
The unlock stage comprises the following steps:
Step 4: the user inputs an unlock voice password, which is stored in the phone as a time-domain signal;
Step 5: a fast Fourier transform is performed on the audio data of the unlock voice password, stored as a time-domain signal, transforming it into the frequency-domain signal of the unlock voice password;
Step 6: the difference value between the frequency-domain signal of the unlock voice password and the frequency-domain signal of the preset voice password is calculated; the difference value is obtained as a Euclidean distance;
Step 7: it is judged whether the difference value is less than the pass threshold; if so, the phone's screen lock is released; if not, an unlock failure is reported.
The first embodiment of scheme two of the present invention is as follows:
A method for unlocking a mobile phone screen based on a short voice command and voiceprint technology, characterized in that it comprises a preset stage and an unlock stage. The preset stage comprises the following steps:
Step 10: the user inputs a preset voice password, which is stored in the phone as a time-domain signal; the preset voice password is sampled at 16000 Hz;
Step 11: the audio data of the preset voice password, stored as a time-domain signal, is divided into frames and windowed, and the number of frames N of the preset voice password is calculated. In this embodiment, N is obtained by rounding down the result of the formula N=(L₁-20)/10+1, where L₁ is the audio duration of the preset voice password, the 20 in the formula means that the frame length is 20 milliseconds, and the 10 means that the frame overlap is 10 milliseconds. The windowing is Hamming windowing;
Step 12: a fast Fourier transform is performed on each frame of the preset voice password, transforming each frame into a corresponding frequency-domain signal of the preset voice password;
Step 13: X triangular window filters linearly distributed on the Mel frequency scale are applied in turn to each frequency-domain signal of the preset voice password; after filtering, each frequency-domain signal yields X corresponding energy values. X is a natural number, 1≤X≤128. More preferably, 24≤X≤39: this range gives a reasonable compromise between computational efficiency and the ability to describe voice characteristics. Clearly, the more filters there are, i.e. the larger X is, the finer the description of the voice characteristics, but the lower the computational efficiency.
Step 14: the noise energy means of the preset voice password are computed from the X energy values of each of the first Y₁ frames, where Y₁ is a natural number, 1≤Y₁≤N. Specifically, the X energy values of frames 1 through Y₁ are arithmetically averaged position by position: the first energy values of frames 1 through Y₁ are averaged to give the first noise energy mean of the preset voice password, and so on, giving X noise energy means in total;
Step 15: for each of the remaining N-Y₁ frames of the preset voice password, the X noise energy means are subtracted from the frame's X energy values, position by position, so that each frame yields X corresponding noise-reduced energy values. The N-Y₁ frames are the N frames of the preset voice password minus frames 1 through Y₁, which were used to compute the noise energy means;
Step 16: a discrete cosine transform is applied to the X noise-reduced energy values of each of the remaining N-Y₁ frames, giving N-Y₁ Z-dimensional MFCC coefficient vectors of the preset voice password in total. Z is a natural number, 1≤Z≤128. The discrete cosine transform is calculated as:
$$\sum_{j=1}^{X} En(j)\cos\left[\frac{\pi (i+1)(j-0.5)}{24}\right]$$
where, when the transform is applied to the preset voice password, En(j) denotes the j-th noise-reduced energy value of the preset voice password, 1≤j≤X, 1≤i≤Z, and i and j are natural numbers;
Step 17: the N-Y₁ Z-dimensional MFCC coefficient vectors of the preset voice password are vector-quantized with a codebook length of K, where K is a natural number and 1≤K≤128, giving a quantization codebook made up of K Z-dimensional MFCC vectors;
Step 18: the phone system provides a default pass threshold, or the user sets a pass threshold;
The unlock stage comprises the following steps:
Step 20: the user inputs an unlock voice password, which is stored in the phone as a time-domain signal; the unlock voice password is sampled at 16000 Hz;
Step 21: the audio data of the unlock voice password, stored as a time-domain signal, is divided into frames and windowed, and the number of frames M of the unlock voice password is calculated. In this embodiment, M is obtained by rounding down the result of the formula M=(L₂-20)/10+1, where L₂ is the audio duration of the unlock voice password, the 20 in the formula means that the frame length is 20 milliseconds, and the 10 means that the frame overlap is 10 milliseconds. The windowing is Hamming windowing;
Step 22: a fast Fourier transform is performed on each frame of the unlock voice password, transforming each frame into a corresponding frequency-domain signal of the unlock voice password;
Step 23: the same triangular window filters are applied in turn to each frequency-domain signal of the unlock voice password; after filtering, each frequency-domain signal yields X corresponding energy values;
Step 24: the noise energy means of the unlock voice password are computed from the X energy values of each of the first Y₂ frames. Specifically, the X energy values of frames 1 through Y₂ are arithmetically averaged position by position: the first energy values of frames 1 through Y₂ are averaged to give the first noise energy mean of the unlock voice password, and so on, giving X noise energy means in total;
Step 25: for each of the remaining M-Y₂ frames of the unlock voice password, the X noise energy means are subtracted from the frame's X energy values, position by position, so that each frame yields X corresponding noise-reduced energy values. The M-Y₂ frames are the M frames of the unlock voice password minus frames 1 through Y₂, which were used to compute the noise energy means;
Step 26: a discrete cosine transform is applied to the X noise-reduced energy values of each of the remaining M-Y₂ frames, giving M-Y₂ Z-dimensional MFCC coefficient vectors of the unlock voice password in total. The formula is the same as the discrete cosine transform used for the preset voice password, namely:
$$\sum_{j=1}^{X} En(j)\cos\left[\frac{\pi (i+1)(j-0.5)}{24}\right]$$
where, when the transform is applied to the unlock voice password, En(j) denotes the j-th noise-reduced energy value of the unlock voice password, 1≤j≤X, 1≤i≤Z, and i and j are natural numbers;
Step 27: each Z-dimensional MFCC coefficient vector of the unlock voice password is compared with every entry of the quantization codebook of the preset voice password. The unlock voice password has M-Y₂ Z-dimensional MFCC coefficient vectors, so M-Y₂ rounds of comparison are performed. Because the codebook is made up of K Z-dimensional MFCC vectors, each round yields K distance values, of which the minimum is kept. After all rounds, the M-Y₂ minimum distance values are summed and divided by M-Y₂ to give the average minimum distance. Each comparison is a Euclidean distance;
Step 28: it is judged whether the average minimum distance is less than the pass threshold; if so, the phone's screen lock is released; if not, an unlock failure is reported.
In the present invention, both the preset voice password and the unlock voice password are sampled at 16000 Hz. This reduces the amount of audio data to be processed without affecting speech quality, and it is also a sampling rate supported by most audio input devices.
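The noise handling of steps 14-15 (and, identically, steps 24-25) amounts to averaging the leading frames band by band and subtracting that average from every later frame. A minimal Python sketch, with a hypothetical function name:

```python
def subtract_noise(frame_energies, y):
    """Average the first y frames per band, then subtract that mean from the rest.

    frame_energies: list of frames, each a list of X filter-bank energy values.
    Returns the noise-reduced energies of the remaining len(frame_energies) - y frames.
    """
    x = len(frame_energies[0])
    noise_mean = [sum(frame[b] for frame in frame_energies[:y]) / y for b in range(x)]
    return [[e - n for e, n in zip(frame, noise_mean)] for frame in frame_energies[y:]]
```

The approach treats the first Y₁ (or Y₂) frames as background noise, so it presumes the speaker has not yet begun the password during those frames.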
The present invention program two second embodiment is as follows:
In the present embodiment, get X=24, Y 1=3, Y 2=3, Z=13, K=5
A kind of method based on voice short command and the screen locking of vocal print technology releasing mobile phone, it is characterized in that: comprise preset stage and release stage, described preset stage comprises the steps:
The voice password is preset in step 10, user input, and described to preset the preservation form of voice password in mobile phone be time-domain signal; The described signal sampling rate that presets the voice password is 16000Hz;
Step 11, be that the described voice data that presets the voice password of time-domain signal carries out the windowing process of branch frame with the preservation form, and calculate the number of frames N that presets the voice password; In the present embodiment by formula N=(L 1-20)/10+1 rounds downwards and tries to achieve the described number of frames N that presets the voice password, wherein, L in the formula 1Represent the described audio frequency duration that presets the voice password, the expression of 20 in formula frame length is 20 milliseconds, and the expression of 10 in formula frame is superposed to 10 milliseconds; Described windowing process is handled for adding Hamming window;
Step 12, each frame is preset the voice password carry out fast Fourier transform, each frame presets the frequency-region signal that voice password correspondent transform becomes to preset the voice password;
Step 13, with the triangular window filter of linear distribution on 24 Mel frequency markings, to the frequency-region signal filtering successively of respectively presetting the voice password, after the filtering, each frequency-region signal that presets the voice password all obtains 24 corresponding energy values; The triangular window filter of linear distribution on described 24 Mel frequency markings, its centre frequency is respectively: 100,200,300,400,500,600,700,800,900,1000,1149,1320,1516,1741,2000,2297,2639,3031,3482,4000,4595,5278,6063,6964, bandwidth is: 100,100,100,100,100,100,100,100,100,124,160,184,211,242,278,320,367,422,484,556,639,734,843,969, above numerical value unit is Hz;
Step 14: the noise energy means of the preset voice password are computed from the 24 energy values of each of the first 3 frames of the preset voice password. Specifically, the 24 energy values of the first frame through the 24 energy values of the third frame are averaged arithmetically, respectively, to give 24 noise energy means of the preset voice password: the first energy values of frames 1 to 3 are averaged to give the first noise energy mean, and the remaining energy values are averaged in the same way, giving 24 noise energy means of the preset voice password in total.
Step 15: for each of the remaining N-3 frames of the preset voice password, the 24 noise energy means of the preset voice password are subtracted from the frame's 24 energy values, respectively, so that each frame yields 24 corresponding noise-reduced energy values. Here N-3 denotes the N frames of the preset voice password minus frames 1 to 3, which were used to compute the noise energy means.
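Steps 14 and 15 amount to a per-band average over the leading frames followed by a subtraction. A minimal sketch, assuming the per-frame energies are held as a list of equal-length lists (a hypothetical layout), with no clamping of negative results since the text does not mention any:

```python
def noise_means(frame_energies, noise_frames=3):
    """Average each band's energy over the first `noise_frames` frames."""
    head = frame_energies[:noise_frames]
    bands = len(head[0])
    return [sum(frame[b] for frame in head) / len(head) for b in range(bands)]


def subtract_noise(frame_energies, noise_frames=3):
    """Return noise-reduced energies for the frames after the noise frames."""
    means = noise_means(frame_energies, noise_frames)
    return [[e - m for e, m in zip(frame, means)]
            for frame in frame_energies[noise_frames:]]
```

With N input frames the result has N-3 frames, each holding as many noise-reduced values as there are filter bands.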
Step 16: a discrete cosine transform is applied to the 24 noise-reduced energy values of each of the remaining N-3 frames of the preset voice password, giving N-3 13-dimensional MFCC vectors of the preset voice password in total. The formula of the discrete cosine transform is:
$$\sum_{j=1}^{24} En(j)\cos\left[\frac{\pi (i+1)(j-0.5)}{24}\right]$$
where, when the discrete cosine transform is applied to the preset voice password, En(j) denotes the j-th noise-reduced energy value of the preset voice password, 1≤j≤24, 1≤i≤13, and i and j are natural numbers. In detail:
Take the 24 noise-reduced energy values of any one of the remaining N-3 frames of the preset voice password. Setting i=1 gives the first MFCC coefficient of that frame, and so on in sequence until i=13 gives the 13th MFCC coefficient; letting i run from 1 to 13 thus yields the 13-dimensional MFCC vector of that frame. After every one of the remaining N-3 frames of the preset voice password has been processed with the discrete cosine transform formula, N-3 13-dimensional MFCC vectors of the preset voice password are obtained.
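The per-frame calculation can be sketched as below. The sketch follows the discrete cosine transform expression as it is printed in claim 8 of this document, including the factor (i + 1) and the fixed denominator 24; whether that printed form matches the original filing exactly is an assumption. The function name and parameters are hypothetical.

```python
import math


def mfcc_from_energies(energies, n_coeffs=13, denominator=24):
    """Return the coefficients for i = 1..n_coeffs from one frame's energies.

    Coefficient i is the sum over j = 1..len(energies) of
    En(j) * cos(pi * (i + 1) * (j - 0.5) / denominator).
    """
    coefficients = []
    for i in range(1, n_coeffs + 1):
        total = 0.0
        for j, energy in enumerate(energies, start=1):
            total += energy * math.cos(
                math.pi * (i + 1) * (j - 0.5) / denominator)
        coefficients.append(total)
    return coefficients
```

Calling it once per noise-reduced frame yields the N-3 vectors of 13 coefficients that Step 17 then quantizes.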
Step 17: vector quantization is applied to the N-3 13-dimensional MFCC vectors of the preset voice password with the codebook length set to 5, producing one quantization codebook that consists of 5 13-dimensional MFCC vectors. An overly long codebook increases computation time, whereas an overly short one cannot adequately characterize the voice features of the preset voice password; a codebook length of K=5 keeps computation time short while still characterizing the voice features of the preset voice password effectively.
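The text does not say which vector quantization algorithm builds the codebook. The sketch below substitutes plain k-means (Lloyd's algorithm) with a deterministic initialization, purely as an assumed stand-in; the iteration count and the function names are likewise hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, k=5, iterations=20):
    """Build a k-entry codebook with k-means (assumed; not named in the text)."""
    if len(vectors) < k:
        raise ValueError("need at least k training vectors")
    # Deterministic start: k vectors spread evenly over the training data.
    codebook = [list(vectors[(i * len(vectors)) // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k),
                          key=lambda c: squared_distance(v, codebook[c]))
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous entry
                dims = len(members[0])
                codebook[c] = [sum(m[d] for m in members) / len(members)
                               for d in range(dims)]
    return codebook
```

With K=5 and 13-dimensional input vectors the result is the list of 5 13-dimensional entries that the unlocking stage compares against.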
Step 18: the mobile phone system provides a default pass threshold, or the user sets a pass threshold.
The unlocking stage comprises the following steps:
Step 20: the user inputs an unlock voice password, which is stored in the mobile phone as a time-domain signal; the unlock voice password is sampled at 16000 Hz.
Step 21: the audio data of the unlock voice password, stored as a time-domain signal, is split into frames and windowed, and the number of frames M of the unlock voice password is calculated. In this embodiment, M is obtained from the formula M = (L2 - 20)/10 + 1, rounded down, where L2 is the audio duration of the unlock voice password, the 20 in the formula indicates a frame length of 20 milliseconds, and the 10 in the formula indicates a frame overlap of 10 milliseconds. The windowing applies a Hamming window.
Step 22: a fast Fourier transform is applied to each frame of the unlock voice password, converting each frame into a frequency-domain signal of the unlock voice password.
Step 23: the frequency-domain signal of each frame of the unlock voice password is filtered in turn by the triangular window filters described above; after filtering, each frequency-domain signal of the unlock voice password yields 24 corresponding energy values.
Step 24: the noise energy means of the unlock voice password are computed from the 24 energy values of each of the first 3 frames of the unlock voice password. Specifically, the 24 energy values of the first frame through the 24 energy values of the third frame are averaged arithmetically, respectively, to give 24 noise energy means of the unlock voice password: the first energy values of frames 1 to 3 are averaged to give the first noise energy mean, and the remaining energy values are averaged in the same way, giving 24 noise energy means of the unlock voice password in total.
Step 25: for each of the remaining M-3 frames of the unlock voice password, the 24 noise energy means of the unlock voice password are subtracted from the frame's 24 energy values, respectively, so that each frame yields 24 corresponding noise-reduced energy values. Here M-3 denotes the M frames of the unlock voice password minus frames 1 to 3, which were used to compute the noise energy means.
Step 26: a discrete cosine transform is applied to the 24 noise-reduced energy values of each of the remaining M-3 frames of the unlock voice password, giving M-3 13-dimensional MFCC vectors of the unlock voice password in total. The formula of the discrete cosine transform is the same as the one used for the preset voice password. When the transform is applied to the unlock voice password, En(j) denotes the j-th noise-reduced energy value of the unlock voice password, 1≤j≤24, 1≤i≤13, and i and j are natural numbers.
Step 27: each 13-dimensional MFCC vector of the unlock voice password is compared in turn with the quantization codebook of the preset voice password. The unlock voice password has M-3 13-dimensional MFCC vectors in total, so M-3 rounds of comparison are performed. Because the codebook consists of 5 13-dimensional MFCC vectors, each round yields 5 distance values, of which the smallest is kept; that is, each round yields one minimum distance. Once all rounds are complete, M-3 minimum distances have been obtained; they are summed and divided by M-3 to give the average minimum distance. The comparison computes the Euclidean distance.
The comparison process is illustrated as follows. Suppose K=5 and M-3=6. In the first round, one of the 6 13-dimensional MFCC vectors of the unlock voice password is selected and its Euclidean distance to each of the 5 13-dimensional MFCC vectors in the quantization codebook of the preset voice password is computed, producing 5 distance values; the smallest of these is the minimum distance of the first round. In the second round, one of the 5 remaining, not yet compared 13-dimensional MFCC vectors of the unlock voice password is selected and its Euclidean distance to each of the 5 codebook vectors is computed, again producing 5 distance values; the smallest of these is the minimum distance of the second round. The process continues in the same way: since there are 6 13-dimensional MFCC vectors of the unlock voice password, 6 rounds are performed. Each round yields 5 distance values, of which the smallest is kept, so that 6 minimum distances are obtained once all rounds are complete.
Step 28: it is determined whether the average minimum distance is less than the pass threshold; if so, the screen lock of the mobile phone is released; if not, an unlock failure is reported.
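Steps 27 and 28 reduce to a nearest-codeword search followed by a threshold test. A minimal sketch, assuming the MFCC vectors and the codebook are plain lists of equal-length lists; the function names are hypothetical:

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_min_distance(unlock_vectors, codebook):
    """One round per unlock vector: keep its smallest distance to the codebook."""
    minima = [min(euclidean(v, entry) for entry in codebook)
              for v in unlock_vectors]
    return sum(minima) / len(minima)


def should_unlock(unlock_vectors, codebook, pass_threshold):
    """Unlock only if the average minimum distance is below the threshold."""
    return average_min_distance(unlock_vectors, codebook) < pass_threshold
```

The comparison is strict: an average minimum distance exactly equal to the pass threshold is treated as a failed unlock, matching the "less than" wording of Step 28.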
In the present invention, the triangular window filters may alternatively be 39 triangular window filters distributed linearly on the Mel frequency scale, i.e. X=39. Their center frequencies are: 50, 100, 150, 200, 260, 320, 390, 460, 530, 610, 700, 790, 890, 990, 1100, 1210, 1340, 1480, 1610, 1770, 1930, 2100, 2280, 2480, 2680, 2900, 3140, 3380, 3650, 3930, 4230, 4560, 4900, 5260, 5650, 6060, 6500, 6970, 7470; their bandwidths are: 100, 100, 100, 120, 127, 127, 148, 148, 148, 169, 190, 190, 233, 233, 254, 254, 296, 296, 275, 339, 339, 360, 381, 424, 424, 466, 508, 508, 572, 593, 636, 699, 720, 763, 826, 869, 932, 996, 1060; all values are in Hz. When the 39 triangular window filters are used, the principle is the same as in Embodiment 1 and Embodiment 2 of Scheme 2 of the present invention.
Releasing the screen lock of the mobile phone by means of a short voice command and voiceprint authentication is convenient and fast while also ensuring that the phone is used securely. In addition, the use of framing and windowing, MFCC coefficient calculation, and vector quantization allows the user's voice characteristics to be extracted and compared more accurately, improving the user experience in terms of both convenience and security.
Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the specific embodiments described are illustrative and do not limit the scope of the present invention; equivalent modifications and variations made by those of ordinary skill in the art in accordance with the spirit of the present invention shall all fall within the scope protected by the claims of the present invention.

Claims (9)

1. A method for unlocking a mobile phone screen based on a short voice command and voiceprint technology, characterized in that it comprises a preset stage and an unlocking stage, the preset stage comprising the following steps:
Step 10: the user inputs a preset voice password, which is stored in the mobile phone as a time-domain signal;
Step 11: the audio data of the preset voice password, stored as a time-domain signal, is split into frames and windowed, and the number of frames N of the preset voice password is calculated;
Step 12: a fast Fourier transform is applied to each frame of the preset voice password, converting each frame into a frequency-domain signal of the preset voice password;
Step 13: the frequency-domain signal of each frame of the preset voice password is filtered in turn by X triangular window filters distributed linearly on the Mel frequency scale; after filtering, each frequency-domain signal of the preset voice password yields X corresponding energy values; X is a natural number, 1≤X≤128;
Step 14: the noise energy means of the preset voice password are computed from the X energy values of each of the first Y1 frames of the preset voice password, where Y1 is a natural number and 1≤Y1≤N; specifically, the X energy values of the first frame through the X energy values of the Y1-th frame are averaged arithmetically, respectively, to give X noise energy means of the preset voice password; that is, the first energy values of frames 1 to Y1 are averaged to give the first noise energy mean of the preset voice password, and the remaining energy values are averaged in the same way, giving X noise energy means of the preset voice password in total;
Step 15: for each of the remaining N-Y1 frames of the preset voice password, the X noise energy means of the preset voice password are subtracted from the frame's X energy values, respectively, so that each frame yields X corresponding noise-reduced energy values; N-Y1 denotes the N frames of the preset voice password minus frames 1 to Y1, which were used to compute the noise energy means;
Step 16: a discrete cosine transform is applied to the X noise-reduced energy values of each of the remaining N-Y1 frames of the preset voice password, giving N-Y1 Z-dimensional MFCC vectors of the preset voice password in total; Z is a natural number, 1≤Z≤128;
Step 17: vector quantization is applied to the N-Y1 Z-dimensional MFCC vectors of the preset voice password with the codebook length set to K, where K is a natural number and 1≤K≤128, producing one quantization codebook that consists of K Z-dimensional MFCC vectors;
Step 18: the mobile phone system provides a default pass threshold, or the user sets a pass threshold;
the unlocking stage comprising the following steps:
Step 20: the user inputs an unlock voice password, which is stored in the mobile phone as a time-domain signal;
Step 21: the audio data of the unlock voice password, stored as a time-domain signal, is split into frames and windowed, and the number of frames M of the unlock voice password is calculated;
Step 22: a fast Fourier transform is applied to each frame of the unlock voice password, converting each frame into a frequency-domain signal of the unlock voice password;
Step 23: the frequency-domain signal of each frame of the unlock voice password is filtered in turn by the triangular window filters; after filtering, each frequency-domain signal of the unlock voice password yields X corresponding energy values;
Step 24: the noise energy means of the unlock voice password are computed from the X energy values of each of the first Y2 frames of the unlock voice password; specifically, the X energy values of the first frame through the X energy values of the Y2-th frame are averaged arithmetically, respectively, to give X noise energy means of the unlock voice password; that is, the first energy values of frames 1 to Y2 are averaged to give the first noise energy mean of the unlock voice password, and the remaining energy values are averaged in the same way, giving X noise energy means of the unlock voice password in total;
Step 25: for each of the remaining M-Y2 frames of the unlock voice password, the X noise energy means of the unlock voice password are subtracted from the frame's X energy values, respectively, so that each frame yields X corresponding noise-reduced energy values; M-Y2 denotes the M frames of the unlock voice password minus frames 1 to Y2, which were used to compute the noise energy means;
Step 26: a discrete cosine transform is applied to the X noise-reduced energy values of each of the remaining M-Y2 frames of the unlock voice password, giving M-Y2 Z-dimensional MFCC vectors of the unlock voice password in total;
Step 27: each Z-dimensional MFCC vector of the unlock voice password is compared in turn with the quantization codebook of the preset voice password; the unlock voice password has M-Y2 Z-dimensional MFCC vectors in total, so M-Y2 rounds of comparison are performed; because the codebook consists of K Z-dimensional MFCC vectors, each round yields K distance values, of which the smallest is kept, that is, each round yields one minimum distance; once all rounds are complete, M-Y2 minimum distances have been obtained, which are summed and divided by M-Y2 to give the average minimum distance; the comparison computes the Euclidean distance;
Step 28: it is determined whether the average minimum distance is less than the pass threshold; if so, the screen lock of the mobile phone is released; if not, an unlock failure is reported.
2. The method for unlocking a mobile phone screen based on a short voice command and voiceprint technology according to claim 1, characterized in that: in Step 11, the number of frames N of the preset voice password is obtained from the formula N = (L1 - 20)/10 + 1, rounded down, where L1 is the audio duration of the preset voice password in milliseconds, the 20 in the formula indicates a frame length of 20 milliseconds, and the 10 in the formula indicates a frame overlap of 10 milliseconds.
3. The method for unlocking a mobile phone screen based on a short voice command and voiceprint technology according to claim 1, characterized in that: in Step 21, the number of frames M of the unlock voice password is obtained from the formula M = (L2 - 20)/10 + 1, rounded down, where L2 is the audio duration of the unlock voice password in milliseconds, the 20 in the formula indicates a frame length of 20 milliseconds, and the 10 in the formula indicates a frame overlap of 10 milliseconds.
4. The method for unlocking a mobile phone screen based on a short voice command and voiceprint technology according to claim 1, characterized in that: when the preset voice password and the unlock voice password are input, both are sampled at 16000 Hz.
5. The method for unlocking a mobile phone screen based on a short voice command and voiceprint technology according to claim 1, characterized in that: the number X of triangular window filters on the Mel frequency scale satisfies 24≤X≤39.
6. The method for unlocking a mobile phone screen based on a short voice command and voiceprint technology according to claim 5, characterized in that: the triangular window filters are 24 triangular window filters distributed linearly on the Mel frequency scale, i.e. X=24, with center frequencies: 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1149, 1320, 1516, 1741, 2000, 2297, 2639, 3031, 3482, 4000, 4595, 5278, 6063, 6964, and bandwidths: 100, 100, 100, 100, 100, 100, 100, 100, 100, 124, 160, 184, 211, 242, 278, 320, 367, 422, 484, 556, 639, 734, 843, 969; all values are in Hz.
7. The method for unlocking a mobile phone screen based on a short voice command and voiceprint technology according to claim 5, characterized in that: the triangular window filters are 39 triangular window filters distributed linearly on the Mel frequency scale, i.e. X=39, with center frequencies: 50, 100, 150, 200, 260, 320, 390, 460, 530, 610, 700, 790, 890, 990, 1100, 1210, 1340, 1480, 1610, 1770, 1930, 2100, 2280, 2480, 2680, 2900, 3140, 3380, 3650, 3930, 4230, 4560, 4900, 5260, 5650, 6060, 6500, 6970, 7470, and bandwidths: 100, 100, 100, 120, 127, 127, 148, 148, 148, 169, 190, 190, 233, 233, 254, 254, 296, 296, 275, 339, 339, 360, 381, 424, 424, 466, 508, 508, 572, 593, 636, 699, 720, 763, 826, 869, 932, 996, 1060; all values are in Hz.
8. The method for unlocking a mobile phone screen based on a short voice command and voiceprint technology according to claim 1, characterized in that: the formula of the discrete cosine transform is:
$$\sum_{j=1}^{X} En(j)\cos\left[\frac{\pi (i+1)(j-0.5)}{24}\right]$$, where En(j) denotes the j-th noise-reduced energy value, 1≤j≤X, 1≤i≤Z, and i and j are natural numbers.
9. The method for unlocking a mobile phone screen based on a short voice command and voiceprint technology according to claim 1, characterized in that: the windowing in Step 11 and Step 21 applies a Hamming window.
CN2012100970831A 2012-04-05 2012-04-05 Method for removing lock of mobile phone screen based on short voice command and voice-print technology Active CN102647521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012100970831A CN102647521B (en) 2012-04-05 2012-04-05 Method for removing lock of mobile phone screen based on short voice command and voice-print technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012100970831A CN102647521B (en) 2012-04-05 2012-04-05 Method for removing lock of mobile phone screen based on short voice command and voice-print technology

Publications (2)

Publication Number Publication Date
CN102647521A CN102647521A (en) 2012-08-22
CN102647521B true CN102647521B (en) 2013-10-09

Family

ID=46660091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012100970831A Active CN102647521B (en) 2012-04-05 2012-04-05 Method for removing lock of mobile phone screen based on short voice command and voice-print technology

Country Status (1)

Country Link
CN (1) CN102647521B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8954736B2 (en) * 2012-10-04 2015-02-10 Google Inc. Limiting the functionality of a software program based on a security model
CN104937603B (en) 2013-01-10 2018-09-25 日本电气株式会社 terminal, unlocking method and program
CN103943110A (en) * 2013-01-21 2014-07-23 联想(北京)有限公司 Control method, device and electronic equipment
CN103280219A (en) * 2013-05-16 2013-09-04 中山大学 Android platform-based voiceprint recognition method
TWI481774B (en) * 2013-09-18 2015-04-21 Generalplus Technology Inc Method for unlocking door, method for leasing asset and system thereof
CN103760969A (en) * 2013-12-12 2014-04-30 宇龙计算机通信科技(深圳)有限公司 Mobile terminal and method for controlling application program through voice
CN104219381B (en) * 2014-08-18 2017-08-25 上海卓易科技股份有限公司 A kind of intelligent unlocking method, terminal and system
CN105469791A (en) * 2014-09-04 2016-04-06 中兴通讯股份有限公司 Method and device for processing service
CN104965724A (en) * 2014-12-16 2015-10-07 深圳市腾讯计算机系统有限公司 Working state switching method and apparatus
CN106601238A (en) * 2015-10-14 2017-04-26 阿里巴巴集团控股有限公司 Application operation processing method and application operation processing device
CN105869244B (en) * 2016-03-31 2018-11-02 青岛歌尔声学科技有限公司 A kind of sound password unlocking method and coded lock
US11322157B2 (en) * 2016-06-06 2022-05-03 Cirrus Logic, Inc. Voice user interface
CN106250742A (en) * 2016-07-22 2016-12-21 北京小米移动软件有限公司 The unlocking method of mobile terminal, device and mobile terminal
CN107147791B (en) * 2017-05-15 2019-11-15 上海与德科技有限公司 A kind of method, device and mobile terminal of speech unlocking
CN107644645A (en) * 2017-09-29 2018-01-30 联想(北京)有限公司 A kind of sound control method, device and electronic equipment
CN111622616B (en) * 2020-04-15 2021-11-02 阜阳万瑞斯电子锁业有限公司 Personal voice recognition unlocking system and method for electronic lock

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1746971A (en) * 2004-09-09 2006-03-15 上海优浪信息科技有限公司 Speech key of mobile
CN101064043A (en) * 2006-04-29 2007-10-31 上海优浪信息科技有限公司 Sound-groove gate inhibition system and uses thereof
CN102148899A (en) * 2011-03-29 2011-08-10 广东欧珀移动通信有限公司 Mobile phone acoustic-control unlocking method
CN102324232A (en) * 2011-09-12 2012-01-18 辽宁工业大学 Method for recognizing sound-groove and system based on gauss hybrid models

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100762596B1 (en) * 2006-04-05 2007-10-01 삼성전자주식회사 Speech signal pre-processing system and speech signal feature information extracting method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
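"Research and Implementation of GMM-Based Speaker Recognition Technology"; Hu Yiping; CNKI Excellent Master's Theses Full-text Database; 20080630; full text *
Hu Yiping. "Research and Implementation of GMM-Based Speaker Recognition Technology". CNKI Excellent Master's Theses Full-text Database. 2008, full text.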

Also Published As

Publication number Publication date
CN102647521A (en) 2012-08-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20160106

Address after: 100000, No. two, building 17, Zhongguancun Software Park, 8 northeast Wang Xi Road, Beijing, Haidian District, A2

Patentee after: BAIDU.COM TIMES TECHNOLOGY (BEIJING) Co.,Ltd.

Address before: 350000, 403A building, four floor, Torch Innovation Building, 8 star road, Fuzhou Development Zone, Fuzhou, Fujian, China

Patentee before: Fuzhou Boyuan Wireless Network Technology Co., Ltd.