CN103280219A - Android platform-based voiceprint recognition method - Google Patents


Info

Publication number
CN103280219A
CN103280219A CN2013101828255A CN201310182825A
Authority
CN
China
Prior art keywords
mel
voiceprint
voice
feature extraction
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013101828255A
Other languages
Chinese (zh)
Inventor
刘海亮
蒋德东
曹彩凤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University filed Critical National Sun Yat Sen University
Priority to CN2013101828255A priority Critical patent/CN103280219A/en
Publication of CN103280219A publication Critical patent/CN103280219A/en
Pending legal-status Critical Current


Abstract

The invention discloses an Android platform-based voiceprint recognition method, which comprises the following steps: a user inputs a voice password through a mobile phone; voiceprint characteristic parameters are obtained using the Mel cepstrum technique; when the user logs into the system a second time, the same password is input, and the voiceprint information to be recognized is obtained through feature extraction; finally, the resulting confidence score is compared with a critical value, the output result is "genuine user" when the score is less than the critical value, and "not the genuine user" otherwise, thereby completing the authentication process. By implementing the method, a voiceprint recognition system is provided which is more convenient and quicker to operate and which better meets users' practical needs.

Description

Voiceprint authentication method based on the Android platform
Technical field
The present invention relates to the field of computer technology, and in particular to a voiceprint authentication method based on the Android platform.
Background technology
With the development of smartphone operating systems, smartphone application development has become a focus of attention. As Android phones become popular worldwide, the number of Android applications is bound to grow rapidly; Android applications already cover nearly every field of society. Development of voice applications for Android phones has also begun, promising better and more efficient services for users. Voiceprint technology is therefore bound to become an important research direction in the field of voice applications on Android phones. The basic task of voiceprint recognition is to obtain characteristic information from a speaker's voice; this information can reflect the speaker's psychological state, behavioral disposition, and so on, and by analyzing it the speaker's true identity can be confirmed. To this end, feature extraction is performed on the speaker's voice to obtain characteristic parameters, the parameters extracted from the speaker's earlier and later utterances are then matched, and the speaker's true identity is thereby judged. Traditional voiceprint applications almost all use a computer as the operating platform. Because computers are inconvenient to carry, this limits their application. A voiceprint recognition system based on the Android platform instead uses a smartphone as the carrier, which is convenient to use: the user can complete authentication operations by voice alone, saving time and simplifying operation.
The linear prediction cepstrum coefficient (LPCC) is currently the most widely adopted characteristic parameter extraction method, usually applied to connected-word and continuous speech recognition. Research has typically employed three main linear-prediction-derived parameters, namely the linear prediction reflection coefficients, the line spectrum pair coefficients, and the linear prediction cepstrum coefficients, applied them in connected-word speech recognition systems, and verified them by computer simulation.
The artificial neural network algorithm is a second way of imitating human thinking. It is a nonlinear dynamical system, characterized by distributed storage of information and concurrent collaborative processing. Although the structure of a single neuron is extremely simple and its function limited, the behaviors that a network composed of a large number of neurons can realize are extremely rich. This ability of neural networks to model the human cognitive system gives them advantages in speech processing that other methods cannot match.
Current voiceprint recognition mainly uses the computer as the operating carrier, and voiceprint feature extraction mostly adopts the linear prediction cepstrum coefficient (LPCC) method to obtain characteristic parameters; the parameters obtained by this feature extraction technique are often not well suited to voiceprint recognition. Voiceprint parameter matching mostly adopts artificial neural network algorithms, hidden Markov algorithms, and the like. These algorithms are relatively complex and may degrade overall system performance if run on a smartphone platform.
The major defects of existing voiceprint recognition are as follows. First, the authentication rate is generally low, and repeated input is usually required to achieve authentication. Second, the requirements on the running environment are high: most systems run normally only on a computer, placing demands on both hardware and software. Third, the authentication result is easily disturbed by the external environment and the noise immunity is weak, so even slight ambient noise may change the result. This remains a difficult problem commonly encountered by current voiceprint authentication systems.
Summary of the invention
The purpose of the present invention is to provide a voiceprint recognition system based on the Android platform which, through an improvement to the voiceprint matching algorithm, reduces the time complexity of matching, increases the running speed of the system, and makes its operation quicker.
The invention provides a voiceprint authentication method based on the Android platform, comprising the following steps:
The user inputs a voice password through the mobile phone; the Mel cepstrum technique is adopted to obtain the voiceprint characteristic parameters; when the user logs into the system a second time, the same password is input and the voiceprint information to be authenticated is obtained through feature extraction; finally, the resulting confidence score is compared with a critical value, the output result is "genuine user" when the score is less than the critical value, and "not the genuine user" otherwise, thereby completing the authentication process.
The Mel cepstrum technique comprises:
performing a fast Fourier transform (FFT) on the voice signal to transform it from the time domain to the frequency domain;
converting ordinary frequencies into Mel frequencies before applying the triangular band-pass filters;
computing the center frequencies f(m) = (N / F_s) F_mel^-1( F_mel(f_l) + m (F_mel(f_h) - F_mel(f_l)) / (M + 1) ), where m = 0, 1, 2, ..., 24; f_h and f_l are respectively the highest and lowest frequencies of the filters' application range; N is the width of the FFT; F_s is the sampling frequency (F_s = 8000 Hz); and F_mel^-1(b) = 700(e^(b/1125) - 1) is the inverse function of F_mel;
computing the spectral energy E(m) from the filter responses H_m(k);
applying a cosine transform to the logarithmic energies obtained above, with L taken as 12, thereby obtaining a Mel cepstrum parameter array of length 12.
Obtaining the voiceprint information to be authenticated through feature extraction comprises:
performing feature extraction on the voice password recorded by the speaker;
the user inputting the login voice password, with the corresponding characteristic parameters likewise obtained through feature extraction;
matching the two parameter sequences obtained above by the voiceprint matching algorithm.
Through the method provided by the invention, a voiceprint recognition system is provided on an Android smartphone; the system makes operation more convenient and quicker and better satisfies users' practical needs.
Description of drawings
In order to illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a flow chart of the voiceprint authentication method based on the Android platform in an embodiment of the invention;
Fig. 2 is a flow chart of the Mel cepstrum technique in an embodiment of the invention;
Fig. 3 is a graph of the relationship between frequency and Mel frequency in an embodiment of the invention;
Fig. 4 is a schematic diagram of the Mel cepstrum coefficients in an embodiment of the invention.
Embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work fall within the protection scope of the present invention.
The invention provides a voiceprint recognition system based on the Android platform. Unlike previous voiceprint recognition systems, this method uses a smartphone as the operating platform and at the same time improves the dynamic matching algorithm, reducing its time complexity and making the system better suited to running on a smartphone.
The operational flow of the system is shown in Fig. 1. First, the user inputs a voice password through the mobile phone. Then the speech characteristic parameters are extracted by the feature extraction algorithm; the system adopts the Mel cepstrum technique to obtain the voiceprint characteristic parameters. When the user logs into the system a second time, the same password is input and the voiceprint information to be authenticated is obtained through feature extraction. Finally, the resulting confidence score is compared with a critical value; the output result is "genuine user" when the score is less than the critical value, and "not the genuine user" otherwise, thereby completing the authentication process. The critical value here is obtained through repeated background testing of the system.
The present invention mainly involves two tasks: one is the extraction of the voiceprint characteristic parameters; the other is the matching of the voiceprint characteristic parameters. The voiceprint feature extraction of the present invention adopts the Mel cepstrum technique. The basic task of this part is to obtain, from the voice signal, parameter information that can embody the uniqueness of the speaker's voice. A voiceprint feature must be a characteristic exclusive to the speaker, the key distinction from other people; it must remain unchanged and stable for that speaker, and can be described as an intrinsic attribute of a person's voice. The concrete flow of the technique is shown in Fig. 2.
Step 1: a fast Fourier transform (FFT) is performed on the voice signal to transform it from the time domain to the frequency domain.
Step 2: before the triangular band-pass filters are applied, ordinary frequencies are converted into Mel frequencies by the following formula: mel(f) = 2595 * log10(1 + f/700), or equivalently mel(f) = 1125 * ln(1 + f/700).
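As a minimal sketch of this conversion step (the class and method names below are illustrative, not from the patent; the constants are the ones given in the formula above):

```java
// Sketch of the Hz-to-Mel conversion used in Step 2.
public class MelScale {
    // mel(f) = 2595 * log10(1 + f / 700)
    public static double hzToMel(double f) {
        return 2595.0 * Math.log10(1.0 + f / 700.0);
    }

    // Inverse mapping back to Hz: 700 * (10^(b / 2595) - 1)
    public static double melToHz(double b) {
        return 700.0 * (Math.pow(10.0, b / 2595.0) - 1.0);
    }
}
```

The inverse function is what Step 3 below uses to place filter center frequencies that are evenly spaced on the Mel scale.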
Step 3: the center frequencies are computed as f(m) = (N / F_s) F_mel^-1( F_mel(f_l) + m (F_mel(f_h) - F_mel(f_l)) / (M + 1) ), where m = 0, 1, 2, ..., 24; f_h and f_l are respectively the highest and lowest frequencies of the filters' application range; N is the width of the FFT (here N is taken as 512); F_s is the sampling frequency (F_s = 8000 Hz); and F_mel^-1(b) = 700(e^(b/1125) - 1) is the inverse function of F_mel.
The resulting center frequencies f(m) are substituted into the filter bank below for computation. The filter bank consists of 24 triangular band-pass filters, with the following filter function:
H_m(k) = 0,                                  k < f(m-1)
       = (k - f(m-1)) / (f(m) - f(m-1)),     f(m-1) ≤ k ≤ f(m)
       = (f(m+1) - k) / (f(m+1) - f(m)),     f(m) ≤ k ≤ f(m+1)
       = 0,                                  k > f(m+1)
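The triangular filter weight can be sketched as a small Java method (names are illustrative; the three boundary arguments are the bins corresponding to f(m-1), f(m), and f(m+1)):

```java
// Sketch of the triangular band-pass filter weight H_m(k) from Step 3.
public class MelFilter {
    // Weight at FFT bin k for a filter with boundary bins
    // fPrev = f(m-1), fCenter = f(m), fNext = f(m+1).
    public static double weight(double k, double fPrev, double fCenter, double fNext) {
        if (k < fPrev || k > fNext) return 0.0;          // outside the triangle
        if (k <= fCenter) return (k - fPrev) / (fCenter - fPrev);   // rising edge
        return (fNext - k) / (fNext - fCenter);                     // falling edge
    }
}
```

The weight rises linearly from 0 at f(m-1) to 1 at the center frequency f(m), then falls back to 0 at f(m+1).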
Step 4: the spectral energy E(m) is computed from the filter responses H_m(k). The computing formula is as follows:
E(m) = ln( Σ_{k=0}^{N-1} |X(k)|^2 H_m(k) ), 1 ≤ m ≤ M
where X(k) is the spectrum value obtained by the fast Fourier transform above, and M represents the actual number of Mel filters; here M is taken as 24.
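A rough sketch of this step in Java (illustrative names; `powerSpectrum[k]` stands for |X(k)|^2 and `weights[m][k]` for H_m(k)):

```java
// Sketch of Step 4: log filterbank energies E(m).
public class FilterbankEnergy {
    // E(m) = ln( sum_k |X(k)|^2 * H_m(k) ) for each of the M filters.
    public static double[] logEnergies(double[] powerSpectrum, double[][] weights) {
        double[] e = new double[weights.length];
        for (int m = 0; m < weights.length; m++) {
            double sum = 0.0;
            for (int k = 0; k < powerSpectrum.length; k++) {
                sum += powerSpectrum[k] * weights[m][k];
            }
            e[m] = Math.log(sum);   // natural logarithm, as in the formula
        }
        return e;
    }
}
```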
Step 5: a cosine transform is applied to the logarithmic energies obtained above according to the following formula, with L taken as 12, thereby obtaining a Mel cepstrum parameter array of length 12. The formula is as follows:
C_m = Σ_{k=0}^{M-1} cos( m (k + 0.5) π / M ) E(k), m = 1, 2, 3, ..., L
where M represents the number of triangular band-pass filters. The DCT is adopted so that the cepstrum can be obtained. A one-dimensional parameter array of size 12 is finally obtained. The resulting Mel cepstrum coefficients are shown in Fig. 4.
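The DCT step above can be sketched as follows (illustrative names; `logEnergy[k]` stands for E(k) from Step 4, and L is the number of cepstrum coefficients, 12 here):

```java
// Sketch of Step 5: DCT of the log filterbank energies.
public class MelCepstrum {
    // C_m = sum_{k=0}^{M-1} cos( m * (k + 0.5) * PI / M ) * E(k),  m = 1..L
    public static double[] melCepstrum(double[] logEnergy, int L) {
        int M = logEnergy.length;   // number of filters (24 in the patent)
        double[] c = new double[L];
        for (int m = 1; m <= L; m++) {
            double sum = 0.0;
            for (int k = 0; k < M; k++) {
                sum += Math.cos(m * (k + 0.5) * Math.PI / M) * logEnergy[k];
            }
            c[m - 1] = sum;
        }
        return c;
    }
}
```

A useful sanity check on the DCT basis: a constant energy vector yields coefficients that are all (numerically) zero, since each cosine row for m ≥ 1 sums to zero.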
The voiceprint parameters obtained are stored by calling the SharedPreferences data storage method. The key code in the system is as follows:
SharedPreferences settings = getSharedPreferences(PREFS_NAME, 0); // create the object used to store data
SharedPreferences.Editor editor = settings.edit(); // obtain the Editor object
SharedPreferences settings = getSharedPreferences(PREFS_NAME, 0); // obtain the data
String ban = settings.getString("moban", "err"); // obtain the template parameters
SharedPreferences settings1 = getSharedPreferences(PREFS_NAME, 1);
String biao = settings1.getString("mubiao", "ron"); // obtain the target parameters
Compared with an SQLite database, the SharedPreferences object is simpler to operate and avoids tedious operations such as creating tables.
Voiceprint characteristic parameter matching compares the two groups of voiceprint parameters obtained, computing the distortion between them and the confidence score used for the comparison.
Step 1: feature extraction is performed on the voice password recorded by the speaker. The parameters obtained by feature extraction are stored in the SD card as template information. This parameter sequence can be expressed mathematically as R = {R(1), R(2), ..., R(n), ..., R(M)}, where n is the sequence number of the speech frame, n = 1 indicates the start of voice extraction, and n = M indicates the end of the extraction process, so the total number of speech frames is M.
Step 2: the user inputs the login voice password, and the corresponding characteristic parameters are likewise obtained through feature extraction. The resulting parameter sequence is expressed mathematically as T = {T(1), T(2), ..., T(m), ..., T(L)}, where m is the sequence number of the speech frame and T(m) is the eigenvector of the m-th frame.
Step 3: the two parameter sequences obtained above are matched by the voiceprint matching algorithm. If the array length of the voiceprint information of the input voice and that of the voice to be authenticated are both M, directly computing the minimum distortion D between the two is very costly, since a matrix of M × M entries must be computed. For this reason, a number of points can be selected in the M × M matrix; these points can be expressed as
(T(m_1), R(n_1)), (T(m_2), R(n_2)), ..., (T(m_k), R(n_k)),
where m_1 < m_2 < ... < m_k and n_1 < n_2 < ... < n_k. The shortest distance corresponding to each point is
D(T(m_1), R(n_1)), D(T(m_2), R(n_2)), ..., D(T(m_k), R(n_k)).
Therefore, the minimum accumulated distance of the distortion vectors can be obtained as
D[T, R] = Σ_{l=1}^{j} D( T(m_l), R(n_l) ),
where j is the number of points selected. It can be seen from the formula above that the M × M matrix is reduced to a sum of minimum distortions over several small sub-matrices, greatly reducing the amount of computation and the time cost of the system. Finally, template matching is performed on the obtained voiceprint parameters to judge the speaker's identity.
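As a rough sketch of the reduced matching described above (not the patent's actual implementation: the frame-distance measure, Euclidean here, and all names are assumptions):

```java
// Sketch of the accumulated-distance matching over selected frame pairs.
public class VoiceprintMatch {
    // Euclidean distance between two feature vectors
    // (e.g. 12-dimensional Mel cepstrum frames).
    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    // D[T, R] = sum over the chosen monotone frame pairs (mIdx[l], nIdx[l]),
    // instead of filling the full M x M distance matrix.
    public static double accumulatedDistance(double[][] t, double[][] r,
                                             int[] mIdx, int[] nIdx) {
        double total = 0.0;
        for (int l = 0; l < mIdx.length; l++) {
            total += dist(t[mIdx[l]], r[nIdx[l]]);
        }
        return total;
    }
}
```

The resulting accumulated distance would then be turned into the confidence score that is compared with the critical value.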
In summary, the present invention provides a voiceprint recognition system on an Android smartphone; through this system, operation is more convenient and quicker, and users' practical needs are better satisfied.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments can be completed by a program instructing relevant hardware, and the program can be stored in a computer-readable storage medium; the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The voiceprint authentication method based on the Android platform provided by the embodiments of the invention has been described in detail above. Specific examples are used herein to set forth the principle and embodiments of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific embodiments and the application scope. In summary, the contents of this description should not be construed as limiting the present invention.

Claims (3)

1. A voiceprint authentication method based on the Android platform, characterized by comprising the following steps:
the user inputs a voice password through the mobile phone; the Mel cepstrum technique is adopted to obtain the voiceprint characteristic parameters; when the user logs into the system a second time, the same password is input and the voiceprint information to be authenticated is obtained through feature extraction; finally, the resulting confidence score is compared with a critical value, the output result is "genuine user" when the score is less than the critical value, and "not the genuine user" otherwise, thereby completing the authentication process.
2. The voiceprint authentication method based on the Android platform as claimed in claim 1, characterized in that the Mel cepstrum technique comprises:
performing a fast Fourier transform (FFT) on the voice signal to transform it from the time domain to the frequency domain;
converting ordinary frequencies into Mel frequencies before applying the triangular band-pass filters;
computing the center frequencies f(m) = (N / F_s) F_mel^-1( F_mel(f_l) + m (F_mel(f_h) - F_mel(f_l)) / (M + 1) ), where m = 0, 1, 2, ..., 24; f_h and f_l are respectively the highest and lowest frequencies of the filters' application range; N is the width of the FFT; F_s is the sampling frequency (F_s = 8000 Hz); and F_mel^-1(b) = 700(e^(b/1125) - 1) is the inverse function of F_mel;
computing the spectral energy E(m) from the filter responses H_m(k);
applying a cosine transform to the logarithmic energies obtained above, with L taken as 12, thereby obtaining a Mel cepstrum parameter array of length 12.
3. The voiceprint authentication method based on the Android platform as claimed in claim 2, characterized in that obtaining the voiceprint information to be authenticated through feature extraction comprises:
performing feature extraction on the voice password recorded by the speaker;
the user inputting the login voice password, with the corresponding characteristic parameters likewise obtained through feature extraction;
matching the two parameter sequences obtained above by the voiceprint matching algorithm.
CN2013101828255A 2013-05-16 2013-05-16 Android platform-based voiceprint recognition method Pending CN103280219A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013101828255A CN103280219A (en) 2013-05-16 2013-05-16 Android platform-based voiceprint recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013101828255A CN103280219A (en) 2013-05-16 2013-05-16 Android platform-based voiceprint recognition method

Publications (1)

Publication Number Publication Date
CN103280219A true CN103280219A (en) 2013-09-04

Family

ID=49062713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013101828255A Pending CN103280219A (en) 2013-05-16 2013-05-16 Android platform-based voiceprint recognition method

Country Status (1)

Country Link
CN (1) CN103280219A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106128465A (en) * 2016-06-23 2016-11-16 成都启英泰伦科技有限公司 A kind of Voiceprint Recognition System and method
CN107666536A (en) * 2016-07-29 2018-02-06 北京搜狗科技发展有限公司 A kind of method and apparatus for finding terminal, a kind of device for being used to find terminal
CN108710788A (en) * 2018-05-22 2018-10-26 上海众人网络安全技术有限公司 A kind of safety certifying method, device, terminal and storage medium
CN110867189A (en) * 2018-08-28 2020-03-06 北京京东尚科信息技术有限公司 Login method and device
CN111311774A (en) * 2019-12-10 2020-06-19 上海明略人工智能(集团)有限公司 Sign-in method and system based on voice recognition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1746971A (en) * 2004-09-09 2006-03-15 上海优浪信息科技有限公司 Speech key of mobile
JP2011087110A (en) * 2009-10-15 2011-04-28 Nec Corp Portable terminal, method for controlling the same, and program
CN102647521A (en) * 2012-04-05 2012-08-22 福州博远无线网络科技有限公司 Method for removing lock of mobile phone screen based on short voice command and voice-print technology
CN103077721A (en) * 2012-12-25 2013-05-01 百度在线网络技术(北京)有限公司 Voice memorandum method of mobile terminal and mobile terminal


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yin Cong (尹聪), "Research on a Speaker Verification System Based on Perceptual Log Area Ratio Coefficients", Master's thesis, Taiyuan University of Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106128465A (en) * 2016-06-23 2016-11-16 成都启英泰伦科技有限公司 A kind of Voiceprint Recognition System and method
CN107666536A (en) * 2016-07-29 2018-02-06 北京搜狗科技发展有限公司 A kind of method and apparatus for finding terminal, a kind of device for being used to find terminal
CN107666536B (en) * 2016-07-29 2021-02-12 北京搜狗科技发展有限公司 Method and device for searching terminal
CN108710788A (en) * 2018-05-22 2018-10-26 上海众人网络安全技术有限公司 A kind of safety certifying method, device, terminal and storage medium
CN110867189A (en) * 2018-08-28 2020-03-06 北京京东尚科信息技术有限公司 Login method and device
CN111311774A (en) * 2019-12-10 2020-06-19 上海明略人工智能(集团)有限公司 Sign-in method and system based on voice recognition

Similar Documents

Publication Publication Date Title
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
Lanjewar et al. Implementation and comparison of speech emotion recognition system using Gaussian Mixture Model (GMM) and K-Nearest Neighbor (K-NN) techniques
CN108986835B (en) Based on speech de-noising method, apparatus, equipment and the medium for improving GAN network
CN101980336B (en) Hidden Markov model-based vehicle sound identification method
CN102968986B (en) Overlapped voice and single voice distinguishing method based on long time characteristics and short time characteristics
CN111243602B (en) Voiceprint recognition method based on gender, nationality and emotion information
CN110085251A (en) Voice extracting method, voice extraction element and Related product
CN101923855A (en) Test-irrelevant voice print identifying system
CN107610707A (en) A kind of method for recognizing sound-groove and device
CN103280219A (en) Android platform-based voiceprint recognition method
CN107527620A (en) Electronic installation, the method for authentication and computer-readable recording medium
CN104485102A (en) Voiceprint recognition method and device
CN110265035B (en) Speaker recognition method based on deep learning
CN108198561A (en) A kind of pirate recordings speech detection method based on convolutional neural networks
CN102664010A (en) Robust speaker distinguishing method based on multifactor frequency displacement invariant feature
CN105845139A (en) Off-line speech control method and device
CN105100376A (en) Identity authentication method and apparatus
CN103594084A (en) Voice emotion recognition method and system based on joint penalty sparse representation dictionary learning
CN112786057B (en) Voiceprint recognition method and device, electronic equipment and storage medium
Sangeetha et al. Emotion speech recognition based on adaptive fractional deep belief network and reinforcement learning
CN103456302A (en) Emotion speaker recognition method based on emotion GMM model weight synthesis
CN106548786A (en) A kind of detection method and system of voice data
CN110570870A (en) Text-independent voiceprint recognition method, device and equipment
CN110232928B (en) Text-independent speaker verification method and device
Shen et al. Rars: Recognition of audio recording source based on residual neural network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130904