CN101046959A - Identity identification method based on lip speech characteristic - Google Patents


Info

Publication number
CN101046959A
CN101046959A CNA2007100400038A CN200710040003A
Authority
CN
China
Prior art keywords
lip
model
speech characteristic
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2007100400038A
Other languages
Chinese (zh)
Inventor
王士林
刘功申
林祥
李翔
李生红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CNA2007100400038A priority Critical patent/CN101046959A/en
Publication of CN101046959A publication Critical patent/CN101046959A/en
Pending legal-status Critical Current

Landscapes

  • Collating Specific Patterns (AREA)

Abstract

The present invention relates to a speaker identity authentication method based on lip movement, in the fields of biometrics and pattern recognition. The method comprises the following steps: inputting a lip image sequence, segmenting the lip region with an improved fuzzy clustering algorithm, modeling the lip region with a point-driven model, and extracting various lip features from the lip region model.

Description

Identity authentication method based on lip speech characteristics
Technical field
The present invention relates to a method in the field of information security technology, in particular to an identity authentication method based on lip speech characteristics.
Background technology
With the arrival of the information society, system security has become increasingly important, and accurate identity authentication is a prerequisite for most secure systems. Traditional authentication methods rely either on special articles (keys, identity cards, smart cards, etc.) or on specific knowledge (passwords, ID numbers, etc.). The former are easily lost or damaged; the latter are easily forgotten or cracked. More importantly, because such credentials can be copied and shared, the system cannot judge the user's real identity from them, so these traditional methods inevitably leave hidden security risks. Introducing biometric information into the identity authentication system, however, is expected to overcome these problems.
It is well known that every person has a unique speaking style: even when uttering the same words, each person says them differently. This individuality is reflected in two aspects: (1) acoustic information, i.e. each person's distinct sound spectrum; and (2) visual information, i.e. each person's distinct lip motion. Research shows that lip speech characteristics, like fingerprints, faces and voice, have a certain identity-discriminating power and can be regarded as a new biometric feature in the field of identity recognition.
Among the more influential international research on identity authentication using lip speech characteristics: Luettin et al. of the Department of Electronic Engineering, University of Sheffield, UK, proposed using lip shape and intra-oral intensity distribution as features, with a Hidden Markov Model (HMM) as the discrimination algorithm; this method reached nearly 92% recognition rate in an experiment with 12 users. Wark et al. of Queensland University of Technology, Australia, proposed an identity authentication method based on lip feature extraction, using principal component analysis (PCA) and linear discriminant analysis (LDA) to extract lip features and a Gaussian mixture model (GMM) for recognition and authentication; this method reached nearly 90% recognition rate in an experiment with 37 users. These results demonstrate, to a certain extent, the feasibility of identity recognition from lip motion and lay a good foundation for further research.
A search of the prior art literature finds that Kanak et al., in "Joint audio-video processing for biometric speaker identification" (Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2003, vol. 3, pp. 561-564), proposed a speaker identity authentication method based on lip speech characteristics that uses eigenlips as features and a single-layer Hidden Markov Model as the classification and authentication mechanism. Its deficiencies are: (1) insufficient accuracy in extracting lip speech characteristics; (2) lack of a lip speech characteristic representation with identity-discriminating power; (3) lack of an identity authentication mechanism suited to lip speech characteristics.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art by proposing an identity authentication method based on lip speech characteristics, which recognizes the speaker's identity and judges its consistency with the claimed legal identity, achieving a high recognition rate while maintaining a high processing speed.
The present invention is achieved by the following technical solution: first, lip speech characteristics are extracted automatically by a lip region segmentation method and a lip modeling method; then the identity-discriminating power of the various lip speech characteristics is analyzed by a mutual-information evaluation method, and the optimal combination of lip speech characteristics is selected; finally, according to the characteristics of lip speech features, a multi-level Hidden Markov Model is used to perform speaker identification and authentication.
The lip region segmentation method is an improved fuzzy clustering algorithm whose objective function is:

J_m = J_{CLR} + J_{SPA} = J_{CLR,OBJ} + J_{CLR,BKG} + J_{SPA,OBJ} + J_{SPA,BKG}

    = \sum_{r=1}^{N} \sum_{s=1}^{M} u_{0,r,s}^{m} d_{0,r,s}^{2} + \sum_{r=1}^{N} \sum_{s=1}^{M} u_{1,r,s}^{m} d_{1,r,s}^{2} + \sum_{r=1}^{N} \sum_{s=1}^{M} u_{0,r,s}^{m} g_{OBJ}(r,s) + \sum_{r=1}^{N} \sum_{s=1}^{M} u_{1,r,s}^{m} g_{BKG}(r,s)

with the membership functions subject to \sum_{i=0}^{1} u_{i,r,s} = 1. Here the subscripts CLR, SPA, OBJ and BKG of the objective function J denote its color part, spatial part, lip-target part and background part, respectively; u_{i,r,s} is the membership degree of the pixel at coordinates (r, s) with respect to class i; d_{i,r,s} is the color distance between the pixel at (r, s) and the color center of class i; and g_{OBJ} and g_{BKG} are the spatial distances between the pixel at (r, s) and the corresponding spatial class centers. Iterative optimization yields the membership degrees of all pixels, and a pixel whose membership in the lip (target) class exceeds 0.5 is considered a lip pixel.
The lip region modeling method uses a lip geometric model composed of 16 points on three quadratic curves, optimized by a "point-driven" iterative process: in each iteration, only the 16 model contour points are adjusted, so that the contour encloses as many lip pixels and as few non-lip pixels as possible.
The mutual-information evaluation method analyzes the identity-discriminating power of the various lip speech characteristics using the formula

I(X;Y) = \sum_{x,y} P(x,y) \log_2 \frac{P(x|y)}{P(x)}

where I(X;Y) is the mutual information between the feature set X and the speaker set Y, and P(x,y), P(x|y) and P(x) are the joint probability of x and y, the conditional probability, and the prior probability of x, respectively. Since mutual information in feature analysis is usually expressed in discrete form (discrete probabilities computed from histograms), the choice of histogram quantization strongly affects the accuracy of the analysis. The present invention computes the discrete probabilities with a uniformly quantized histogram whose dimension is \log_2 N + 1 if the feature is Gaussian-distributed, and otherwise \log_2 N + 1 + \log_2 (1 + \kappa \sqrt{N/6}), where \kappa is the feature's skewness estimate and N is the total number of training samples.
According to the characteristics of lip speech features, the multi-level Hidden Markov Model performs speaker identification and authentication in the following concrete steps:
1. Basic motion sequence segmentation: a basic lip motion sequence is a lip motion segment with drastic change at both ends and smooth change in the middle. Let f_{lip,i} denote the feature combination at time i and f_{lip,i+1} that at time i+1. If Δ = ||f_{lip,i+1} - f_{lip,i}|| > T (T a preset threshold), then time i is considered a lip mutation point, i.e. the start (or end) of a basic motion sequence. Once all lip mutation points are determined, the whole prompt-phrase sequence can be divided into several basic motion sequences.
2. Building the user-group Hidden Markov Model (HMM) library: taking the basic motion sequences of all users in the training set uttering a given segment of the prompt phrase as training samples, the optimal feature set as training features, and continuous HMMs as classification models, an HMM library of the user group is built for all basic motion sequences of the prompt phrase. The HMM parameters are set as follows: 6-state left-to-right models, the Baum-Welch algorithm for training, and the Viterbi algorithm for recognition.
3. Building a specific speaker's "lip-print model": for each basic motion sequence of the prompt phrase, the matching degree between the specific speaker and the user-group HMM library is analyzed for that segment. If the matching degree is high (above a preset threshold), the speaker's utterance of that segment is not distinctive. Otherwise, the utterance is distinctive, and the speaker's basic motion sequences for that segment are used as the training set, with the same model parameters as the user-group HMM library, to build a "lip-print model" capturing that speaker's characteristics.
4. Speaker identification and authentication: for identification, i.e. recognizing the speaker's identity within the user group, the invention analyzes the matching degree between the test speech segment and the "lip-print models" of all users in the library, and the user with the highest matching degree is judged to be the speaker. For authentication, the test speech segment is matched against the claimed speaker's "lip-print model"; if the matching degree exceeds a threshold the authentication passes, otherwise the user is rejected as illegitimate.
The technical solution of the present invention outperforms traditional identity authentication methods based on lip speech characteristics for three reasons: first, the automatic lip-feature extraction technique obtains lip region information more accurately by taking both color and spatial distribution into account; second, the mutual-information evaluation method yields feature combinations with greater identity-discriminating power; third, the multi-level HMM framework suits the characteristics of lip speech features and provides higher classification accuracy.
Aiming at the concrete characteristics of lip image sequences, the present invention proposes extraction, analysis and classification methods suited to this biometric feature, and realizes speaker identity authentication. Performance tests show that the invention achieves high accuracy for speaker identification and authentication, namely an identification rate of 98.07% and an authentication equal error rate of 2.31%, while maintaining a high processing speed, and it therefore has broad application prospects.
Description of drawings
Fig. 1 is a flow chart of the method of the present invention
Wherein: (a) training procedure; (b) authentication procedure
Fig. 2 is a flow chart of the lip speech characteristic extraction method of the present invention
Fig. 3 is a schematic diagram of lip segmentation results
Wherein: (a) original lip image; (b) lip segmentation result
Fig. 4 is a schematic diagram of the lip geometric model
Fig. 5 is a flow chart of the identity identification and authentication method of the present invention suited to lip speech characteristics
Wherein: (a) training procedure; (b) authentication procedure
Embodiment
The embodiments of the present invention are described in detail below with reference to the drawings. The embodiments are implemented on the premise of the technical solution of the present invention, with detailed implementations and concrete operating procedures given, but the protection scope of the present invention is not limited to the following embodiments.
As shown in Fig. 1, the embodiment first extracts lip speech characteristics automatically, mainly including: (1) geometric features of the lip, describing the length and width of the lip region; (2) contour features, describing the shape of the lip contour; (3) lip-interior features, describing information such as teeth and tongue that may appear inside the lips; (4) spectral features, describing lip motion in the frequency domain. The identity-discriminating power of each feature category is then assessed accurately and in depth by a mutual information (MI) based method. Finally, speaker identification and authentication are completed by the multi-level Hidden Markov Model.
Fig. 2 shows the automatic extraction procedure for lip speech characteristics, which comprises three parts: lip region segmentation, lip region modeling, and extraction of the various lip features. The concrete steps of each part are as follows:
● Lip region segmentation:
Because color distances in RGB space do not match the uniformity of human vision, the RGB color space is first converted to the relatively uniform CIE Lab/Luv color space, so that the color information of each pixel in image I can be expressed as {L, a, b, u, v}. The whole lip segmentation process can then be described as a two-class classification of all pixels in the image, into a lip-pixel class and a non-lip-pixel class. Finally, taking both color and spatial information into account, the inventors use an improved fuzzy clustering algorithm to classify the pixels. The concrete objective function is:
J_m = J_{CLR} + J_{SPA} = J_{CLR,OBJ} + J_{CLR,BKG} + J_{SPA,OBJ} + J_{SPA,BKG}

    = \sum_{r=1}^{N} \sum_{s=1}^{M} u_{0,r,s}^{m} d_{0,r,s}^{2} + \sum_{r=1}^{N} \sum_{s=1}^{M} u_{1,r,s}^{m} d_{1,r,s}^{2} + \sum_{r=1}^{N} \sum_{s=1}^{M} u_{0,r,s}^{m} g_{OBJ}(r,s) + \sum_{r=1}^{N} \sum_{s=1}^{M} u_{1,r,s}^{m} g_{BKG}(r,s)

with the membership functions subject to \sum_{i=0}^{1} u_{i,r,s} = 1. Here the subscripts CLR, SPA, OBJ and BKG of the objective function J denote its color part, spatial part, lip-target part and background part, respectively; u_{i,r,s} is the membership degree of the pixel at coordinates (r, s) with respect to class i; d_{i,r,s} is the color distance between the pixel at (r, s) and the color center of class i; and g_{OBJ} and g_{BKG} are the spatial distances between the pixel at (r, s) and the corresponding spatial class centers.
The membership degrees of all pixels are obtained through iterative optimization, and a pixel whose membership in the lip (target) class exceeds 0.5 is considered a lip pixel. The lip segmentation result is shown in Fig. 3.
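As an illustrative sketch only (not part of the original disclosure), the two-class fuzzy clustering step described above might look as follows in Python. The exact g_OBJ/g_BKG spatial terms and the CIE color conversion are not reproduced here; a generic `spatial_penalty` callback and plain color vectors stand in for them, and the membership update is the standard closed form for an additive objective of this shape.

```python
def fuzzy_lip_segmentation(pixels, centers, spatial_penalty, m=2.0, iters=20):
    """Two-class fuzzy clustering sketch: class 0 = lip, class 1 = background.

    pixels          : list of color vectors, one per pixel
    centers         : initial color centers [c_lip, c_bkg]
    spatial_penalty : callback (class_index, pixel_index) -> spatial term
    Memberships minimise u^m * (d^2 + g), subject to u_0 + u_1 = 1
    (the constraint stated in the text).
    """
    n = len(pixels)
    u = [[0.5, 0.5] for _ in range(n)]
    for _ in range(iters):
        # membership update (closed form for the additive objective)
        for p in range(n):
            cost = []
            for i in (0, 1):
                d2 = sum((a - b) ** 2 for a, b in zip(pixels[p], centers[i]))
                cost.append(d2 + spatial_penalty(i, p) + 1e-12)
            inv = [c ** (-1.0 / (m - 1.0)) for c in cost]
            s = sum(inv)
            for i in (0, 1):
                u[p][i] = inv[i] / s
        # color-center update (membership-weighted mean, as in fuzzy c-means)
        for i in (0, 1):
            w = [u[p][i] ** m for p in range(n)]
            tot = sum(w)
            centers[i] = tuple(
                sum(w[p] * pixels[p][k] for p in range(n)) / tot
                for k in range(len(pixels[0])))
    # a pixel is labelled "lip" when its lip-class membership exceeds 0.5
    return [u[p][0] > 0.5 for p in range(n)]
```

The final decision rule (membership > 0.5) matches the rule stated in the text.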
● Lip region modeling:
Describing the lip region pixel by pixel suffers from excessively high feature dimensionality. A lip model is therefore used to describe the lip pixels, reducing the dimensionality and improving classification efficiency and accuracy. The present invention adopts a 16-point lip geometric model (Fig. 4) composed of three quadratic curves, which keeps the model complexity low while preserving a reasonable lip shape.
During model optimization, the present invention uses a "point-driven" iterative process: in each iteration, only the 16 model contour points are adjusted, so that the contour encloses as many lip pixels and as few non-lip pixels as possible.
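A minimal sketch of the "point-driven" refinement idea, under stated simplifying assumptions (not part of the original disclosure): the three quadratic curves are omitted and a polygon through the contour points stands in for the model outline, with a greedy one-pixel adjustment per point that only accepts moves improving the lip-minus-non-lip pixel count.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def refine_contour(points, lip_mask, steps=((1, 0), (-1, 0), (0, 1), (0, -1)), iters=10):
    """Greedily move each contour point so the polygon encloses as many lip
    pixels and as few non-lip pixels as possible. lip_mask[y][x] is True for
    pixels classified as lip. The patent uses 16 points on three quadratic
    curves; here the polygon is an illustrative stand-in for that model."""
    h, w = len(lip_mask), len(lip_mask[0])

    def score(poly):
        s = 0
        for y in range(h):
            for x in range(w):
                if point_in_polygon(x + 0.5, y + 0.5, poly):
                    s += 1 if lip_mask[y][x] else -1
        return s

    pts = [list(p) for p in points]
    best = score(pts)
    for _ in range(iters):
        improved = False
        for i in range(len(pts)):
            for dx, dy in steps:
                trial = [p[:] for p in pts]
                trial[i][0] += dx
                trial[i][1] += dy
                s = score(trial)
                if s > best:
                    best, pts, improved = s, trial, True
        if not improved:
            break
    return [tuple(p) for p in pts]
```

Because only improving moves are accepted, the enclosed-pixel score is non-decreasing over iterations.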
● Extraction of the various lip speech characteristics:
The following features are readily obtained from the lip model: lip geometric features (f_geo); lip shape features (f_shape, including the dynamic shape component of the model); lip texture features (f_texture, including the dynamic appearance component of the model); lip-interior features (f_inner, including teeth and tongue features); and lip spectral features (f_freq, comprising various frequency-domain characteristics).
Since the identity-discriminating power of each lip feature can be obtained by computing the mutual information between that feature and the different speakers, the present invention uses mutual-information analysis to evaluate the features. The mutual information is computed as

I(X;Y) = \sum_{x,y} P(x,y) \log_2 \frac{P(x|y)}{P(x)}

Since mutual information in feature analysis is usually expressed in discrete form (discrete probabilities computed from histograms), the choice of histogram quantization strongly affects the accuracy of the analysis. The present invention computes the discrete probabilities with a uniformly quantized histogram whose dimension is \log_2 N + 1 if the feature is Gaussian-distributed, and otherwise \log_2 N + 1 + \log_2 (1 + \kappa \sqrt{N/6}), where \kappa is the feature's skewness estimate and N is the total number of training samples.
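The mutual-information evaluation can be sketched as follows (illustrative Python, not from the patent). The histogram dimension follows the rule quoted above, with the non-Gaussian branch interpreted as a Doane-style skewness correction; that interpretation of the garbled original formula is an assumption.

```python
import math

def bin_count(values, gaussian=False):
    """Histogram dimension per the rule in the text: log2(N) + 1 for a
    Gaussian feature, otherwise log2(N) + 1 + log2(1 + |skew| * sqrt(N/6)).
    The skewness-based branch is an assumed reading of the original formula."""
    n = len(values)
    base = math.log2(n) + 1
    if gaussian:
        return max(1, round(base))
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    skew = 0.0
    if var > 0:
        skew = sum((v - mean) ** 3 for v in values) / n / var ** 1.5
    return max(1, round(base + math.log2(1 + abs(skew) * math.sqrt(n / 6))))

def mutual_information(feature, labels, bins):
    """I(X;Y) = sum_xy P(x,y) * log2( P(x|y) / P(x) ), with the feature values
    discretized by a uniform-quantization histogram."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins or 1.0
    xs = [min(int((v - lo) / width), bins - 1) for v in feature]
    n = len(feature)
    pxy, px, py = {}, {}, {}
    for x, y in zip(xs, labels):
        pxy[(x, y)] = pxy.get((x, y), 0) + 1 / n
        px[x] = px.get(x, 0) + 1 / n
        py[y] = py.get(y, 0) + 1 / n
    mi = 0.0
    for (x, y), p in pxy.items():
        mi += p * math.log2((p / py[y]) / px[x])   # P(x|y) / P(x)
    return mi
```

A feature that perfectly separates two speakers carries 1 bit of mutual information; a feature independent of speaker identity carries none.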
Through the mutual-information analysis, the inventors select the following features to construct the optimal feature combination, as it carries the richest identity-discriminating information: {f_geo, f_inner, f_texture,ica}, where f_texture,ica denotes the lip texture features obtained by independent component analysis (ICA).
As shown in Fig. 5, the speaker identification and authentication method based on the multi-level Hidden Markov Model, suited to the characteristics of lip speech features, proceeds as follows:
1. Basic motion sequence segmentation: a basic lip motion sequence is a lip motion segment with drastic change at both ends and smooth change in the middle. Let f_{lip,i} denote the feature combination at time i and f_{lip,i+1} that at time i+1. If Δ = ||f_{lip,i+1} - f_{lip,i}|| > T, then time i is considered a lip mutation point, i.e. the start (or end) of a basic motion sequence. Once all lip mutation points are determined, the whole prompt-phrase sequence can be divided into several basic motion sequences.
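The mutation-point rule above can be sketched directly (illustrative Python; the per-frame feature vectors and the threshold T are placeholders):

```python
def split_basic_sequences(features, threshold):
    """Split a sequence of per-frame lip feature vectors at 'mutation points',
    i.e. frames i where ||f_{i+1} - f_i|| > T. Returns the basic motion
    segments as lists of consecutive frame indices."""
    cuts = [0]
    for i in range(len(features) - 1):
        delta = sum((a - b) ** 2
                    for a, b in zip(features[i + 1], features[i])) ** 0.5
        if delta > threshold:
            cuts.append(i + 1)   # frame i is a mutation point; i+1 starts a new segment
    cuts.append(len(features))
    return [list(range(cuts[k], cuts[k + 1])) for k in range(len(cuts) - 1)]
```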
2. Building the user-group Hidden Markov Model (HMM) library: taking the basic motion sequences of all users in the training set uttering a given segment of the prompt phrase as training samples, the optimal feature set as training features, and continuous HMMs as classification models, an HMM library of the user group is built for all basic motion sequences of the prompt phrase. The HMM parameters are set as follows: 6-state left-to-right models, the Baum-Welch algorithm for training, and the Viterbi algorithm for recognition.
3. Building a specific speaker's "lip-print model": for each basic motion sequence of the prompt phrase, the matching degree between the specific speaker and the user-group HMM library is analyzed for that segment. If the matching degree is high (above a preset threshold), the speaker's utterance of that segment is not distinctive. Otherwise, the utterance is distinctive, and the speaker's basic motion sequences for that segment are used as the training set, with the same model parameters as the user-group HMM library, to build a "lip-print model" capturing that speaker's characteristics.
4. Speaker identification and authentication: for identification, i.e. recognizing the speaker's identity within the user group, the invention analyzes the matching degree between the test speech segment and the "lip-print models" of all users in the library, and the user with the highest matching degree is judged to be the speaker. For authentication, the test speech segment is matched against the claimed speaker's "lip-print model"; if the matching degree exceeds a threshold the authentication passes, otherwise the user is rejected as illegitimate.
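The decision logic of step 4 can be sketched as follows. This is illustrative Python only: a per-speaker diagonal Gaussian log-likelihood stands in for the HMM match score (the patent uses 6-state left-to-right HMMs trained with Baum-Welch and scored with Viterbi), but the argmax identification rule and the threshold authentication rule are as described above.

```python
import math

def gaussian_loglik(seq, mean, var):
    """Stand-in for the HMM match score: log-likelihood of the frame features
    under a per-speaker diagonal Gaussian (an assumption for illustration)."""
    ll = 0.0
    for f in seq:
        for x, mu, v in zip(f, mean, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (x - mu) ** 2 / v)
    return ll

def identify_speaker(test_seq, models):
    """Identification: pick the speaker whose 'lip-print model' matches the
    test segment best. models: speaker id -> (mean, var)."""
    return max(models, key=lambda s: gaussian_loglik(test_seq, *models[s]))

def authenticate_speaker(test_seq, model, threshold):
    """Authentication: accept the claimed identity only if the claimed
    speaker's model scores above the preset threshold."""
    return gaussian_loglik(test_seq, *model) > threshold
```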
To implement the concrete method of the present invention, the inventors designed and implemented an identity authentication system based on lip speech characteristics and evaluated it on a lip image sequence library of 40 speakers (29 male, 11 female) uttering specific prompt phrases, comprising about 36,000 lip images. The system achieves high accuracy for speaker identity authentication, namely an identification rate of 98.07% and an authentication equal error rate of 2.31%.

Claims (6)

1. An identity authentication method based on lip speech characteristics, characterized in that lip speech characteristics are first extracted automatically by a lip region segmentation method and a lip modeling method; the identity-discriminating power of the various lip speech characteristics is then analyzed by a mutual-information evaluation method, and the optimal combination of lip speech characteristics is selected; finally, according to the characteristics of lip speech features, a multi-level Hidden Markov Model is used to perform speaker identification and authentication.
2. The identity authentication method based on lip speech characteristics according to claim 1, characterized in that the objective function of the lip region segmentation method is:

J_m = J_{CLR} + J_{SPA} = J_{CLR,OBJ} + J_{CLR,BKG} + J_{SPA,OBJ} + J_{SPA,BKG}

    = \sum_{r=1}^{N} \sum_{s=1}^{M} u_{0,r,s}^{m} d_{0,r,s}^{2} + \sum_{r=1}^{N} \sum_{s=1}^{M} u_{1,r,s}^{m} d_{1,r,s}^{2} + \sum_{r=1}^{N} \sum_{s=1}^{M} u_{0,r,s}^{m} g_{OBJ}(r,s) + \sum_{r=1}^{N} \sum_{s=1}^{M} u_{1,r,s}^{m} g_{BKG}(r,s)

with the membership functions subject to \sum_{i=0}^{1} u_{i,r,s} = 1, wherein the subscripts CLR, SPA, OBJ and BKG of the objective function J denote its color part, spatial part, lip-target part and background part, respectively; u_{i,r,s} is the membership degree of the pixel at coordinates (r, s) with respect to class i; d_{i,r,s} is the color distance between the pixel at (r, s) and the color center of class i; and g_{OBJ} and g_{BKG} are the spatial distances between the pixel at (r, s) and the corresponding spatial class centers; the membership degrees of all pixels are obtained by iterative optimization, and a pixel whose membership in the lip (target) class exceeds 0.5 is considered a lip pixel.
3. The identity authentication method based on lip speech characteristics according to claim 1, characterized in that the lip region modeling method uses a lip geometric model composed of 16 points on three quadratic curves and a "point-driven" iterative optimization process in which, in each iteration, only the 16 model contour points are adjusted so that the contour encloses as many lip pixels and as few non-lip pixels as possible.
4. The identity authentication method based on lip speech characteristics according to claim 1, characterized in that the automatic extraction of lip speech characteristics extracts the following features: lip geometric features; lip shape features including the dynamic shape component of the model; lip texture features including the dynamic appearance component of the model; lip-interior features including teeth and tongue features; and various lip spectral features.
5. The identity authentication method based on lip speech characteristics according to claim 1, characterized in that the mutual-information evaluation method analyzes the identity-discriminating power of the various lip speech characteristics with the formula

I(X;Y) = \sum_{x,y} P(x,y) \log_2 \frac{P(x|y)}{P(x)}

where I(X;Y) is the mutual information between the feature set X and the speaker set Y, and P(x,y), P(x|y) and P(x) are the joint probability of x and y, the conditional probability, and the prior probability of x, respectively; the discrete probabilities are computed with a uniformly quantized histogram whose dimension is \log_2 N + 1 if the feature is Gaussian-distributed, and \log_2 N + 1 + \log_2 (1 + \kappa \sqrt{N/6}) otherwise, where N is the total number of training samples.
6. The identity authentication method based on lip speech characteristics according to claim 1, characterized in that the multi-level Hidden Markov Model performs speaker identification and authentication in the following concrete steps:
1. Basic motion sequence segmentation: a basic lip motion sequence is a lip motion segment with drastic change at both ends and smooth change in the middle; let f_{lip,i} denote the feature combination at time i and f_{lip,i+1} that at time i+1; if Δ = ||f_{lip,i+1} - f_{lip,i}|| > T, time i is considered a lip mutation point, i.e. the start or end of a basic motion sequence; once all lip mutation points are determined, the whole prompt-phrase sequence is divided into several basic motion sequences;
2. Building the user-group Hidden Markov Model library: taking the basic motion sequences of all users in the training set uttering a given segment of the prompt phrase as training samples, the optimal feature set as training features, and continuous Hidden Markov Models as classification models, a Hidden Markov Model library of the user group is built for all basic motion sequences of the prompt phrase; the model parameters are set as follows: 6-state left-to-right models, the Baum-Welch algorithm for training, and the Viterbi algorithm for recognition;
3. Building a specific speaker's "lip-print model": for each basic motion sequence of the prompt phrase, the matching degree between the specific speaker and the user-group Hidden Markov Model library is analyzed for that segment; if the matching degree exceeds a preset threshold, the speaker's utterance of that segment is not distinctive; otherwise the utterance is distinctive, and the speaker's basic motion sequences for that segment are used as the training set, with the same model parameters as the user-group library, to build a "lip-print model" capturing that speaker's characteristics;
4. Speaker identification and authentication: for identification, the matching degree between the test speech segment and the "lip-print models" of all users in the library is analyzed, and the user with the highest matching degree is judged to be the speaker; for authentication, the test speech segment is matched against the claimed speaker's "lip-print model", and the authentication passes if the matching degree exceeds a threshold, otherwise the user is rejected as illegitimate.
CNA2007100400038A 2007-04-26 2007-04-26 Identity identification method based on lip speech characteristic Pending CN101046959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2007100400038A CN101046959A (en) 2007-04-26 2007-04-26 Identity identification method based on lip speech characteristic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2007100400038A CN101046959A (en) 2007-04-26 2007-04-26 Identity identification method based on lip speech characteristic

Publications (1)

Publication Number Publication Date
CN101046959A true CN101046959A (en) 2007-10-03

Family

ID=38771506

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2007100400038A Pending CN101046959A (en) Identity identification method based on lip speech characteristic

Country Status (1)

Country Link
CN (1) CN101046959A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102013103A (en) * 2010-12-03 2011-04-13 上海交通大学 Method for dynamically tracking lip in real time
CN102013103B (en) * 2010-12-03 2013-04-03 上海交通大学 Method for dynamically tracking lip in real time
CN102542280A (en) * 2010-12-26 2012-07-04 上海量明科技发展有限公司 Recognition method and system aiming at different lip-language mouth shapes with same content
CN102542280B * 2010-12-26 2016-09-28 上海量明科技发展有限公司 Recognition method and system for different lip-language mouth shapes with the same content
CN103324918A (en) * 2013-06-25 2013-09-25 浙江中烟工业有限责任公司 Identity authentication method with face identification and lip identification matched
CN103324918B * 2013-06-25 2016-04-27 浙江中烟工业有限责任公司 Identity authentication method combining face recognition with lip-reading recognition
CN104361276A (en) * 2014-11-18 2015-02-18 新开普电子股份有限公司 Multi-mode biometric authentication method and multi-mode biometric authentication system
CN104361276B * 2014-11-18 2017-07-18 新开普电子股份有限公司 Multi-modal biometric identity authentication method and system
CN104680144A (en) * 2015-03-02 2015-06-03 华为技术有限公司 Lip language recognition method and device based on projection extreme learning machine
CN104834900A (en) * 2015-04-15 2015-08-12 常州飞寻视讯信息科技有限公司 Method and system for vivo detection in combination with acoustic image signal
CN104834900B * 2015-04-15 2017-12-19 常州飞寻视讯信息科技有限公司 Method and system for liveness detection combining audio and visual signals
CN106529379A (en) * 2015-09-15 2017-03-22 阿里巴巴集团控股有限公司 Method and device for recognizing living body
CN108351972A (en) * 2015-10-28 2018-07-31 皇家飞利浦有限公司 Device and method for the anatomic shape for indicating biology
CN105787428A (en) * 2016-01-08 2016-07-20 上海交通大学 Method for lip feature-based identity authentication based on sparse coding
CN106295501A * 2016-07-22 2017-01-04 中国科学院自动化研究所 Deep-learning-based person identification method using lip movement
CN106778496A (en) * 2016-11-22 2017-05-31 重庆中科云丛科技有限公司 Biopsy method and device
CN111986674A (en) * 2020-08-13 2020-11-24 广州仿真机器人有限公司 Intelligent voice recognition method based on three-level feature acquisition
CN112053160A (en) * 2020-09-03 2020-12-08 中国银行股份有限公司 Intelligent bracelet for lip language recognition, lip language recognition system and method
CN112053160B (en) * 2020-09-03 2024-04-23 中国银行股份有限公司 Intelligent bracelet for lip language identification, lip language identification system and method

Similar Documents

Publication Publication Date Title
CN101046959A (en) Identity identification method based on lid speech characteristic
CN107729835B (en) Expression recognition method based on fusion of traditional features of face key point region and face global depth features
CN106127156A Robot interaction method based on voiceprint and face recognition
Soltane et al. Face and speech based multi-modal biometric authentication
Fallah et al. A new online signature verification system based on combining Mellin transform, MFCC and neural network
Pirlo et al. Multidomain verification of dynamic signatures using local stability analysis
Daramola et al. Offline signature recognition using hidden markov model (HMM)
CN103092329A Lip-language input method based on lip-reading technology
CN108256307B Hybrid-augmented intelligent cognition method for a smart touring motor home
Kryszczuk et al. Reliability-based decision fusion in multimodal biometric verification systems
CN103927532B (en) Person's handwriting method for registering based on stroke feature
CN108681737B (en) Method for extracting image features under complex illumination
CN103218624A (en) Recognition method and recognition device based on biological characteristics
CN101178767A Recognition-level fusion for hybrid face and iris recognition
CN110263726A Finger vein recognition method and device based on deep correlation feature learning
Hu et al. Fingerprint classification based on genetic programming
CN106980845B (en) Face key point positioning method based on structured modeling
Guo A hidden Markov model fingerprint matching approach
Soltane et al. Soft decision level fusion approach to a combined behavioral speech-signature biometrics verification
CN116884415A (en) Voiceprint recognition method based on DV-Softmax loss function
CN116612542A (en) Multi-mode biological feature consistency-based audio and video character recognition method and system
CN115795394A (en) Biological feature fusion identity recognition method for hierarchical multi-modal and advanced incremental learning
Lavanya et al. Performance evaluation of fingerprint identification based on DCT and DWT using multiple matching techniques
CN111310546B (en) Method for extracting and authenticating writing rhythm characteristics in online handwriting authentication
Jung et al. Live-scanned fingerprint classification with markov models modified by GA

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20071003