SYSTEM FOR AUTOMATIC VERIFICATION OF BIOMETRIC DATA SIGNALS, ESPECIALLY GENERATED BY A SPEAKER
The invention concerns a method and a system for automatic verification of biometric data signals.
It applies more particularly to the automatic verification of voice acoustic signals, generated by a speaker. An application of the invention is in the creation of systems for automatic verification and authentication of a claimed or assumed identity, using data obtained from the conversion of biometric signals.
Traditionally, biometric technologies can be divided into two broad categories: a/ the morphological biometric techniques, which consist of a direct measurement of a physical and almost deterministic characteristic of the individual (fingerprints, shape of the hand, facial features, pattern of the vein system in the eye, DNA analysis); b/ the behavioural biometric techniques, which consist of analysing a signal produced by an individual, including for example:
- signature analysis (static shape of the writing, pen speed / acceleration, pressure exerted, inclination, etc.); and
- speaker verification (voice characteristics).
Generally, the behavioural biometric techniques cannot achieve the same performance and reliability as the morphological biometric techniques. In particular, there are no "voice prints" comparable with the fingerprints or genetic prints. Consequently, we prefer to use the term "voice signature" to keep in mind that the voice signal, like the written signature, is not a virtually infallible means of identifying the person who produced it and that it is vulnerable to involuntary or intentional modifications (especially falsification or masking attempts).
However, several studies have demonstrated that speaker verification is a security technology well adapted to all applications requiring simplicity and transparency with respect to the user. It is currently one of the best compromises between security, ergonomics and quality of use. In addition, it has become the main means of securing transactions or data exchanges on the telephone network, an application with steadily increasing demand.
For a more detailed illustration of the biometric technologies, it is worthwhile visiting, as a non-exhaustive example, the following web site: http://biometrie.online.fr.
As an illustration, and without this in any way limiting the scope of the application, in the remainder of the document we will restrict ourselves to the preferred application of the invention, i.e. the automatic verification of acoustic signals generated by a speaker, a subset of what is more generally known as "automatic speaker recognition or characterisation".
The purpose of automatic speaker recognition or characterisation is to extract from the speech signal information concerning the individual who uttered it. The type of characteristics to be identified (identity, pathology, degree of tiredness, geographical origin, etc.) may vary considerably depending on the targeted application.
"Automatic Speaker Recognition" referred to as "ASR" in the remainder of the document includes a set of applications resulting from "automatic speaker characterisation" in which an attempt is made to identify a person from his voice. Its applications are mainly linked to authentication and security problems.
Traditionally, "ASR" is subdivided into various tasks:
- Automatic Speaker Identification (ASI), which consists of determining, from a population of n known speakers, the person who uttered a given message.
- Speaker Detection which consists of determining the presence or not of a speaker in an audio recording.
- Indexing and speaker follow-up: indexing consists of segmenting an audio document by speaker then of determining during a grouping phase the number of speakers present on a recording as well as the various time intervals during which each of these speakers is speaking; speaker follow-up consists of segmenting a speech signal to indicate when and for how long the various speakers, or target speakers, are speaking.
- Automatic Speaker Verification ("ASV") which consists of verifying the claimed or assumed identity of an individual by voice analysis. The claimed or assumed
identity, referred to in the remainder of the document as "tested identity", as well as the speech signal, referred to as "test signal" represent the two inputs of an "ASV" system in access or test phase. During an access to an "ASV" system, a measure of the "similarity" between the test signal and a reference associated with the tested identity is computed, leading to the computation of a score which is then compared with a decision threshold. The tested identity is accepted if the score is greater than this threshold and rejected otherwise.
There are numerous applications of "ASV", including in particular the following: a/ voice locks for building access control; b/ authentication for access to sensitive data via a telephone network (bank consultations or transactions, confidential database consultations, etc.); c/ protection of equipment against theft (portable telephones, cars, credit cards, etc.); and d/ identity certification (carrier, courier, e-mail and e-commerce, etc.).
Although it can be used for all the above tasks, the invention is more particularly concerned with "ASV". We shall therefore describe in more detail an "ASV" system according to the known state of the art, referenced ASVS, with reference to figure 1 attached to this description, as well as the main phases and steps of "ASV".
The use of an "ASV" system involves two separate phases, symbolised by dotted rectangles on figure 1: an "apprenticeship" phase ΦA and a "test" phase ΦT. The apprenticeship phase ΦA consists of constructing a reference for each user (speaker) client of the application from examples of biometric signals (e.g. speech signals). The ASVS system stores in mass memory (for example a hard disk) or in a fixed or semi-fixed memory (for example read only memory of type "ROM" or re-programmable memory of type "PROM", "EEPROM" or similar) digital data forming references associated with the different speakers whose identity could be verified, including a priori the one corresponding to the tested identity. This storage unit is marked "library of biometric references", reference number 12, on figure 1.
In an implementation variant (not shown), the ASVS system obtains the reference of the tested speaker from an external storage unit (e.g. a server or a memory card held by the speaker himself).
The test phase ΦT (or the recognition or verification phase) consists of accepting or rejecting the claimed or assumed identity of an individual presenting himself as a client to the verification system.
The following description considers a configuration in which the system computes the verification score directly from the characteristic vectors, as described in the following publications:
R. BLOUET, F. BIMBOT: A Tree-Based Approach for Score Computation in Speaker Verification. ISCA-ITRW Workshop: A Speaker Odyssey, Chania (Crete, Greece), June 2001, pp. 223-227.
R. BLOUET, F. BIMBOT: Tree-Based Score Computation for Speaker Verification. Eurospeech'2001, Aalborg, September 2001, pp. 2513-2516.
In this configuration, the ASVS system typically includes five main modules.

An "acoustic analysis module", respectively 10 and 10', analyses the signals Y, called biometric (speech) signals. It is implemented during both phases ΦA and ΦT and generates, from the digitised signal Y, a series of N vectors y(1) to y(N), whose current element y(t) lies in a space E of characteristics representative of these signals.

A "modelling module" 11 models the signals output from module 10. It is specific to the apprenticeship phase ΦA and generates, for a given identity X, a biometric reference $\Gamma_X$ (or client model), which is transmitted to the "library of biometric references" 12 and stored in it. The first step of the apprenticeship phase consists of partitioning, for a speaker of identity X, the space of characteristic vectors into a set $R_X = \{r_k\}_{1 \le k \le m}$ of m regions $r_k$ forming a partition (m possibly depending on X). A partition of a space E is a set of regions which satisfy:

$$\bigcup_{k=1}^{m} r_k = E \qquad \text{and} \qquad r_i \cap r_j = \emptyset \quad \text{for } i \neq j,$$

i and j being arbitrary indices between 1 and m. In other words, the union of all the regions of the partition is equal to the space E, and the regions are pairwise disjoint. To improve efficiency, this partition can be made using a decision tree, but any other method capable of producing such a partitioning is also compatible with the invention.
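As an illustration of how a decision tree yields such a partition, the following Python sketch descends a hand-built binary tree whose internal nodes test one component of the characteristic vector against a threshold. It is an assumption for illustration, not the tree construction used in the cited publications. Because every vector follows exactly one path from the root, each vector lands in exactly one leaf, so the leaves form a partition of E and the leaf number plays the role of the region index (0-based here). The tree shape, the thresholds and the names `Leaf`, `Node` and `region_index` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    index: int  # region index of r_k in the partition R_X (0-based here)


@dataclass
class Node:
    dim: int          # component of the characteristic vector that is tested
    threshold: float  # go left if y[dim] <= threshold, otherwise go right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def region_index(tree: Tree, y: Sequence[float]) -> int:
    """Return the index of the unique leaf (region) that vector y falls into."""
    node = tree
    while isinstance(node, Node):
        node = node.left if y[node.dim] <= node.threshold else node.right
    return node.index


# Hypothetical tree over 2-D vectors defining m = 3 regions.
tree = Node(dim=0, threshold=0.0,
            left=Leaf(0),
            right=Node(dim=1, threshold=1.0, left=Leaf(1), right=Leaf(2)))

print(region_index(tree, [-0.5, 3.0]))  # 0
print(region_index(tree, [0.5, 0.2]))   # 1
print(region_index(tree, [0.5, 2.0]))   # 2
```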
The second step of the apprenticeship phase consists of modelling a simple score function. This score function, written $Z_X$, is defined on each of the m regions of the partition $R_X$. If the score function is constant in each region, we obtain $Z_X(r_k) = s_k$. We may also choose a score function which is not constant, but which is easy to compute even with few computational resources.
The biometric reference $\Gamma_X$ associated with the speaker X consists of the partition $R_X$ obtained during the apprenticeship phase and the function $Z_X$ estimated during this same phase.

A "vector score computation" module 13, specific to the test phase ΦT, receives data from module 10' as a series of characteristic vectors y(1) to y(N), with current vector y(t), and from the library 12 as the biometric reference $\Gamma_X$ corresponding to the tested identity X. It directly evaluates a series of scores s(1) to s(N), with current element written s(t), one for each vector y(t) in the space of characteristic vectors derived from the biometric signal. Each acoustic vector y(t) extracted from the input biometric signal is allocated to one of the m regions $r_k$ of $R_X$. The region $r_k$ to which y(t) is allocated is advantageously designated by an index, written β(t), i.e. a discrete value which unambiguously designates the region to which y(t) is allocated; for example, β(t) = k, the index of the region $r_k$ in the partition $R_X$. The individual score s(t) of each characteristic vector is deduced, via the index β(t), from the allocation region $r_k$ of y(t) and from the score function $Z_X$ estimated during the apprenticeship phase. For example, with a constant score, if y(t) is allocated to the region $r_k$, the score associated with y(t) is $s(t) = s_k = Z_X(r_k)$.

A "global score computation" module 14, specific to the test phase ΦT, combines the series of scores s(t) previously obtained to form a global score S, for example by taking the arithmetic mean:

$$S = \frac{1}{N} \sum_{t=1}^{N} s(t),$$

N being the number of characteristic vectors extracted from the biometric test signal.

A "decision module" 15, specific to the test phase ΦT, outputs a discrete signal SAR to accept or reject the tested identity, depending on whether or not the method considers that the test signal and the biometric reference corresponding to the tested identity match.

There are numerous applications for the "ASV" method in the field of security. In this field, one of the safest devices is the smartcard or, more generally, an embedded system.
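The test-phase computations of modules 13, 14 and 15 described above can be sketched in Python, assuming a score function that is constant per region and that the region indices β(t) have already been computed (for instance by a tree search). Indices are 0-based here, whereas the text numbers regions from 1 to m; the score table, the indices and the threshold are hypothetical values.

```python
from typing import List, Sequence


def vector_scores(indices: Sequence[int], region_scores: Sequence[float]) -> List[float]:
    # Module 13 with a constant score per region: s(t) = s_k = Z_X(r_k), k = beta(t)
    return [region_scores[k] for k in indices]


def global_score(scores: Sequence[float]) -> float:
    # Module 14: arithmetic mean S = (1/N) * sum over t of s(t)
    return sum(scores) / len(scores)


def decide(S: float, threshold: float) -> bool:
    # Module 15: accept the tested identity if S is greater than the threshold
    return S > threshold


region_scores = [-1.0, 0.5, 2.0]  # hypothetical s_k for m = 3 regions
indices = [2, 1, 2, 0]            # hypothetical beta(1)..beta(N), N = 4

s = vector_scores(indices, region_scores)
S = global_score(s)
print(s, S, decide(S, threshold=0.5))  # [2.0, 0.5, 2.0, -1.0] 0.875 True
```

Only the small table of region scores and a few additions and one division are needed at test time, which is what makes this scoring scheme attractive for devices with limited resources.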
In the context of the invention, the term "embedded system" must be taken in its broadest sense. It concerns in particular systems intended for a smartcard, which represents a preferred system for the implementation of the method according to the invention, but also any system intended for a portable or mobile device including specific computerised data processing means, which will be referred to below as "processing resources".
Modern embedded systems are equipped with data processing resources which can carry out increasingly complex and numerous functions. However, although more and more advanced components and technologies are being marketed, a distinctive characteristic of embedded systems, compared with conventional computer systems (microcomputer, workstation, etc.), is the limitation they impose in terms of computing resources (in particular, memory size and microprocessor power).
The purpose of the invention is to overcome the above-mentioned disadvantages through the use of a specific method and a special organisation of the processing operations on the embedded system, guaranteeing efficient verification (especially a low rate of false rejections and/or false acceptances), few processing and storage operations on the embedded system and satisfactory security for the sensitive part of the data used.
Another purpose of the invention is to improve the performance and efficiency of the decision-tree method of the known state of the art and to make it less complex.
The invention will now be described in more detail, referring to the attached drawings, amongst which: figure 1 illustrates, in the form of block diagrams, the architecture of an automatic speaker verification system according to the known state of the art; figure 2 illustrates a partition resulting from multiple decision trees, according to the specific method of the invention; figure 3 illustrates an example of distributed architecture to implement the method according to the invention including a smartcard; and figure 4 illustrates the preferred distribution of the processing operations in the test phase of the method according to the invention, between a host terminal and an embedded system, in the case of distributed architecture.
We will now describe the method according to the invention, in reference to figure 2. Henceforth, without limiting in any way whatsoever the scope of the invention, we will consider the preferred application of the invention, unless otherwise specified, i.e. the context of an "ASV" system.
The partition of the space E in the known method described above and illustrated on figure 1 is carried out, according to the present invention, by a multiple decision tree, i.e. several decision trees used in parallel to define the partition of the space. The partition of the space is a multiple tree structure partition obtained by searching through several preconstructed trees. As illustrated on figure 2, with 3 trees ($A_1$, $A_2$ and $A_3$) in parallel, each tree, preconstructed according to different criteria or on different sets of data, defines a different partition ($R_1$, $R_2$ and $R_3$). By different criteria we mean, for example, the criteria used to construct the tree during the apprenticeship, especially the criterion used to decide whether to create new leaves from an existing leaf (entropy, Gini index, etc.). By different sets of data we mean, for example, separate (but not necessarily disjoint) subsets of the apprenticeship data, obtained by dividing the apprenticeship set (division, jack-knife procedure or other). These trees are then used together to compute the global score by combining the scores produced by each tree.
Thus, if $r_i(t)$ is the region to which y(t) is allocated in the tree $A_i$, and $s_i(t)$ the corresponding score, then the score s(t) of the vector y(t) is the (possibly weighted) mean of the scores of each tree:

$$s(t) = \sum_{i=1}^{p} a_i \, s_i(t),$$

where p designates the number of trees used for the computation and where the terms $a_i$ represent weighting coefficients (the unweighted mean corresponds to $a_i = 1/p$).
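A minimal Python sketch of this combination, under simplifying assumptions: each characteristic vector is one-dimensional, and each tree $A_i$ is replaced by a hypothetical stand-in, a sorted list of thresholds searched with `bisect`, which, like a tree, assigns every value to exactly one region. Each tree has its own constant score table, and the weights $a_i$ are illustrative values, not taken from the text.

```python
import bisect
from typing import List, Sequence, Tuple


def region_index_1d(thresholds: Sequence[float], y: float) -> int:
    # Stand-in for searching one tree: sorted thresholds cut the real line into
    # len(thresholds) + 1 intervals, and the interval number is the region index.
    return bisect.bisect_right(thresholds, y)


def combined_vector_score(y: float,
                          partitions: Sequence[Sequence[float]],
                          score_tables: Sequence[Sequence[float]],
                          weights: Sequence[float]) -> Tuple[List[int], float]:
    # Index vector [beta_i(t)]: one region index per tree A_i
    betas = [region_index_1d(th, y) for th in partitions]
    # Per-tree scores s_i(t), read from each tree's constant score table
    per_tree = [table[b] for table, b in zip(score_tables, betas)]
    # Weighted combination s(t) = sum over i of a_i * s_i(t)
    return betas, sum(a * s for a, s in zip(weights, per_tree))


# Hypothetical p = 3 "trees" with 2, 3 and 2 regions respectively.
partitions = [[0.0], [-1.0, 1.0], [0.5]]
score_tables = [[-1.0, 1.0], [0.0, 2.0, -2.0], [1.0, 3.0]]
weights = [0.5, 0.25, 0.25]

print(combined_vector_score(0.2, partitions, score_tables, weights))  # ([1, 1, 0], 1.25)
```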
This method, specific to the invention, offers several advantages. Firstly, it is more economical, since it can produce a finer partition of the space E with fewer nodes than a single tree, as illustrated on figure 2: p trees with q leaves each (a total of p × q leaves) can define a partition of up to $q^p$ regions, whereas a single tree would need $q^p$ leaves to obtain the same partition complexity. It is also more efficient: the verification performance obtained with a single tree of p × q nodes is generally much lower than that obtained with p trees of q nodes, which experimentally confirms the benefit of the multiple decision tree method over the single decision tree method.
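As a worked illustration of this count, with values p = 3 and q = 4 chosen here purely for illustration (they are not taken from the experiments mentioned above): the p trees store p × q leaves in total, but their combination distinguishes up to q^p regions, since each region of the combined partition corresponds to one choice of leaf in every tree (some combinations may be empty, hence "up to").

$$p \times q = 3 \times 4 = 12 \ \text{leaves stored} \qquad \text{versus} \qquad q^{p} = 4^{3} = 64 \ \text{regions at most}$$

A single tree would need 64 leaves to reach the same number of regions.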
It should also be noted that multiple partitioning can be generalised to partitioning methods other than those using a decision tree, by combining elementary partitions.
In the context of the multiple partitioning approach, the biometric reference $\Gamma_X$ becomes a vector of biometric references $[\Gamma_{i,X}]$, the partition $R_X$ becomes a multiple partition $[R_{i,X}]$, the score function $Z_X$ becomes a set of score functions $[Z_{i,X}]$, and the index β(t) designating the region to which the vector y(t) belongs becomes an index vector $[\beta_i(t)]$.
The following description refers to an "ASV" system architecture according to this invention and as illustrated by figures 3 and 4.
Most of the operations can be carried out outside the smartcard without affecting the degree of security it offers. Only the "sensitive" data must remain inside the smartcard and never leave it, i.e. never be communicated to the outside.
In the application targeted by the invention, the biometric references form one of the above-mentioned categories of sensitive data. The client model according to the method of the known state of the art illustrated on figure 1 or of this invention can be stored in a read only memory area of the smartcard (e.g.
"ROM" type) or a re-programmable memory (e.g. "PROM" type).
The computation of the global decision score S must also be carried out on the smartcard. The intermediate results of these computations form a second category of "sensitive" data. These operations (in the method illustrated on figure 1 or according to this invention) require few computation resources and are therefore compatible with those offered by current smartcard technologies (arithmetic and logic unit in particular).
The threshold values can also be stored in the smartcard, but this is not essential.
Once the global score has been obtained, it can be transmitted in encrypted format using a suitable algorithm, such as the triple DES (Data Encryption Standard) algorithm, to a remote server which decides to accept or reject the tested identity. All other operations can be carried out by an "external" module, i.e. outside the smartcard, for example in its host terminal, which a priori has more powerful computing resources. The last operation this external module carries out is the transmission to the smartcard of the regions to which the test vectors are allocated. The smartcard uses these allocation regions to deduce the score of each vector and then the desired global score.
It is clear that, since the component $Z_X$ of the client references, formed from the scores allocated to each region, remains stored on the smartcard and is never communicated to the outside, the security of the method is not jeopardised. The apprenticeship phase can be carried out in a secure environment, which means that the modelling module (figure 1: module 11) can also be located outside the smartcard. This phase is carried out only once, or at least very infrequently (refresh or update of client models, addition of new models, etc.).
In operational mode, the "ASV" system only includes the modules required for the test phase. Figure 3 is a diagrammatic representation of the hardware architecture of the system according to the invention implementing the ASV method described above.
Figure 4 shows the main modules of the method used to verify the biometric data signals and shows how they are distributed according to this invention between the host terminal and the embedded system.
In the example illustrated on figure 3, the host terminal 4 of the smartcard 5 is a microcomputer including in particular a central processing unit 40, generally a microprocessor, and storage means: random access and read only memories, grouped under the single reference 41. The microcomputer 4 also includes various traditional peripherals, such as a screen 44, a data entry keyboard 45 and a pointing device (mouse or other) 46.
A soundboard or equivalent 47 is also provided, connected to an electro-acoustic transducer: a microphone Mic picking up a speech signal produced by a client CI to be authenticated. The signal picked up, after conversion into an electrical signal (S'p), is processed by the soundboard 47 and converted into a digital signal which can be processed by the microcomputer 4. The text to be uttered by the client CI can be prompted, in which case it can be displayed on the screen 44.
The microcomputer 4 is equipped with other traditional hardware well known by those skilled in the art, for which a detailed description is not required (hard disk, etc.).
The client CI can enter a claimed identity by any suitable means, for example on the data entry keyboard 45. The tested identity may also be implicit: it may in particular correspond to the identity of the owner of the smartcard inserted in the terminal. In this case, it is an assumed identity.
Controlled by the central processing unit 40 cooperating with special programs saved in the read only part of the memory means 41, the microcomputer 4 carries out the acoustic analysis of the signals Y, loads the partition $R_X$ (or the partitions $R_{i,X}$) of the space E of the parameters associated with the claimed or assumed identity X of the client CI, allocates the characteristic vectors y(t) to regions of the partition and transmits the indices β(t) (or the index vectors $[\beta_i(t)]$) of these regions to the smartcard 5. Using these indices and the score function $Z_X$ (or the set of score functions $Z_{i,X}$), which remain on the chip, the smartcard evaluates the global score S. All these operations are carried out in compliance with the method according to the invention and are therefore not described again.
To exchange data with a smartcard 5, the microcomputer 4 is equipped with a smartcard reader 43, in this case internal, although it could equally well be external. This smartcard reader 43 enables two-way communication between the smartcard 5 and the terminal 4.
The data exchanges between the smartcard 5 and the host terminal 4 can be carried out according to a standard protocol, for example in compliance with the standards ISO 7816-1 to 7816-4.
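As a hedged sketch of how the region indices β(t) could travel over such a link, the following Python packs them into ISO 7816-4 case-3 short command APDUs: a 4-byte header CLA INS P1 P2, a one-byte Lc (1 to 255) and the data bytes. It assumes one byte per index (so m ≤ 256). The class byte 0x80 (proprietary class), the instruction code 0x42 and the use of P1 as a chunk counter and P2 as a last-chunk flag are hypothetical choices, defined neither by the standard nor by the invention; the card's status-word responses are not modelled.

```python
from typing import List, Sequence

CLA_PROPRIETARY = 0x80   # class byte with bit 8 set: proprietary class (assumed choice)
INS_SEND_INDICES = 0x42  # hypothetical instruction code
MAX_LC = 255             # short APDU: Lc is one byte, so at most 255 data bytes


def index_apdus(indices: Sequence[int]) -> List[bytes]:
    """Pack region indices (one byte each) into case-3 short command APDUs."""
    if any(not 0 <= k <= 255 for k in indices):
        raise ValueError("each index must fit in one byte")
    payload = bytes(indices)
    apdus = []
    for seq, start in enumerate(range(0, len(payload), MAX_LC)):
        chunk = payload[start:start + MAX_LC]
        last = start + MAX_LC >= len(payload)
        # Header CLA INS P1 P2 (hypothetical: P1 = chunk number, P2 = 1 on last chunk)
        header = bytes([CLA_PROPRIETARY, INS_SEND_INDICES, seq & 0xFF, 0x01 if last else 0x00])
        apdus.append(header + bytes([len(chunk)]) + chunk)  # then Lc and data
    return apdus


apdus = index_apdus([2, 1, 2, 0] * 100)     # N = 400 hypothetical indices
print(len(apdus), [len(a) for a in apdus])  # 2 [260, 150]
```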
The smartcard 5 includes an electronic chip 50, equipped with the units typical to this type of device, well known by those skilled in the art, which require no further description. Within the context of the invention, it includes in particular computation means (central processing unit) 51 (a microcontroller or microprocessor) associated with random access and read only memory means, grouped under the single reference 52.
According to an important characteristic of the invention, a module 53 intended to store the client models is represented. It must be pointed out, however, that although it has been represented separately, in practice module 53 is preferably part of the read only memory means included in the electronic chip 50, or at least part of the reprogrammable memory means: for example "PROM" or "EEPROM" (electrically erasable programmable) type memory.
On reception of the data transmitted by the microcomputer 4 via the smartcard reader 43, the computation means 51 compute a score by comparison with a client model stored in the module 53.
The result of the computations, i.e. the above-mentioned score, can then be transmitted to a remote server 6, for example via the Internet RI. To do this, the microcomputer 4 is equipped with a modem 42 or any other suitable means. The communications transit through the host terminal 4, via the smartcard reader 43. The transmission protocol chosen is, in this case, an Internet type protocol ("TCP/IP").
Preferably, the data is transmitted in encrypted format. The encryption operations can be carried out in the smartcard 5, controlled by the computation means 51 and programs saved in the memory means 52. A triple DES type algorithm can be used, for example, or any other suitable algorithm.
On reception of the request, the server 6 decides to accept or reject the client CI, based on criteria specific to the particular application considered and the utilisation context.
The transmissions between the server and the smartcard 5 may also be carried out according to the method described in the French patent application FR 2 782 435 A, entitled "Procede de communication entre une station d'utilisateur et un reseau, notamment de type Internet, et architecture de mise en oeuvre" (method of communication between a user station and a network, in particular of the Internet type, and architecture for its implementation), which is worth consulting. This method enables direct communication between a smartcard and an Internet network by turning the smartcard into a "WEB"-type client/server.
Obviously, other types of remote server are possible. In particular, it could be a station in a local network. Similarly, the host terminal of the smartcard 5 could be a mobile telephone, an organiser or similar device.
Figure 4, which shows more precisely the distribution of the functions and modules between the host terminal 4 and the embedded system, advantageously a smartcard 5, will now be described in greater detail. The biometric signal Y is received by an acoustic analysis module 10 in the host terminal 4, which computes a series of vectors y(t) according to a principle similar to that of the known state of the art (figure 1). Moreover, the host terminal 4 loads from the smartcard 5 the partition $R_X$ (or the partition vector $[R_{i,X}]$) and transmits it to a vector indexing module 32 (see figure 3B), specific to the method according to the invention. However, the score function $Z_X$ (or the score functions $Z_{i,X}$), essential to the ASV according to the invention, remains in the smartcard 5 and is not transmitted to the host terminal.
A series of indices β(1), ..., β(t), ..., β(N) (or the series of index vectors $[\beta_i(1)], \ldots, [\beta_i(t)], \ldots, [\beta_i(N)]$) is computed by the module 32 (see figure 3B) of the host terminal 4 using the series of vectors y(1), ..., y(t), ..., y(N) and the partition $R_X$ (or the partitions $R_{i,X}$). This series of indices (or of index vectors) is transmitted by the host terminal to the vector score allocation module 33 (see figure 3B), located in the smartcard 5. The scores s(1), ..., s(t), ..., s(N) are computed by this module 33 with the function $Z_X$ (or the functions $Z_{i,X}$) and input to the global score computation module 14, which produces a digital value S. Module 14 is also located in the smartcard 5 and may be similar to that of the known state of the art (figure 1). The global score signal S is transmitted to a decision module 15, a priori external to the smartcard 5, which generates a discrete acceptance or rejection signal SAR.
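The distribution of figure 4 can be sketched in Python to make explicit which data crosses the card boundary. This is a simplified model under stated assumptions: the characteristic vectors are one-dimensional, the partition $R_X$ is represented by sorted thresholds (a stand-in for a decision tree), the score function is constant per region, and the names `SmartCard` and `host_verify` are hypothetical. The leading underscore only marks the score table as internal by Python convention; on a real card the protection comes from the chip itself. The decision module 15 runs on the host here for brevity, whereas the text places it outside the card, for example on the remote server 6.

```python
import bisect
from typing import List, Sequence


class SmartCard:
    """Card side (modules 33 and 14): holds the score table Z_X, which never leaves it."""

    def __init__(self, thresholds: Sequence[float], region_scores: Sequence[float]):
        self._partition = list(thresholds)  # R_X: may be read by the host terminal
        self._scores = list(region_scores)  # Z_X: kept on the card

    def read_partition(self) -> List[float]:
        return list(self._partition)

    def global_score(self, indices: Sequence[int]) -> float:
        scores = [self._scores[k] for k in indices]  # module 33: s(t) from beta(t)
        return sum(scores) / len(scores)             # module 14: arithmetic mean S


def host_verify(card: SmartCard, vectors: Sequence[float], threshold: float) -> bool:
    """Host side (modules 32 and 15); acoustic analysis is assumed already done."""
    partition = card.read_partition()                               # load R_X from the card
    indices = [bisect.bisect_right(partition, y) for y in vectors]  # module 32: beta(t)
    S = card.global_score(indices)  # only the indices are sent to the card
    return S > threshold            # module 15: accept or reject


card = SmartCard(thresholds=[0.0, 1.0], region_scores=[-1.0, 0.5, 2.0])
print(host_verify(card, [1.5, 0.5, 2.5, -0.3], threshold=0.5))  # True
```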
After reading the above, it is easy to see that the invention does in fact reach the objectives set. However, the invention is of course not limited to the embodiments explicitly described, especially in relation to figures 1 to 4.
Some of the intermediate computations and the storage of the corresponding results can be carried out in an external device, for example a host terminal of the smartcard (microcomputer, mobile telephone, etc.), or even remotely (network, Internet server, etc.).
Note also that the method according to the invention may be generalised to finite state machine models, by using a partition and a score function for each state of the machine.