AU683783B2 - Method for forming a cohort for use in identification of an individual

Method for forming a cohort for use in identification of an individual

Info

Publication number
AU683783B2
Authority
AU
Australia
Prior art keywords
client
cohort
models
population
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU41109/96A
Other versions
AU4110996A (en
Inventor
Chen Fangxin
William Laverty
Iain Donald Graham Macleod
John Bruce Millar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Australian National University
Original Assignee
Australian National University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AUPM9830A external-priority patent/AUPM983094A0/en
Application filed by Australian National University filed Critical Australian National University
Priority to AU41109/96A priority Critical patent/AU683783B2/en
Publication of AU4110996A publication Critical patent/AU4110996A/en
Application granted granted Critical
Publication of AU683783B2 publication Critical patent/AU683783B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Description

METHOD FOR FORMING A COHORT FOR USE IN IDENTIFICATION OF AN INDIVIDUAL
This invention relates to a method for forming a cohort for use in identification of an individual, and to a method of identification of an individual on the basis of that cohort. The method is concerned primarily, but not exclusively, with forming a cohort for use in identification of individuals on the basis of the degree of conformity of characteristics of voice sounds, but may be applied to identification on the basis of other characteristics of individuals.
In determining whether an individual is or is not a particular pre-identified individual, ie a "client", comparison may be made between pre-determined parameters relating to that pre-identified person and those measured when any individual presents for verification. Particular parameters which may be used include parameters relating to speech, although parameters relating to other characteristics may be used. Among those other characteristics are how the presenting individual writes, uses a computer mouse, or uses a computer or other keyboard.
One method of identification, or verification, of whether or not an individual presenting for verification is a pre-determined individual makes use of client models representing each of a population of individuals. Characteristics relating to a person presenting for verification are measured and compared with the characteristics for one or more of the total population. If the characteristics for the person presenting for verification match those for a particular one of the population, then the verification system makes a determination that the presenting person is the particular individual for which the characteristics match. A difficulty with systems of this kind is that values for characteristics for any person presenting may differ from reference values for that person which are used by the system. For example, the values for characteristics used by the system would normally comprise stored values measured in a previous test on the individual, the stored values then being compared with those measured when the person presents for verification. However, naturally occurring variations may exist as between those values stored and those which arise when a verification procedure is carried out. In the case of verification on the basis of characteristics relating to utterances of a person, those variations may, for example, comprise phonetic variations, variations due to environmental conditions and intra-speaker variations. Thus, a person may utter a vowel in one fashion when the vowel appears in one word, and in a different fashion when it appears in another word. Again, the test conditions under which the original characteristic values were determined may be noise free, but there may be noise present in the environment when the individual presents for verification. Generally, then, it is not possible to effect identification reliably simply on the basis of direct equatability of measured characteristics with those stored for the individual in question.
Normally, comparison is effected as between characteristic values for more than one of the population, the determination of identity being made on the basis of the "distance" between the characteristics as stored for more than one of the population and those measured at verification. The characteristics which are measured in the verification process may be multi-dimensional. For example, it has been found convenient to use cepstral analysis techniques to analyse the speech of a population and the person presenting for verification. Overlapping samples of, say, 30 milliseconds may be taken of the amplitude-time wave form recorded during speech. In this case, it is convenient to generate 15 cepstral coefficients and to generate a model representing each member of the population and of the person presenting for verification, the models being 15-dimensional and with, for example, 128 points. The set of such points is commonly referred to as a code book for the person in question.
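As an illustration of the code book idea described above, the following minimal sketch computes the centroid of a code book and the mean distortion obtained when a set of feature frames is encoded with it. It is a sketch under stated assumptions, not the method of the specification: the function names are hypothetical, squared Euclidean distance is assumed as the distortion measure, and a toy 2-dimensional code book of 4 points stands in for the 15-dimensional, 128-point code books mentioned in the text.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(codebook):
    """Average of all code book points, one coordinate at a time."""
    dim = len(codebook[0])
    return [sum(cw[d] for cw in codebook) / len(codebook) for d in range(dim)]


def mean_distortion(frames, codebook):
    """Mean distance from each frame to its nearest code book point."""
    total = sum(min(sq_dist(f, cw) for cw in codebook) for f in frames)
    return total / len(frames)


# Toy data (hypothetical): 4 code book points and 2 feature frames in 2-D.
codebook = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
frames = [[0.0, 1.0], [2.0, 2.0]]

print(centroid(codebook))
print(mean_distortion(frames, codebook))
```

A lower mean distortion indicates that the frames fit the code book more closely; the later discussion of cohorts compares such distortions across several code books.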
In the comparison of the code book of the person presenting for verification and those for the reference population employed by the verification technique, one may choose from the code books for the population the code books of a "cohort", being a limited number of the population, and then compare the code book of the presenting person with the code books for that cohort. The cohort is selected from the total population on the basis that there is some similarity between the code book for the "client" in the population (ie the person whom the person presenting for verification purports to be) and the relevant cohort members. Selection of the cohort members can be made on the basis of the proximity of the centroids of the code book distributions to the centroid of the client's code book distributions. It is important that the multi-dimensional (Euclidean) distance between the centroid for the client and the various cohort members be significant, but not too great.
While methods based on the above have been found to be workable, hitherto inexplicable errors sometimes arise. For example, an error as basic as failure to discriminate between a male and a female voice may occur. It has now been determined that a likely cause of this difficulty is that the cohorts which are selected for the particular client do not have code book distributions which "surround" the code book distributions for the client in a satisfactory fashion. In particular, if the distance from the centroid of the code book distributions for the person presenting for verification to the client code book distribution centroid is great, then the difference between the distances to the centroids of the code book distributions for the client and for other cohort members will be relatively small. It may easily arise in this case that, because of the distribution of the cohort members with respect to the client, the distance between the code book distribution centroids of the client and of the person presenting for verification is less than the distance from the code book distribution centroid for the person presenting for verification to that of any of the other cohort members, at least as applies to some particular direction as between the code book distribution centroids for the person presenting for verification and for the client and cohort members. Thus, the verification scheme may incorrectly assume that the person presenting for verification is the client in this instance. Merely increasing the number of cohort members will not necessarily rectify this problem.
In accordance with the present invention, the "coverage" is extended by a) selecting appropriate new cohort members from the population, and/or b) generating from data relating to existing cohort members, including or excluding a particular client, a model for inclusion in the cohort.
More particularly, embodiments of the invention provide methods for synthesising speech models for "phantom" speakers with specified speech characteristics, comprising:
(i) determining the desired characteristics for each successive cohort member during incremental assembly of a cohort; and/or
(ii) constructing synthetic speech models with the desired characteristics.
The synthesised models may be formed from combinations of real speech models. For example, speech events fall into several different classes (vocalic, fricative, nasal, etc.); during the synthesis procedure, those parts of the real speech models pertaining to different classes of speech events may be considered separately. As a result of their method of composition, the synthesised speech models may be representative of possible real speakers.
In one specific aspect, the invention comprises a method of assembling a cohort for a client being one of a population, comprising testing whether models of at least a substantial number (preferably all) of the population excluding the client meet an acceptance threshold test as to identity with a model for the client, determining, from each model meeting the threshold test, whether those models are distributed so as to present at least a substantial probability that models for non-members of the population spaced from the client model in all directions will each be closer to a member of the cohort, excluding the client, than to the client and, if that probability is less than a predetermined value, selecting from the population another cohort member which will reduce that probability.
In another aspect, the invention provides a method of assembling a cohort for a client, being one of a population, comprising testing whether models of at least a substantial number (preferably all) of the population excluding the client meet an acceptance threshold test as to identity with a model for the client, determining, from each model meeting the threshold test, whether those models are distributed so as to present at least a substantial probability that models for non-members of the population spaced from the client model in all directions will each be closer to a member of the cohort, excluding the client, than to the client and, if that probability is less than a predetermined value, generating a new model for inclusion in the population and which will reduce that probability.
In another aspect the invention provides a method of assembling a cohort for a client, being one of a population, comprising testing whether models of at least a substantial number (preferably all) of the population excluding the client meet an acceptance threshold test as to identity with a model for the client, determining, from the or each model meeting the threshold test, whether those models are distributed so as to present at least a substantial probability that models for non-members of the population spaced from the client model in all directions will each be closer to a member of the cohort, excluding the client, than to the client and, if that probability is less than a predetermined value, either selecting from the population another cohort member which will reduce that probability or generating a new model for inclusion in the population and which will reduce that probability.
The invention also provides a method of verification using a cohort assembled as above described.
The invention may be practised with models of different types, for example vector quantisation or hidden Markov models.
The following detailed description describes in more detail the context of the invention, and preferred features of the invention.
The "cohort normalised" method of speaker verification computes for each input utterance its relative distance from models of the client and a cohort of speakers drawn from the same population. It is assumed that variations which reduce the utterance fit to the client model will tend to have similar effects with respect to the cohort speaker models. The use of "relative distance" can lead to improved client/impostor discrimination.
The following relates to the design of suitable cohorts. Using VQ codebooks in multidimensional cepstral space as the basic speaker models, pairs of codebooks can be related geometrically in terms of vector differences between their centroids in cepstral space. In a well-designed cohort, the cohort members give adequate "coverage" of the client's codebook in multidimensional space.
Cohort members are usually chosen on the basis of their similarity to the client. Experiments in which cohort members were instead chosen according to their position relative to the client led to a slight improvement in verification performance, suggesting that joint consideration of similarity and position would give even better results. However, with a limited set of speakers, it will often be difficult to find cohort members who meet these simultaneous requirements. At least in certain cases it is possible to synthesise suitable "phantom" codebooks based on those of real speakers.
In the classic procedure for speaker verification, an input utterance is accepted or rejected according to a threshold on its goodness of fit with a model of the client's speech. While such a measure truly reflects absolute deviations between the client's model and input utterances, it is sensitive to overlapping client and impostor distributions which arise because of the effects of intra-speaker variation, recording environment change and phonetic variation. This in turn leads to a high Equal Error Rate (EER).
An alternative approach (Rosenberg et al., 1992) uses a "cohort" of speakers, with speech models similar to that of the client, allowing relative measures of similarity or difference to be computed and reducing problems due to the above-mentioned variations. Similarity is judged on the basis of the mean distortion of a potential cohort speaker's utterances with respect to the client speaker's VQ model. Tests of the cohort method show that it is subject to problems with false acceptance of impostor utterances which are quite dissimilar to those of the client (eg. from a speaker of opposite sex to the client) but which still give a better fit to the client model than to any of the cohort models. A tentative geometrical explanation of this problem has been given in Chen, Millar and Wagner (1994), suggesting that the problem arises from inadequate "coverage" of the client by cohort members. Thus, a significant practical difficulty associated with use of the cohort-normalised method is that of assembling a suitable cohort from among the set of individuals whose speech has been modelled. In many cases, this set will be too small and for certain clients will not include a suitable set of speakers with similar speech models from which to assemble a cohort. Choice of suitable cohort members needs to be based on an understanding of the relationship between pairs of codebooks. Unless suitable potential cohort members are available and the cohort members are selected carefully, anomalous verification behaviour may result (e.g. an impostor of the opposite sex being verified as the client). Verification performance tends to improve with cohort size, but this increases verification time. By appropriate choice of cohort members, one can form a cohort of minimum size for a specified level of performance.
The techniques of the present invention directly address practical difficulties associated with assembling a suitable cohort for each client in the absence of a large set of speech models from which to select cohort members. Speakers may, in the following, be considered to be characterised by codebooks of 128 codewords (vectors in 15-D mel-frequency cepstral space) chosen so as to minimise the encoding error (distortion) with the training data sets. The number of muscle groups used in articulating speech sounds is much less than 15. Most of the relevant information for phonetic discrimination in the speech of two males can be represented with about six cepstral coefficients (Davis & Mermelstein, 1980). High-dimensional cepstral data relating to vocalic speech tends to fall on low-dimensional quadratic surfaces (predominantly parabolic) which can be characterised in terms of only four parameters (Hawkins, Macleod and Millar, 1994). Important components of the codeword distributions will thus have lower intrinsic dimensionality than that of their space of representation; the overall distributions can thus be expected to show significant clustering in cepstral space.
The similarity of a pair of codebooks may be assessed by measuring the distortion when one codebook is used to encode the speech data on which it was trained, and then comparing this to the distortion obtained with this data using the other codebook. Given that the codebooks for all speakers have been trained on the same set of utterances, the regions most densely occupied by codewords should be similar for pairs of similar codebooks.
The similarity of codebooks measured in such a way represents the similarity of speakers. The ratio of distortions is a scalar magnitude; as a directionless quantity it thus gives no indication as to which of two given codebooks would yield the smaller distortion when encoding the training data for a third, for example. As a similarity measure it provides an estimate of how "close" the regions of cepstral space occupied by the two codebooks are, but it does not indicate their relative positions. Scalar measures are thus of only limited use in diagnosing problems with a given cohort or in choosing cohort members for a given client.
A simple vector measure considers the relative differences between codebook centroids (formed from the average of all vectors in a codebook). While pairs of speaker models which give relatively small errors in encoding each other's training data will have similar distributions of codewords in cepstral space, with pairs of similar codebooks there may still be considerable interspersion of codewords. The question arises as to whether any differences (in magnitude and direction) between the centroids of such codebooks are meaningful in the statistical sense. Given the inhomogeneity and complexity of the codeword distributions in cepstral space, simple statistical characterisations (based on variances of these distributions) are not appropriate for answering this question. An alternative method associates the codewords in one book with neighbours in the other (eg. on the basis of closest Euclidean distance, as used in the following) and then analyses the distributional properties of the resulting set of 128 difference vectors, to see if they cluster in particular directions. A method for analysing these properties is next described.
A method to test the statistical significance of vectorial relationships between codebooks is now developed. The analysis of the difference vectors comprises the following steps:
(i) determine a mean directional component;
(ii) test the statistical significance of this component; and
(iii) subtract the mean from all vectors before analysing the residuals with Principal Components Analysis (PCA).
The mean vector between codeword pairs can simply be shown to be equal to the vector between the corresponding codebook centroids. PCA is used to check to what extent directional variability between codeword pairs is concentrated in a few directions.
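The first and third steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, codewords are paired by closest Euclidean distance as the text describes, and the toy 2-D codebooks stand in for 128-codeword, 15-D codebooks. Note that the mean difference vector equals the difference between the centroids exactly only when the pairing is one-to-one, as it happens to be in this toy example. The significance test and the PCA of the residuals are not implemented here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def difference_vectors(book_a, book_b):
    """Pair each codeword in book_a with its nearest neighbour in book_b
    and return the vectors pointing from book_a to book_b."""
    diffs = []
    for cw in book_a:
        nearest = min(book_b, key=lambda other: sq_dist(cw, other))
        diffs.append([y - x for x, y in zip(cw, nearest)])
    return diffs


def mean_vector(vectors):
    """Component-wise mean of a list of vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def residuals(vectors):
    """Vectors with their mean subtracted, ready for a PCA routine."""
    m = mean_vector(vectors)
    return [[x - mx for x, mx in zip(v, m)] for v in vectors]


# Toy codebooks (hypothetical values).
book_a = [[0.0, 0.0], [10.0, 0.0]]
book_b = [[1.0, 1.0], [11.0, 0.0]]

diffs = difference_vectors(book_a, book_b)
centroid_shift = [b - a for a, b in zip(mean_vector(book_a), mean_vector(book_b))]
print(mean_vector(diffs), centroid_shift)
print(residuals(diffs))
```

The residuals returned by the last function are what a PCA routine would then analyse to see whether the remaining variability is concentrated in a few directions.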
Distributions of difference vectors with relatively similar and dissimilar pairs of codebooks have been analysed in accordance with the principles of this invention (with similarity being assessed in terms of average distortion), using Hotelling's T² statistic to test the hypothesis that the mean vector of the difference vectors was non-zero. For all pairs of codebooks examined, the hypothesis was confirmed (p < 0.0001), showing that the difference vectors tend to point in a consistent direction. As a result of the low intrinsic dimensionality of the cepstral distributions of vocalic speech (Hawkins, Macleod and Millar, 1994), a significant proportion of the codewords will tend to cluster on hypersurfaces of lower dimensionality. If, however, there was substantial interspersion of the codeword distributions in the codebooks being compared, the difference vectors would have a less consistent orientation. The results of the analysis performed in this embodiment of the invention show that the degree of interspersion is limited, thus indicating that distributions of codewords from similar codebooks have similar shapes and that the concept of relative displacements between codebook pairs has statistical validity. After subtracting the mean vector from each difference vector, analysis of the residuals with PCA revealed one distinct non-noise directional component with dissimilar pairs of codebooks and two orthogonal components with similar codebook pairs (one component being somewhat larger than the other). This is shown in Figure 1 which illustrates analysis of residuals with similar and dissimilar codebooks.
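The specification does not give the form of the test used. For reference, the standard one-sample Hotelling statistic for n difference vectors d_i of dimension p is set out below; applying it assumes that the difference vectors are independent and approximately multivariate normal, which is an assumption made here and not a statement from the text. The null hypothesis is that the true mean difference vector is zero.

```latex
T^2 = n\,\bar{\mathbf{d}}^{\top}\mathbf{S}^{-1}\bar{\mathbf{d}},
\qquad
\bar{\mathbf{d}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{d}_i,
\qquad
\mathbf{S} = \frac{1}{n-1}\sum_{i=1}^{n}
  (\mathbf{d}_i-\bar{\mathbf{d}})(\mathbf{d}_i-\bar{\mathbf{d}})^{\top}

\frac{n-p}{p\,(n-1)}\,T^2 \sim F_{p,\;n-p}
\quad\text{under } H_0:\ \boldsymbol{\mu}=\mathbf{0}
```

With the 128 difference vectors in 15 dimensions described above (n = 128, p = 15), the scaled statistic (113/1905) T² would be referred to an F distribution with 15 and 113 degrees of freedom.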
The presence of non-noise Principal Components in the residuals, after the mean vector is subtracted, means that there are further systematic variations in the relationships between pairs of codeword distributions in addition to the mean displacement. Two codebooks with similar centroids may thus give large distortions when encoding each other's training data (eg. if one codebook had a greater span in certain directions than the other).
An estimate of progress towards explaining the total relationship between two codebooks is obtainable by computing the length of the (vector) sum of the difference vectors and comparing this to the sum of the scalar lengths of the individual vectors. If all difference vectors point in the same direction, these two lengths will be the same. If the difference vectors are randomly oriented, the summed vector length will be only a small fraction of the scalar sum of lengths. On examining the codebooks of potential cohort members in relation to given client codebooks, it was found that this length ratio varied from about 25% to 40%, a much larger than expected length for the sum of random vectors. In addition to supporting the statistical finding that the difference between codebook centroids is real, this result means that a large enough component of the total relationship is captured that clear benefits should follow from taking relative codebook positions into account when constructing cohorts.
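The length ratio just described can be computed directly from a set of difference vectors. The following is a minimal sketch; the function name is hypothetical and the toy 2-D vectors are illustrative only.

```python
import math


def length_ratio(diffs):
    """Length of the vector sum divided by the sum of the individual lengths.

    Returns 1.0 when all vectors point the same way and approaches 0
    when their directions cancel out.
    """
    dim = len(diffs[0])
    vector_sum = [sum(v[d] for v in diffs) for d in range(dim)]
    summed_length = math.sqrt(sum(x * x for x in vector_sum))
    scalar_sum = sum(math.sqrt(sum(x * x for x in v)) for v in diffs)
    return summed_length / scalar_sum


print(length_ratio([[1.0, 0.0], [2.0, 0.0]]))   # same direction
print(length_ratio([[1.0, 0.0], [-1.0, 0.0]]))  # opposite directions
print(length_ratio([[3.0, 0.0], [0.0, 4.0]]))   # perpendicular
```

As a rough point of comparison, for 128 unit-length vectors with independent, uniformly random directions the root-mean-square length of the sum is the square root of 128, giving a ratio of about 9%; this idealised figure is offered only to indicate the scale against which the observed 25% to 40% can be read.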
The above provides statistical justification for using relative centroid positions to consider the extent to which the members of a cohort "enclose" a client or leave "gaps" in the coverage, given possible interspersion of the codeword distributions of similar speakers. The minimum distortion among the cohort models and the mean distortion across cohort models have both been proposed for use in the client/cohort comparison. The following tests are based on use of the min statistic.
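The client/cohort comparison with the min or mean statistic might be sketched as follows. This is an assumption-laden illustration: the specification speaks only of a "relative distance", so the use of a ratio, the threshold value of 1.0 and the function names are all hypothetical choices made for the example.

```python
def cohort_reference(cohort_distortions, statistic="min"):
    """Reference distortion taken over the cohort models."""
    if statistic == "min":
        return min(cohort_distortions)
    if statistic == "mean":
        return sum(cohort_distortions) / len(cohort_distortions)
    raise ValueError("statistic must be 'min' or 'mean'")


def normalised_score(client_distortion, cohort_distortions, statistic="min"):
    """Client distortion relative to the cohort reference (ratio form, assumed)."""
    return client_distortion / cohort_reference(cohort_distortions, statistic)


def accept(client_distortion, cohort_distortions, threshold=1.0, statistic="min"):
    """Accept the utterance when it fits the client model better than the
    cohort reference by the margin implied by the threshold."""
    return normalised_score(client_distortion, cohort_distortions, statistic) < threshold


# Hypothetical distortions for one utterance against the client model
# and against two cohort models.
print(accept(2783.0, [3323.0, 3500.0]))
print(accept(3400.0, [3323.0, 3500.0]))
```

Lowering the threshold below 1.0 would make acceptance stricter, trading false acceptances for false rejections.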
An optimal cohort is one in which for each potential impostor there is a cohort member whose codebook encodes impostor utterances with lower distortion than that achieved with the client's codebook. Care is needed not to falsely reject the client's speech, so such cohort codebooks need to encode the client's training data with a significantly (but not dramatically) larger distortion than that obtained with the client's codebook: the cohort members should be similar, but not too similar, to the client. A percentage of impostors with speech very similar to the client will thus be falsely accepted, but this is unavoidable. Referring to Figure 2, imagine a hyperellipsoid (concentric with the client), which contains the centroids of codebooks for speakers similar to the client. The members of one potential cohort could then be distributed on the surface of a second larger hyperellipsoid with roughly twice the diameters of the first, so that (on average) utterances made by speakers whose codebook centroids lay outside the first hyperellipsoid would be attributed to a cohort member, and utterances made by speakers whose codebook centroids lay inside would be attributed to the client. By varying the size of the smaller hyperellipsoid, the desired balance between Type I and Type II errors can be achieved. (Hyperellipsoids are advanced here instead of hyperspheres, because other codebooks are unlikely to be evenly distributed about the client's.) In the usual case, only a limited set of speakers (and their trained codebooks) will be available for cohort construction. The most similar speakers in this set to a given client may well be less (or sometimes more) similar than desired. Nevertheless, a functional cohort of size N can be formed by choosing the N most similar codebooks.
Just as the codeword distributions themselves will be of lower intrinsic dimensionality than that of the representation space, it might be expected that the relative positions of codebook centroids (and thus of cohort members) will also be unevenly distributed. For example, cepstral features will tend to vary in a systematic manner with changes in parameters such as vocal tract length and shape. In terms of geometric analogy, anomalous acceptance of dissimilar impostors with a cohort chosen from the speakers most similar to the client arises because the client is "covered" too sparsely or too unevenly. An alternative procedure for assembling a cohort is as follows. Choose a speaker who is similar (but not too similar) to the client as the first cohort member. Test the remaining speaker population to see which speaker (of about the desired similarity to the client) gives the highest percentage of false acceptances with this cohort of size one. This speaker will lie in a direction which is not well covered by the first cohort member and is chosen as the second cohort member. The procedure is repeated until a cohort of the required size has been formed. Speech data useful in practising the invention is described in Millar et al. (1994).
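The incremental procedure described above can be sketched as a greedy loop. This is a minimal sketch under stated assumptions: the function names and the layout of the distortion table are hypothetical, an utterance counts as a false acceptance when the client model fits it better than every current cohort model (the min statistic), and the filter on "about the desired similarity to the client" is omitted for brevity.

```python
def false_acceptance_rate(speaker, cohort, client, distortions):
    """Fraction of the speaker's utterances that fit the client model
    better than any current cohort model."""
    utterances = distortions[speaker]  # list of {model name: distortion}
    accepted = sum(
        1 for u in utterances if u[client] < min(u[m] for m in cohort)
    )
    return accepted / len(utterances)


def build_cohort(client, first_member, candidates, distortions, size):
    """Start from one cohort member, then repeatedly add the candidate
    that the current cohort covers worst."""
    cohort = [first_member]
    remaining = [s for s in candidates if s not in (client, first_member)]
    while len(cohort) < size and remaining:
        worst = max(
            remaining,
            key=lambda s: false_acceptance_rate(s, cohort, client, distortions),
        )
        cohort.append(worst)
        remaining.remove(worst)
    return cohort


# Toy distortion table (hypothetical): for each speaker, the distortion of
# each utterance against the client model "X" and candidate models A, B, C.
distortions = {
    "A": [{"X": 5, "A": 1, "B": 4, "C": 6}],
    "B": [{"X": 3, "A": 4, "B": 1, "C": 5}, {"X": 3, "A": 2, "B": 1, "C": 5}],
    "C": [{"X": 2, "A": 3, "B": 4, "C": 1}],
}

print(build_cohort("X", "A", ["A", "B", "C"], distortions, size=2))
```

In this toy table, speaker C is falsely accepted on every utterance when the cohort contains only A, so C is added before B.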
The population of 45 speakers is divided into two - a cohort formation population of 25 speakers and a client/test population of 20 speakers (10 male and 10 female). Using the method of assessment outlined in Chen, Millar and Wagner (1994), a test was made of the verification performance of cohorts assembled (i) from the speakers most similar to the client, and (ii) by starting with the most similar speaker to the client, adding the speaker who gave the greatest number of false acceptances with this cohort of size one, and so on as each new member was added. Because of the limited speaker population available, the similarity of cohort members to the client was not considered in building up the cohort using the "optimum direction" method (which was intended to identify and then fill gaps in the cohort coverage of the client). The results given in Table 1 show a slight advantage for the direction method, even though (apart from the first cohort member) similarity to the client was not considered. For several clients, the EER with the "optimum direction" procedure increased slightly as the cohort size increased from three to five; in this case the final one or two cohort members chosen must have led to false rejections of the client (ie. these members were too similar to the client). Analysis of the EERs achieved with Min5 and Sel5 showed that the observed improvement with Sel5 was not statistically significant. Thus these experiments indicate that the direction and similarity methods produce cohorts of similar quality. Given the different basis of these two methods of assembling cohorts, simultaneous consideration of both coverage and similarity may improve overall performance.
Given the difficulties encountered with locating suitable cohort members (because of the limited population of speakers), the question arises as to whether it is possible to form synthetic codebooks with the desired properties. For example, it would be possible to modify the client's codebook to get a new codebook which is just sufficiently dissimilar (ie. gives the desired amount of distortion when encoding the client's training data with respect to the balance of Type I and Type II errors). For example, it would be possible to disturb one or more of the 15 coefficients in each codeword at a time to yield synthetic cohorts displaced a desired distance from the client in the direction of the altered coefficients. Experiments showed that codebooks synthesised in this manner had little practical utility: they usually did not encode impostor utterances as efficiently as the client's codebook and thus did not lead to improvements in speaker verification performance. The source of the problem here is the use of codeword distributions which are most likely densely clustered in only a small region of the 15-D cepstral space. In synthesising "phantom" codebooks we need to ensure that the synthetic codewords are representative of those of typical speakers similar to the client. Working in a space which is known to be inhomogeneously occupied, we can minimise errors arising from inhomogeneities by using codeword pairs from similar real speakers and interpolating synthesised values, thereby staying "close" to known real values.
Experiments in synthesising codebooks by either adding or subtracting a fixed vector displacement to or from all codewords in a real speaker's codebook, either the client's or a (potential) cohort member's, were instructive. The fixed displacement was usually 50% of the difference vector between the client's and cohort's codebook centroids. In a typical example, the client's codebook encoded a set of test client utterances with a distortion of 2783, the cohort's codebook gave a distortion of 3323, the client's codebook displaced by either + or - 50% of the difference vector between the centroids gave distortions of 2811 and 2799 respectively, and the cohort's codebook displaced by + or - 50% of this difference vector gave distortions of 3422 and 3255 respectively. Two points to be noted here are that (i) the observed increases and decreases in distortion are consistent with our geometric interpretation, and (ii) when the client codebook is displaced halfway towards the cohort, the distortion increases but is still substantially smaller than the (reduced) distortion obtained when the cohort codebook is displaced halfway towards the client.
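The fixed-displacement synthesis described above amounts to adding the same vector to every codeword. The following minimal sketch illustrates it; the function names are hypothetical and the toy 2-D codebooks are not taken from the experiments.

```python
def centroid(codebook):
    """Average of all codewords in a codebook."""
    dim = len(codebook[0])
    return [sum(cw[d] for cw in codebook) / len(codebook) for d in range(dim)]


def shift_codebook(codebook, direction, fraction):
    """Add fraction * direction to every codeword (negative fractions
    move the codebook the opposite way)."""
    return [[c + fraction * d for c, d in zip(cw, direction)] for cw in codebook]


# Toy client and cohort codebooks (hypothetical values).
client_cb = [[0.0, 0.0], [2.0, 0.0]]
cohort_cb = [[4.0, 2.0], [6.0, 4.0]]

# Difference vector between the centroids, pointing from client to cohort.
direction = [b - a for a, b in zip(centroid(client_cb), centroid(cohort_cb))]

# Client codebook displaced 50% of the way towards the cohort.
displaced = shift_codebook(client_cb, direction, 0.5)
print(displaced, centroid(displaced))
```

Because every codeword moves by the same amount, the shape of the codeword distribution is unchanged; only its position shifts.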
The second point above provides further evidence that the distributions of codewords vary in ways other than overall position: speakers are characterised by the shapes of their codeword distributions as well. A second method of interpolation was thus tried, which aimed to indirectly capture something of these other dimensions of variation. Instead of adding a fixed vector displacement to all codewords, interpolation (or extrapolation) was effected on the basis of individual difference vectors between codeword pairs. As an increasing percentage of these difference vectors are added to the codewords in the client codebook, so the synthesised codebook will gradually change from one that is similar to the client codebook into one that is similar to the cohort codebook. For the example client and cohort codebooks considered above, a synthetic codebook interpolated using 50% of the individual difference vectors for codeword pairs gave a distortion of 3078, which was close to halfway (3053) between the respective client and cohort distortions of 2784 and 3323.
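Interpolation on the basis of individual difference vectors can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, codewords are paired by closest Euclidean distance as elsewhere in the description, and the toy 2-D codebooks are illustrative only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def interpolate_codebook(client_cb, cohort_cb, fraction):
    """Move each client codeword towards its nearest cohort codeword.

    fraction = 0 reproduces the client codebook, fraction = 1 lands on the
    paired cohort codewords, and fraction > 1 extrapolates beyond them.
    """
    synthetic = []
    for cw in client_cb:
        partner = min(cohort_cb, key=lambda other: sq_dist(cw, other))
        synthetic.append([a + fraction * (b - a) for a, b in zip(cw, partner)])
    return synthetic


# Toy codebooks (hypothetical values).
client_cb = [[0.0, 0.0], [10.0, 0.0]]
cohort_cb = [[2.0, 2.0], [10.0, 4.0]]

print(interpolate_codebook(client_cb, cohort_cb, 0.5))

# Midpoint of the client and cohort distortions quoted in the text.
print((2784 + 3323) / 2)
```

Unlike a fixed displacement, each codeword here moves by its own difference vector, so the synthetic codebook can change shape as well as position.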
Figure 3 shows the achieved Equal Error Rate (EER) percentages with an absolute threshold (ABS_VQ) and with selected cohorts of size n chosen conventionally (Minn) and according to false acceptances (Seln). The final column (Sel5') shows the improved results obtained with several clients (marked with *) through use of a final synthetic cohort member. The figure shows that with some clients the EER increased from Sel4 to Sel5. In these cases, the chosen fifth cohort member was used to construct an extrapolated synthetic codebook (moving the chosen cohort codebook further away from the client), and the EER was recalculated (shown as Sel5'). In all cases this procedure prevented the EER from increasing between Sel4 and Sel5'; in two cases (clients 19 and 20) the synthetic cohort member reduced the EER between Sel4 and Sel5'. The reduction in the overall error rate to 2.83% was not, however, sufficient to make the difference between Min5 and Sel5' statistically significant.
The overall results of the experiments provide evidence that the distributions of codewords in 15-D MFCC space are rather complex. Although it can be shown statistically that the observed mean displacements between similar codebooks are real and do not occur just by chance, the distributions of codewords in given codebooks will vary in shape and extent as well as position. The present concept of relative codebook positions captures an important part, but only a part, of the total relationship between similar codebooks.
The following program listing is for a "C" computer program suitable for finding the average vector distance between sets of paired cohorts.
The following program listing is for a "C" program suitable for synthesising codebook distributions for a "phantom" population member.
REFERENCES
Chen, F., Millar, B. and Wagner, M. (1994), "Hybrid threshold approach in text-independent speaker verification," Proc. Int. Conf. on Spoken Language Processing, Yokohama, 1855-1858.
Davis, S. B. and Mermelstein, P. (1980), "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoustics, Speech and Signal Processing, Vol. ASSP-28, 357-366.
Hawkins, S., Macleod, I. and Millar, B. (1994), "Modelling individual speaker characteristics by describing a speaker's vowel distribution in articulatory, cepstral and formant space," Proc. Int. Conf. on Speech Science and Technology, Perth.
Millar, B., Chen, F., Macleod, I., Ran, S., Tang, H., Wagner, M. and Zhu, X. (1994), "Overview of speaker verification studies towards technology for robust user-conscious secure transactions," Proc. Int. Conf. on Speech Science and Technology, Perth.
Millar, B., Chen, F. and Wagner, M. (1994), "The efficacy of cohort normalisation in a speaker verification task under different types of speech signal variance," Proc. Int. Conf. on Speech Science and Technology, Perth.

Claims (18)

CLAIMS:
1. A method of assembling a cohort for a client being one of a population, comprising testing whether models of at least a substantial number of the population excluding the client meet an acceptance threshold test as to identity with a model for the client, determining, from each model meeting the threshold test, whether those models are distributed so as to present at least a substantial probability that models for non-members of the population spaced from the client model in all directions will each be closer to a member of the cohort, excluding the client, than to the client and, if that probability is less than a predetermined value, selecting from the population another cohort member which will reduce that probability.
2. A method as claimed in claim 1 wherein said substantial number of the population excluding the client comprises all of the population excluding the client.
3. A method as claimed in claim 1 or claim 2 wherein said models are codebooks each of a number of codewords.
4. A method as claimed in claim 3 wherein said testing is effected by assessing the distance between centroids of pairs of the codebooks.
5. A method as claimed in claim 3 wherein said testing is effected by assessing the distance between codewords in one said codebook and neighbour codewords in another said codebook.
6. A method as claimed in claim 4 or claim 5 wherein the distance is a Euclidean distance.
7. A method as claimed in any preceding claim comprising:
(a) choosing a first model among models of the population not including the client model, said first model being similar to but still exhibiting significant differences with respect to the client model,
(b) adopting said test model as a first member of the cohort,
(c) testing the remaining models for the population, excluding the client and first models, to determine a further model, among those of the remaining models which have a degree of similarity to the client model similar to that which exists between the first and client models, which provides the highest degree of false acceptances with respect to the client,
(d) adding said further model to said cohort, and
(e) repeating steps (c) and (d) using all models previously added to the cohort and the client model to generate successive other further models which are added to the cohort.
8. A method of assembling a cohort for a client being one of a population, comprising testing whether models of at least a substantial number of the population excluding the client meet an acceptance threshold test as to identity with a model for the client, determining, from each model meeting the threshold test, whether those models are distributed so as to present at least a substantial probability that models for non-members of the population spaced from the client model in all directions will each be closer to a member of the cohort, excluding the client, than to the client and, if that probability is less than a predetermined value, generating a new model for inclusion in the population and which will reduce that probability.
9. A method as claimed in claim 8 wherein said substantial number of the population excluding the client comprises all of the population excluding the client.
10. A method as claimed in claim 8 or claim 9 wherein said models are codebooks each of a number of codewords.
11. A method as claimed in claim 10 wherein said testing is effected by assessing the distance between centroids of pairs of the codebooks.
12. A method as claimed in claim 10 wherein said testing is effected by assessing the distance between codewords in one said codebook and neighbour codewords in another said codebook.
13. A method as claimed in claim 11 or claim 12 wherein the distance is a Euclidean distance.
14. A method as claimed in any one of claims 10 to 13 wherein the new model is generated by adding or subtracting a fixed vector displacement to the codewords of models in the population excluding any generated models.
15. A method of assembling a cohort for a client one of a population, comprising testing whether models of at least a substantial number (preferably all) of the population excluding the client meet an acceptance threshold test as to identity with a model for the client, determining, from each meeting the threshold test, whether those models are distributed so as to present at least a substantial probability that models for non-members of the population spaced from the client model in all directions will each be closer to a member of the cohort, excluding the client, than to the client and, if that probability is less than a predetermined value, either selecting from the population another cohort member which will reduce that probability or generating a model for inclusion in the population and which will reduce that probability.
16. A method as claimed in any preceding claim wherein the models are vector quantisation or hidden Markov models.
17. A method as claimed in any preceding claim wherein said models represent speech characteristics.
18. A method of verification as to whether a person is said client, using a cohort claimed in any preceding claim, comprising comparing a model relating to said person with said cohort and determining whether the person is the client on the basis of similarity of the models relating to the person and to the cohort.
AU41109/96A 1994-12-02 1995-12-01 Method for forming a cohort for use in identification of an individual Ceased AU683783B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU41109/96A AU683783B2 (en) 1994-12-02 1995-12-01 Method for forming a cohort for use in identification of an individual

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
AUPM9830 1994-12-02
AUPM9830A AUPM983094A0 (en) 1994-12-02 1994-12-02 Method for forming a cohort for use in identification of an individual
AU41109/96A AU683783B2 (en) 1994-12-02 1995-12-01 Method for forming a cohort for use in identification of an individual
PCT/AU1995/000807 WO1996017341A1 (en) 1994-12-02 1995-12-01 Method for forming a cohort for use in identification of an individual

Publications (2)

Publication Number Publication Date
AU4110996A AU4110996A (en) 1996-06-19
AU683783B2 AU683783B2 (en) 1997-11-20

Family

ID=25625468

Family Applications (1)

Application Number Title Priority Date Filing Date
AU41109/96A Ceased AU683783B2 (en) 1994-12-02 1995-12-01 Method for forming a cohort for use in identification of an individual

Country Status (1)

Country Link
AU (1) AU683783B2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0388067A2 (en) * 1989-03-13 1990-09-19 International Business Machines Corporation Speech recognition system
EP0424071A2 (en) * 1989-10-16 1991-04-24 Logica Uk Limited Speaker recognition
WO1994022132A1 (en) * 1993-03-25 1994-09-29 British Telecommunications Public Limited Company A method and apparatus for speaker recognition

Also Published As

Publication number Publication date
AU4110996A (en) 1996-06-19

Legal Events

Date Code Title Description
MK14 Patent ceased section 143(a) (annual fees not paid) or expired