EP1800241A1 - Statistical analysis in pattern recognition, in particular in fingerprint recognition - Google Patents

Statistical analysis in pattern recognition, in particular in fingerprint recognition

Info

Publication number
EP1800241A1
Authority
EP
European Patent Office
Prior art keywords
representation
expression
representations
consideration
probability distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP05801600A
Other languages
German (de)
English (en)
Inventor
Cedric Neumann (The Forensic Science Service)
Roberto Puch-Solis (The Forensic Science Service)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Forensic Science Service Ltd
Original Assignee
Forensic Science Service Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB0422784A (GB0422784D0)
Priority claimed from GB0502900A (GB0502900D0)
Priority claimed from US11/083,579 (US7369700B2)
Application filed by Forensic Science Service Ltd
Publication of EP1800241A1
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12: Fingerprints or palmprints
    • G06V40/1347: Preprocessing; Feature extraction
    • G06V40/1353: Extracting features related to minutiae or pores
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12: Fingerprints or palmprints
    • G06V40/1365: Matching; Classification
    • G06V40/1371: Matching features related to minutiae or pores

Definitions

  • This invention concerns improvements in and relating to identifier comparison, particularly, but not exclusively, in relation to the comparison of biometric identifiers or markers, such as prints from a known source, with biometric identifiers or markers, such as prints from an unknown source.
  • the invention is applicable to fingerprints, palm prints and a wide variety of other prints or marks, including retina images.
  • the useful result may be evidence to support a person having been at a crime scene.
  • the present invention has amongst its potential aims to provide a method of comparison which is more versatile.
  • a method of comparing a first representation of an identifier with a second representation of an identifier including: providing an expression of the first representation; considering the expression of the first representation against a probability distribution based on the variation in the expression between different example representations of the second representation, to provide a first consideration; considering the expression of the first representation against a probability distribution based on the variation in the expression between different population representations, to provide a second consideration; using the first consideration and second consideration to provide a measure of comparison between the first representation and the second representation.
  • the first aspect of the invention may include any of the features, options or possibilities set out elsewhere in this application, including those of the second and/or third aspects of the invention. Particularly in the context of a first form of the invention, which may particularly reflect the form detailed in the second aspect of the invention below, the following features, options and possibilities may be provided.
  • the expressions of the first and second representations may be in the form of a distance.
  • the distance may be obtained by considering data from the first representation against data from the second representation.
  • the data from the first representation and/or the data from the second representation is in the form of a vector.
  • a distance between a first representation vector and a second representation vector is obtained.
  • the vector(s) may be of the form provided below and/or detailed in applicant's UK patent application number 0502902.0 of 11 February 2005 and/or UK patent application number 0422785.6 of 14 October 2004, the contents of which are incorporated herein by reference.
  • the distance between the expression of the first representation and the expression of the second representation is considered against a probability distribution based on the variation in the distances between the expressions of different example representations of the second representation. Preferably the first consideration is provided in this way.
  • the distance between the expression of the first representation and the expression of the second representation is considered against a probability distribution on distances based on the variation in the expression between different population representations. Preferably the second consideration is provided in this way.
  • the different example representations of the second representation may be provided from the same identifier as the second representation of an identifier being compared with the first representation of an identifier.
  • the example representations of the second representation may come from an individual, with the second representation which is being considered against the first coming from the same individual.
  • the individual may be a suspect and in particular the suspected source of the first representation.
  • the different example representations of the second representation may be provided from a different identifier to the identifier of the second representation of an identifier being compared with the first representation of an identifier.
  • the example representations of the second representation may come from an individual, with the second representation which is being considered against the first coming from a different individual.
  • the different individual may be a suspect and in particular the suspected source of the first representation.
  • the example representations of the second representation may come from an individual who is not a suspect.
  • a plurality of example representations of the second representation may be provided.
  • An expression for one or more pairs of example representations may be provided.
  • the expression(s) may be in the form of a distance.
  • the distance may be obtained by considering data from a first example representation of the second representation against data from a second example representation of the second representation.
  • the data from the first and/or second example representations of the second representation are in the form of a vector.
  • the vector(s) may be of the form provided below and/or detailed in applicant's UK patent application number 0502902.0 of 11 February 2005 and/or UK patent application number 0422785.6 of 14 October 2004, the contents of which are incorporated herein by reference.
  • a probability distribution based on the expressions of the plurality of example representations, particularly based on the cross-distances between the expressions of the plurality of example representations may be provided.
  • the method may include considering the expression of the first representation against the probability distribution for the example representations, and particularly may include considering the distance between the expression of the first representation and the expression of the second representation against a probability distribution for distances of the example representations, to provide the first consideration.
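The cross-distance idea described above can be sketched as follows. This is a minimal illustration only: the Euclidean distance, the fixed-width histogram, the function names and the toy vectors are all assumptions made for the example, not details taken from the application.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cross_distances(vectors):
    """Distances between every unordered pair of example vectors."""
    return [euclidean(u, v) for u, v in combinations(vectors, 2)]

def relative_frequency(distances, value, bin_width):
    """Fraction of the distances that fall in the same histogram bin as value."""
    target = math.floor(value / bin_width)
    hits = sum(1 for d in distances if math.floor(d / bin_width) == target)
    return hits / len(distances)

# Hypothetical example vectors, all assumed to come from the same finger.
examples = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
within = cross_distances(examples)

# Distance between the first representation (mark) and the second (suspect print),
# looked up in the empirical distribution of within-source cross-distances.
mark, suspect = (0.0, 0.0), (3.0, 4.5)
first_consideration = relative_frequency(within, euclidean(mark, suspect), bin_width=1.0)
```

With four example vectors there are six cross-distances; a real database would hold many more, and a smoothed density estimate might be preferred over raw histogram bins.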
  • the different representations of the population representation may be provided from different individuals and/or different identifiers, such as fingerprints, thereof.
  • a plurality of example representations of the second representation may be provided.
  • An expression for one or more pairs of example representations may be provided.
  • the expression(s) may be in the form of a distance.
  • the distance may be obtained by considering data from a first example representation of the second representation against data from a second example representation of the second representation.
  • the data from the first and/or second example representations of the second representation are in the form of a vector.
  • the vector(s) may be of the form provided below and/or detailed in applicant's UK patent application number 0502902.0 of 11 February 2005 and/or UK patent application number 0422785.6 of 14 October 2004, the contents of which are incorporated herein by reference.
  • a plurality of population representations from different origins, for instance different persons and/or different fingers are provided.
  • an expression of each of the plurality of population representations is provided.
  • a probability distribution based on the expressions of the plurality of population representations, particularly based on the distances between the expression of the first representation and the expressions of the plurality of population representations is provided.
  • the method may include considering the expression of the first representation against the probability distribution for the population examples, and particularly may include considering the distance between the expression of the first representation and the expression of the second representation against the probability distribution on distances between the expression of the second representation and the population examples, to provide a second consideration.
  • the measure of comparison between the first representation and the second representation may be a likelihood ratio.
  • the likelihood ratio may be the quotient of two probabilities, the numerator particularly being the probability of the two representations under the hypothesis that the vectors originate from two representations of the same identifier, and the denominator particularly being the probability of the two representations under the hypothesis that the vectors originate from representations of different identifiers.
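A likelihood ratio of this kind can be illustrated with a short sketch. It assumes that both probability distributions are estimated from observed distances with a Gaussian kernel density estimate; the estimator, the bandwidth, the function names and the sample distances are illustrative assumptions rather than anything specified in the application.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of Gaussian kernels centred on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def likelihood_ratio(distance, within_distances, between_distances, bandwidth=0.5):
    """Density of the distance under 'same identifier' over its density under 'different identifiers'."""
    numerator = gaussian_kde(within_distances, bandwidth)(distance)
    denominator = gaussian_kde(between_distances, bandwidth)(distance)
    return numerator / denominator

# Hypothetical distances: small within one source, larger across the population.
within = [0.8, 1.0, 1.1, 1.3, 0.9]
between = [3.9, 4.5, 5.2, 4.8, 6.0, 3.5]

lr_close = likelihood_ratio(1.0, within, between)  # distance typical of the same source
lr_far = likelihood_ratio(4.6, within, between)    # distance typical of different sources
```

A ratio above 1 supports the same-identifier hypothesis and a ratio below 1 supports the different-identifier hypothesis. For distances far outside both samples the denominator can underflow to zero, so a practical version would need to guard against that.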
  • the method may include providing a Bayesian network which represents the variation in an expression for a plurality of example representations of the second representation.
  • the method may include providing a probability distribution from the Bayesian network for the example representations.
  • the method may include considering the expression of the first representation against the probability distribution for the example representations, to provide a first consideration.
  • the method may include providing a Bayesian network which represents the variation in an expression for a plurality of population representations.
  • the method may include providing a probability distribution from the Bayesian network for the population representations.
  • the method may include considering the expression of the first representation against the probability distribution for the population examples, to provide a second consideration.
  • the probability distribution for the differences between a plurality of representations of the identifier from a common source may be obtained by the probability distribution being generated from a Bayesian network.
  • the probability distribution for the differences between a plurality of representations of the identifier from different sources may be obtained by the probability distribution being generated from a Bayesian network.
  • a method of comparing a first representation of an identifier with a second representation of an identifier including: providing an expression of the first representation; providing an expression of the second representation; providing a plurality of example representations of the second representation; providing an expression of each of the plurality of example representations; providing a probability distribution based on the expressions of the plurality of example representations of the second representations; considering the expression of the first representation against the probability distribution for the example representations, to provide a first consideration; providing a plurality of population representations from different origins; providing an expression of each of the plurality of population representations; providing a probability distribution based on the expressions of the plurality of population representations; considering the expression of the second representation against the probability distribution for the population examples, to provide a second consideration; using the first consideration and second consideration to provide a measure of comparison between the first representation and the second representation.
  • the considering of the expression of the first representation against the probability distribution for the example representations, to provide a first consideration includes considering the expression of the first representation and the expression of the second representation.
  • the consideration is of the distance between the expression of the first representation and the expression of the second representation.
  • the considering of the expression of the second representation against the probability distribution for the population examples, to provide a second consideration, includes considering the expression of the second representation and the expression of the first representation.
  • the consideration is of the distance between the expression of the second representation and the expression of the first representation.
  • the second aspect of the invention may include any of the features, options or possibilities set out elsewhere in this application, including those of the first and/or third aspects of the invention.
  • the second aspect of the invention may particularly include features, options or possibilities from amongst the following.
  • the second aspect and/or other aspects of the invention may provide that the probability distribution based on the expressions of the plurality of example representations is based on the differences or cross-differences between the expressions.
  • the difference between the expression of the first representation and the expression of the second representation against the probability distribution is considered to provide the first consideration.
  • the differences or cross-differences of expressions of the plurality of example representations of the second representations are so considered.
  • the probability distribution based on the expressions of the plurality of population representations is based upon the differences in the expression of the first representation and the expressions of the plurality of population representations.
  • the consideration of the expression of the first representation against the probability distribution for the population examples to provide a second consideration involves considering the difference of the expressions of the first and second representations against the probability distribution for the population examples.
  • according to a third aspect of the invention we provide a method of comparing a first representation of an identifier with a second representation of an identifier, the method including: providing an expression of the first representation; providing a Bayesian network which represents the variation in an expression for a plurality of example representations of the second representation; providing a probability distribution from the Bayesian network for the example representations; considering the expression of the first representation against the probability distribution for the example representations, to provide a first consideration; providing an expression of the first representation; providing a Bayesian network which represents the variation in an expression for a plurality of population representations; providing a probability distribution from the Bayesian network for the population representations; considering the expression of the first representation against the probability distribution for the population examples, to provide a second consideration; using the first consideration and second consideration to provide a measure of comparison between the first representation and the second representation.
  • the third aspect of the invention may include any of the features, options or possibilities set out elsewhere in this application, including those of the first and/or second aspects of the invention.
  • the third aspect of the invention may particularly include features, options or possibilities from amongst the following.
  • the first aspect and/or second aspect and/or third aspect may include features, options or possibilities from amongst the following.
  • the first and/or second representation of the identifier may have been captured.
  • the capture may have occurred from a crime scene and/or an item and/or a location and/or a person.
  • the capture may have occurred by scanning and/or photography.
  • the first and/or second representations of the identifier may be captured in the same or a different way to the other.
  • the first and/or second representation may have already been processed compared with the captured representation.
  • the processing may have involved converting a colour and/or shaded representation into a black and white representation.
  • the processing may have involved the representation being processed using Gabor filters.
  • the processing may have involved altering the format of the representation.
  • the alteration in format may involve converting the representation into a skeletonised format.
  • the alteration in format may involve converting the representation into a format in which the representation is formed of components, preferably linked data element sets.
  • the alteration may convert the representation into a representation formed of single pixel wide lines.
  • the processing may have involved cleaning the representation, particularly according to one or more of the techniques provided in applicant's UK patent application number 0502893.1 of 11 February 2005 and/or UK patent application number 0422785.6 of 14 October 2004.
  • the processing may have involved healing the representation, particularly according to one or more of the techniques provided in applicant's UK patent application number 0502893.1 of 11 February 2005 and/or UK patent application number 0422785.6 of 14 October 2004.
  • the processing may have involved cleaning of the representation followed by healing of the representation.
  • the processed representation may be subjected to one or more further steps.
  • the one or more further steps may include the extraction of data from the processed representation, particularly as set out in detail in applicant's UK patent application number 0502990.5 of 11 February 2005.
  • the identifier may be a biometric identifier or other form of marking.
  • the identifier may be a fingerprint, palm print, ear print, retina image or a part of any of these.
  • the first and/or second representation may be a full or partial representation of the identifier.
  • the first representation may be from the same or a different source as the second representation.
  • the expression of the first and/or second representation and/or example representations and/or population representations may be in the form of a vector, for instance a feature vector.
  • the expression of the first and/or second representation and/or example representations and/or population representations may involve selecting a plurality of features in a representation of an identifier and linking each feature to one or more of the other features and/or a center therefor.
  • the expression of the first and/or second representation and/or example representations and/or population representations may particularly be provided according to the features, options and possibilities set out in applicant's UK patent application number 0502893.1 of 11 February 2005 and/or UK patent application number 0422785.6 of 14 October 2004, the contents of which are incorporated herein by reference.
  • the step of providing the expression may involve one or more of the following options.
  • the selecting of a plurality of features may involve selecting a feature and then selecting one or more further features.
  • One or more of the features may be a ridge end.
  • One or more of the features may be a bifurcation.
  • One or more of the features may be another form of minutia.
  • the plurality of features preferably numbers three.
  • one or more of the selected plurality of features are linked to at least two other selected features. More preferably two or more of the plurality of selected features are linked to at least two other selected features. Ideally all of the plurality of selected features are linked to at least two other selected features.
  • one of the plurality of selected features is only linked to two of the other plurality of selected features.
  • the linking of the plurality of selected features to each other by lines forms a triangle.
  • One or more or all of the plurality of selected features may be linked to other features other than the selected features too.
  • the link is preferably in the form of a line.
  • the line is preferably a straight line.
  • the features and links form triangles formed according to the Delaunay triangulation methodology.
  • the plurality of features may number three or more.
  • the plurality of features may number three to twenty, preferably three to sixteen and ideally three to twelve.
  • one or more of the features are linked to at least one other feature and/or a center. More preferably two or more of the plurality of selected features are linked to at least another selected feature and to a common center. Ideally all of the plurality of selected features are linked to another of the selected features and to a common center.
  • one of the plurality of features is only linked to one other feature and a centre.
  • the linking of the selected features and center to each other is provided by lines.
  • the lines may define a polygon, for instance a triangle or a quadrilateral.
  • One or more or all of the plurality of selected features may be linked to other features other than the selected features too.
  • the link is preferably in the form of a line.
  • the line is preferably a straight line.
  • the expression of the first and/or second representation and/or example representations and/or population representations may include information on the type of feature for one or more, preferably all, the selected features.
  • the type may be the minutia forming the feature, such as ridge end and/or bifurcation and/or other.
  • the expression may include information on the direction of the link for one or more, preferably all, of the links between the features.
  • the information may be on the relative direction of the links.
  • the expression may include information on the distances between one, and preferably all, pairs of the features.
  • the direction of one or more of the links, preferably all, may be expressed relative to the orientation.
  • the orientation may be about a fixed axis.
  • the orientation is relative to the opposing segment of the triangle.
  • the direction and/or orientation are expressed in terms independent of the representation.
  • the direction may be expressed as a number, preferably within a range, most preferably within the range between 0 and 2π radians.
  • the orientation may be expressed as a number, preferably within a range, most preferably within the range between 0 and π radians.
  • the expression, ideally as a vector, includes three pieces of information on the feature types, three pieces of information on the relative direction of the links between the features and three pieces of information on the distances between the features.
  • the vector preferably includes nine pieces of information.
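One possible way to assemble such a nine-element vector is sketched below. It takes the "relative direction" of the links at a feature to be the interior angle between the two links meeting there, which is one reading of the text and is independent of how the image is rotated. That reading, the `(x, y, type)` input format and the function name are assumptions made for illustration.

```python
import math

def triangle_vector(minutiae):
    """Build a nine-element vector from three minutiae given as (x, y, type).

    Layout (an assumed convention): three feature types, three interior angles
    in radians, three link lengths.
    """
    if len(minutiae) != 3:
        raise ValueError("exactly three minutiae are required")
    types, angles, sides = [], [], []
    for i in range(3):
        x, y, kind = minutiae[i]
        x1, y1, _ = minutiae[(i + 1) % 3]
        x2, y2, _ = minutiae[(i + 2) % 3]
        # Interior angle at this feature, between its two links.
        a = math.atan2(y1 - y, x1 - x)
        b = math.atan2(y2 - y, x2 - x)
        angle = abs(a - b)
        if angle > math.pi:
            angle = 2.0 * math.pi - angle
        types.append(kind)
        angles.append(angle)
        # Length of the link from this feature to the next one.
        sides.append(math.hypot(x1 - x, y1 - y))
    return types + angles + sides

# Hypothetical minutiae forming a 3-4-5 right triangle.
vec = triangle_vector([(0, 0, "ridge_end"), (3, 0, "bifurcation"), (0, 4, "ridge_end")])
```

The vector depends on the order in which the three minutiae are listed, so a practical scheme would also need a fixed ordering rule (for instance, starting from a particular feature and proceeding clockwise).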
  • the expression of the first and/or second representation and/or example representations and/or population representations, particularly as a vector, may include information on the type of feature for one or more, preferably all, the selected features.
  • the type may be the minutia forming the feature, such as ridge end and/or bifurcation and/or other.
  • the expression may include information on the distance between a feature and at least one other feature.
  • the expression includes information on the distance between a feature and one other feature and information on the distance between the feature and a second other feature, and ideally only on such distances between the feature and other features.
  • the expression may include information on the radius between the center and one, preferably all, of the features.
  • the expression may include information on the surface or surface area of one, preferably all, of the polygons defined by two or more features and the center.
  • the expression may include information on the direction of the feature for one or more, preferably all, of the features, preferably with the direction being defined relative to the representation or image thereof.
  • the direction of one or more of the features, preferably all, may be expressed relative to the orientation.
  • the orientation may be about a fixed axis.
  • the expression may include information on the region of the feature for one, preferably all, of the features.
  • the expression may include information on the general pattern of the representation.
  • the expression, ideally as a vector, includes a piece of information on the feature type, a piece of information on the relative direction of the feature, a piece of information on the distance between the feature and another feature, and the radius between the feature and the center, for each selected feature.
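The center-based form of the expression can be sketched in the same spirit. Here the center is taken to be the centroid of the selected features and the features are assumed to be listed in order around it; both choices, along with the input format and the function name, are assumptions for illustration only.

```python
import math

def radial_expression(features):
    """features: list of (x, y, type, direction), assumed ordered around the center.

    Returns the center and, per feature, a tuple of
    (type, direction, distance to next feature, radius, triangle area).
    """
    n = len(features)
    # Assumption: the center is the centroid of the selected features.
    cx = sum(f[0] for f in features) / n
    cy = sum(f[1] for f in features) / n
    rows = []
    for i, (x, y, kind, direction) in enumerate(features):
        nx, ny = features[(i + 1) % n][:2]
        radius = math.hypot(x - cx, y - cy)
        to_next = math.hypot(nx - x, ny - y)
        # Area of the triangle (feature, next feature, center) via the cross product.
        area = 0.5 * abs((x - cx) * (ny - cy) - (nx - cx) * (y - cy))
        rows.append((kind, direction, to_next, radius, area))
    return (cx, cy), rows

# Hypothetical features placed on a unit circle, with directions in radians.
center, rows = radial_expression([
    (1.0, 0.0, "ridge_end", 0.0),
    (0.0, 1.0, "bifurcation", 1.0),
    (-1.0, 0.0, "ridge_end", 2.0),
    (0.0, -1.0, "bifurcation", 3.0),
])
```

Each row could then be flattened into the overall vector; the region of each feature and the general pattern of the print, which the text also mentions, are omitted from this sketch.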
  • the considering of the expression of the first representation against a probability distribution may form the numerator in the use of the first consideration and second consideration.
  • the considering of the expression may involve finding the frequency for that expression value in the probability distribution.
  • the considering of the expression of the first representation against the probability distribution may involve those information pieces that are continuous in nature.
  • the probability distribution based on the variation in the expression between different example representations may provide frequency of occurrence for different expression values.
  • the probability distribution is obtained from physically taken example representations.
  • the different example representations preferably all come from the same source as one another.
  • the source may be the same as or different to that believed to be the source of the first representation.
  • the probability distribution based on the expressions of example representations may have the form set out in respect of the first and/or second and/or third aspects of the invention and/or in respect of any of the features, options or possibilities set out in the next two paragraphs.
  • the probability distribution may be estimated from a database of example representations of an identifier taken from the same source.
  • the source may be the same as that of the first representation or may be different therefrom.
  • the database may contain details of the distances between example representations and/or the comparison of different example representations using a vector to express each.
  • the database preferably contains one or more example representations taken under different conditions.
  • the different conditions may be one or more of, different pressures applied by the source when forming the representation example, such as the fingerprint; different substrates to which the source was applied when forming the example representation; different movements used by the source when the source was applied to form the example representation; different extents of distortion in the example representation compared with a perfect example representation; different levels of completeness of the example representation.
  • the database may contain one or more sets of such details. Different sets may come from different sources, but ideally the details within a set come from the same source.
  • the database is populated by the identification of corresponding features and links, ideally triangles and lines, in a series of representations taken from the same source.
  • the database can be populated by processing a representation and/or an example representation so as to obtain an expression thereof and then or during that process applying distortion functions thereto.
  • the distortion functions can then be calculated, for instance using thin plate splines.
  • One or more sets of such details may be provided in this way.
  • Other sets may be formed by applying the distortion functions to other representations and/or an example representation.
  • the technique of applicant's UK patent application number 0502849.3 of 11 February 2005 and/or of UK patent application number 0423648.5 filed 26 October 2004 may be used.
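Populating a database by applying distortion functions might look like the following sketch. The text mentions thin plate splines; to keep the example short, a simple sinusoidal displacement field is swapped in as a stand-in, so this is not a thin plate spline implementation. The function names and parameters are hypothetical.

```python
import math

def make_distortion(strength, wavelength):
    """Return a hypothetical smooth distortion mapping (x, y) to a displaced point.

    Stand-in for a fitted distortion function: a sinusoidal displacement field.
    """
    def distort(point):
        x, y = point
        dx = strength * math.sin(2.0 * math.pi * y / wavelength)
        dy = strength * math.sin(2.0 * math.pi * x / wavelength)
        return (x + dx, y + dy)

    return distort

def synthetic_examples(minutiae, distortions):
    """Apply each distortion to every minutia position, giving one example per distortion."""
    return [[d(p) for p in minutiae] for d in distortions]

# Hypothetical minutiae positions (pixels) from one representation.
minutiae = [(25.0, 0.0), (40.0, 60.0), (10.0, 35.0)]
distortions = [make_distortion(0.0, 100.0), make_distortion(2.0, 100.0)]
examples = synthetic_examples(minutiae, distortions)
```

Each synthetic example could then be converted to an expression and added to the set of same-source details, alongside examples that were physically taken.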
  • the probability distribution may be derived from one or more factor probability distributions.
  • the probability distribution or the factor probability distribution may be obtained from a Bayesian network.
  • the Bayesian network may be obtained and/or estimated by considering a plurality of example representations.
  • the example representations are taken from the same source, such as the same finger.
  • the plurality of example representations may be obtained from a database and/or may be sampled.
  • the Bayesian network defines the quantities which are independent of one another and/or the quantities which are dependent upon one another and/or the quantities which are conditionally independent.
  • the Bayesian network may be obtained using one or more algorithms.
  • the algorithm used may be the NPC algorithm for estimating an acyclic directed graph, of Steck, H., Hofmann, R., and Tresp, V. (1999), "Concept for the PRONEL Learning Algorithm", Siemens AG, Munich, and/or the EM algorithm for estimating the conditional probability distributions, of S. L. Lauritzen (1995), "The EM algorithm for graphical association models with missing data", Computational Statistics & Data Analysis 19:191-201. The contents of both documents, particularly in relation to the algorithms they describe, are incorporated herein by reference.
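How a Bayesian network yields a probability distribution from factor distributions can be shown with a toy discrete example. The network structure, the variable names and every probability below are invented for illustration; in the method described, the structure and the conditional tables would be estimated from example representations (for instance with the NPC and EM algorithms), which this sketch does not attempt.

```python
# Hypothetical network: "pressure" is the parent of "dist" and "angle",
# so dist and angle are conditionally independent given pressure.
p_pressure = {"light": 0.4, "heavy": 0.6}
p_dist_given_pressure = {
    "light": {"small": 0.7, "large": 0.3},
    "heavy": {"small": 0.2, "large": 0.8},
}
p_angle_given_pressure = {
    "light": {"small": 0.9, "large": 0.1},
    "heavy": {"small": 0.5, "large": 0.5},
}

def joint(pressure, dist, angle):
    """Joint probability as the product of the network's factor distributions."""
    return (p_pressure[pressure]
            * p_dist_given_pressure[pressure][dist]
            * p_angle_given_pressure[pressure][angle])

def marginal(dist, angle):
    """Probability of the observed quantities, summing out the unobserved parent."""
    return sum(joint(p, dist, angle) for p in p_pressure)

total = sum(marginal(d, a) for d in ("small", "large") for a in ("small", "large"))
p_small_small = marginal("small", "small")
```

The value returned by `marginal` plays the role of the frequency of an expression value under the network's distribution, which could then serve as the numerator or the denominator of a likelihood ratio.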
  • the considering of the expression of the first representation against a probability distribution may form the denominator in the use of the first consideration and second consideration.
  • the considering of the expression may involve finding the frequency for that expression value in the probability distribution.
  • the probability distribution based on the variation in the expression between different population representations may provide frequency of occurrence for different expression values.
  • the probability distribution is obtained from physically taken population representations.
  • the different population representations preferably all come from different sources to one another.
  • the different population representations may be collected specifically for use in the method.
  • the different population representations may have been obtained for other purposes.
  • the different population representations may be in an existing database.
  • the probability distribution based on the variation in the expression between different population representations may have the form set out in respect of the first and/or second and/or third aspects of the invention and/or in respect of any of the features, options or possibilities set out in the next two paragraphs. In the second aspect in particular, the following features, options and possibilities may apply to the manner in which the probability distribution based on the expressions of the population examples is arrived at.
  • the probability distribution may be derived from one or more factor probability distributions.
  • the probability distribution and/or one of the factor probability distributions may be estimated from a database of population representations of an identifier from different sources.
  • the database may contain details of the distances between population representations and/or the comparison of different population representations using a vector to express each.
  • the database preferably contains a number of population representations that reflect the variation in representations for the identifier in the population or a subset thereof.
  • the database could be generated from the capture and processing of a large number of population representations from different sources.
  • the database is populated by the identification of corresponding features and links, ideally as triangles and lines, in a series of representations taken from a variety of sources.
  • the database could be formed by taking an existing database that includes population representations from different sources.
  • the existing database has its data processed to provide the data in a compatible format for the method.
  • the probability distribution and/or one of the factor probability distributions may be estimated from analysis of or from an existing probability distribution that details variation in one or more of the characteristics of the expression.
  • the characteristics may particularly be one or more or all of those that are discrete in nature, for instance the general pattern.
  • the probability distribution and/or one of the factor probability distributions may be estimated from analysis of variation in one or more of the characteristics of the expression which are discrete in nature, other than general pattern.
  • a probability tree is preferred for such a probability distribution or factor probability distribution.
  • the probability distribution may be derived from one or more factor probability distributions.
  • the probability distribution and/or the factor probability distribution may be obtained from a Bayesian network estimated from a database of feature vectors extracted from different sources.
  • the Bayesian network may be obtained and/or estimated by considering a plurality of population representations taken from the different sources, such as different fingers.
  • the plurality of population representations may be obtained from a database and/or may be sampled.
  • the Bayesian network defines the quantities that are independent of one another and/or the quantities which are dependent upon one another and/or the probabilities which are conditionally independent.
  • the Bayesian network may be obtained using one or more algorithms.
  • the algorithm used may be the NPC algorithm for estimating an acyclic directed graph, of Steck, H., Hofmann, R., and Tresp, V. (1999), "Concept for the PRONEL Learning Algorithm", Siemens AG, Munich, and/or the EM algorithm, S. L. Lauritzen (1995), "The EM algorithm for graphical association models with missing data", Computational Statistics & Data Analysis 19:191-201, for estimating the conditional probability distributions. The contents of both documents, particularly in relation to the algorithms they describe, are incorporated herein by reference.
  • the probability distribution based on the expressions of the example representations and/or the probability distribution based on the expressions of the population examples may be generated for a plurality of different numbers of selected features.
  • the number of selected features may be three or more and particularly three to twelve.
  • Preferably a probability distribution of each type is generated for each possible number of selected features used in the method.
  • the probability distributions may be generated in advance of the number of selected features in the first representation and/or second representation being known. Particularly when the different example representations of the second representation are from a different identifier to the second representation of an identifier which is being compared with the first representation of an identifier, the probability distribution for the example representations may be generated in advance.
  • the probability distribution for the example representations may be generated in advance, particularly in respect of a method provided according to the third aspect of the invention.
  • the probability distribution for the population representations may be generated in advance, particularly in respect of a method provided according to the third aspect of the invention.
  • the probability distributions may be stored for future and/or repeated use.
  • the probability distributions may be generated after the number of selected features in the first representation and/or second representation is known, ideally with only probability distributions for that number of selected features being generated. After use, the probability distributions may be discarded, particularly if the next use is concerned with a different number of selected features.
  • the probability distributions may each be generated from a database of representations.
  • the probability distributions may be generated by processing the representations in the databases using a particular number of selected features.
  • the database of the expressions of the example representations and/or the database of the expressions of the population examples may be provided for a plurality of different numbers of selected features.
  • the number of selected features may be three or more and particularly three to twelve.
  • a database of each type is provided for each possible number of selected features used in the method.
  • the databases may be generated in advance of the number of selected features in the first representation and/or second representation being known. Once generated, the databases may be stored for future and/or repeated use.
  • the databases may be generated after the number of selected features in the first representation and/or second representation is known, ideally with only databases for that number of selected features being generated. After use, the databases may be discarded, particularly if the next use is concerned with a different number of selected features.
  • the databases may each be generated from a database of representations.
  • the databases may be generated by processing the representations in the databases using a particular number of selected features.
  • the use of the first consideration and second consideration may be to evaluate a hypothesis.
  • the hypothesis may include, particularly as the first consideration, that the first representation and the second representation are from the same source.
  • the expressions of the first and/or second representations may be assumed to have the same discrete pieces of information.
  • the probability distribution may be based upon differences between expressions of the representations, particularly in terms of their continuous pieces of information.
  • the hypothesis may include, particularly as the second consideration, that the first representation and the second representation are from different sources. In the second consideration, the expressions of the first and/or second representations may be assumed to have the same discrete pieces of information.
  • the probability distribution may be based upon differences between expressions of the representations, particularly in terms of their continuous pieces of information.
  • the use of the first consideration and second consideration to evaluate a hypothesis may be the evaluation of a first hypothesis, for instance a prosecution hypothesis, and a second hypothesis, for instance a defence hypothesis.
  • the evaluation may be expressed as :
  • $fv_s$ denotes a feature vector which comes from the second representation when conditioned on $H_p$ and from an unknown source when conditioned on $H_d$
  • $fv_m$ denotes a feature vector originating from the first representation
  • the method may further include a check to see that the first and/or second and/or example and/or population representations or the expressions thereof, have the same discrete pieces of information.
  • the use of the first and second consideration may only proceed if they do.
  • a selection may be made of those population representations in the population representations available which have the same discrete pieces of information.
  • the selection may be represented through a probability tree.
  • the probability distribution based on the population representations uses only such selected population representations.
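The check and selection described above can be sketched as follows. This is a minimal illustration only: the record layout (a dictionary with `"discrete"` and `"continuous"` entries) and the function names are assumptions made for the example, not something specified in the application.

```python
from typing import Dict, List

# Hypothetical record layout (an assumption for this sketch): each expression
# holds its discrete pieces (e.g. general pattern, feature types) and its
# continuous pieces (e.g. link lengths) as separate tuples.
Expression = Dict[str, tuple]


def same_discrete(a: Expression, b: Expression) -> bool:
    """Return True when both expressions carry identical discrete pieces."""
    return a["discrete"] == b["discrete"]


def select_population(mark: Expression, population: List[Expression]) -> List[Expression]:
    """Keep only the population expressions whose discrete pieces match the mark."""
    return [p for p in population if same_discrete(mark, p)]


# Made-up example data.
mark = {
    "discrete": ("loop", "ridge_end", "bifurcation", "ridge_end"),
    "continuous": (4.0, 5.2, 6.1),
}
population = [
    {"discrete": ("loop", "ridge_end", "bifurcation", "ridge_end"), "continuous": (4.1, 5.0, 6.2)},
    {"discrete": ("whorl", "ridge_end", "bifurcation", "ridge_end"), "continuous": (3.9, 5.1, 6.0)},
    {"discrete": ("loop", "bifurcation", "bifurcation", "ridge_end"), "continuous": (4.4, 4.8, 6.5)},
]
selected = select_population(mark, population)
```

Only the selected expressions would then feed the population-based probability distribution; the same selection could equally be represented through a probability tree.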
  • the using of the first consideration and the second consideration may provide a measure of the strength of link between the first representation and the second representation in the form of a likelihood ratio.
  • the method may include providing an indication as to whether the first representation is likely to have the same source as the second representation.
  • the indication as to whether the first representation is likely to have the same source as the second representation may be a yes or no indication and/or a quantified indication.
  • the likelihood ratio may be the quotient of two probabilities. One of the probabilities may relate to the probability that the first and second representations came from the same source. One of the probabilities may be that the first and second representations came from different sources.
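As a sketch of the quotient just described, the likelihood ratio can be written in its generic form. The symbol $E$ (standing for whatever is observed when the two representations are compared) is notation assumed here; $H_p$ and $H_d$ are the same-source and different-source hypotheses.

```latex
LR \;=\; \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)}
```

A value above 1 supports the same-source hypothesis and a value below 1 supports the different-source hypothesis.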
  • the following features, options and possibilities may apply to the manner in which the indication and probability distributions provide a measure of the strength of link between the first representation and the second representation, particularly in the context of one embodiment of the invention.
  • the probability for the numerator in the likelihood ratio may be stated as:-
  • $H_p$ is the prosecution hypothesis, that is, the two feature vectors originate from the same source
  • the probability for the numerator in the likelihood ratio may involve conditioning on $H_p$ (that is, "the representations originate from the same source") and may further provide that $fv_s^c$ and $fv_m^c$ become information extracted from the same representation of the same source (for instance, the same finger of the same person).
  • where the values of the information pieces that are discrete in nature coincide, the probabilities in the numerator, particularly on the right-hand side of the above equation, are added up.
  • the summation symbol may be removed from the formula when all the information pieces that are discrete in nature are present in the representation.
  • the information pieces that are continuous in nature may be the length of one or more of the links and/or the direction and/or orientation.
  • the distance may be obtained by subtracting term by term.
  • the result may be a vector containing nine quantities.
  • the result is preferably normalised. The sum of the squares of the distances from all the expressions, preferably vectors, may be considered to give a single value.
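The distance calculation described above (term-by-term subtraction, normalisation, then a sum of squares) might be sketched as below. The normalisation by a per-quantity scale, the parameter names and the example values are assumptions made for illustration; the application does not specify the normalisation scheme, and the sketch ignores angular wrap-around for directions.

```python
from typing import Sequence


def continuous_distance(
    fv_s_c: Sequence[float],
    fv_m_c: Sequence[float],
    scales: Sequence[float],
) -> float:
    """Sum of squared, normalised term-by-term differences.

    fv_s_c and fv_m_c are the continuous parts of two feature vectors.
    scales holds one (assumed) normalising constant per quantity.
    """
    if not (len(fv_s_c) == len(fv_m_c) == len(scales)):
        raise ValueError("all three sequences must have the same length")
    # Subtract term by term, then normalise each difference.
    diffs = [(s - m) / k for s, m, k in zip(fv_s_c, fv_m_c, scales)]
    # Collapse the vector of differences to a single value.
    return sum(d * d for d in diffs)


# Made-up example with nine continuous quantities per vector.
suspect = [10.0, 12.0, 9.0, 0.5, 1.0, 1.5, 0.2, 0.4, 0.6]
mark = [8.0, 12.0, 9.0, 0.5, 1.0, 1.5, 0.2, 0.4, 0.6]
unit_scales = [2.0] * 9
d = continuous_distance(suspect, mark, unit_scales)
```

In this example only the first quantity differs, so the single value comes from that term alone.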
  • the probability for the denominator may be stated as:-
  • $fv_m^c$: continuous data of the feature vector from the mark
  • $fv_m^d$: discrete data of the feature vector from the mark
  • $fv_s^c$: continuous data of the feature vector from the suspect
  • $fv_s^d$: discrete data of the feature vector from the suspect
  • $d(fv_s^c, fv_m^c)$ is the distance measured between the continuous data of the two feature vectors from the mark and the suspect
  • $H_d$ is the defence hypothesis, that is, the two feature vectors originate from different sources.
  • the probabilities in the right-hand-side of this equation are added up.
  • the index of the summation is replaced by values of the information pieces that are not present.
  • the summation symbol is preferably removed when all the information pieces that are discrete in nature are present in the representation.
  • the information pieces that are continuous in nature may be the length of one or more of the links and/or the direction and/or orientation.
  • the distance may be obtained by subtracting term by term.
  • the result may be a vector containing nine quantities.
  • the result is preferably normalised. The sum of the squares of the distances from all the expressions, preferably vectors, may be considered to give a single value.
  • the following features, options and possibilities may apply to the manner in which the indication and probability distributions provide a measure of the strength of link between the first representation and the second representation, particularly in the context of a further embodiment of the invention.
  • the probability for the numerator of the likelihood ratio may be stated as:
  • $H_p$ is the prosecution hypothesis, that is, the two vectors originate from the same source.
  • the probability for the denominator may be stated as:
  • $H_d$ is the defence hypothesis, that is, the two vectors originate from different sources.
  • the probability distributions are preferably a probability of occurrence distribution relative to the indication, preferably distance.
  • the likelihood ratio is preferably given by the value of the probability distribution for the same source divided by the value of the probability distribution for the different sources at a particular indication or distance value.
  • variation due to distortion and/or clarity issues is incorporated in the calculation of the numerator of the likelihood ratio.
  • the distance between the continuous information pieces is used, preferably in a feature vector.
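The reading of a likelihood ratio from two distance distributions can be sketched as follows. This is an illustration under stated assumptions: the distributions are approximated here by simple fixed-width histograms built from made-up distance samples, whereas the application derives them from dedicated databases; all function and variable names are hypothetical.

```python
import math
from typing import List, Sequence


def histogram_density(samples: Sequence[float], bin_width: float, n_bins: int) -> List[float]:
    """Estimate a density over [0, n_bins * bin_width) with fixed-width bins.

    Samples outside that range are not counted in any bin.
    """
    counts = [0] * n_bins
    for x in samples:
        i = int(x // bin_width)
        if 0 <= i < n_bins:
            counts[i] += 1
    norm = len(samples) * bin_width
    return [c / norm for c in counts]


def likelihood_ratio(
    distance: float,
    same_source: Sequence[float],
    different_sources: Sequence[float],
    bin_width: float,
) -> float:
    """Same-source density divided by different-source density at one distance."""
    i = int(distance // bin_width)
    if not 0 <= i < len(same_source):
        raise ValueError("distance lies outside the histogram range")
    numerator = same_source[i]
    denominator = different_sources[i]
    if denominator == 0.0:
        if numerator == 0.0:
            raise ValueError("neither distribution has data at this distance")
        return math.inf
    return numerator / denominator


# Made-up distances: small for prints of the same finger, larger otherwise.
same_distances = [0.1, 0.2, 0.3, 0.4, 0.6, 1.2]
diff_distances = [0.4, 1.1, 1.5, 1.8, 2.2, 2.6, 2.9, 3.4]
same_density = histogram_density(same_distances, bin_width=1.0, n_bins=4)
diff_density = histogram_density(diff_distances, bin_width=1.0, n_bins=4)
lr_close = likelihood_ratio(0.5, same_density, diff_density, bin_width=1.0)
lr_far = likelihood_ratio(1.5, same_density, diff_density, bin_width=1.0)
```

A small distance yields a ratio above 1 (support for the same source) and a larger distance a ratio below 1. A smoother density estimate could replace the histogram without changing the principle.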
  • the following features, options and possibilities may apply to the manner in which the indication and probability distributions provide a measure of the strength of match between the first representation and the second representation.
  • the probability for the numerator in the likelihood ratio may be stated as:
  • fv means feature vector
  • c means continuous
  • d means discrete
  • m means mark
  • s means suspect and therefore:
  • $fv_m^c$: continuous data of the feature vector from the mark
  • $fv_m^d$: discrete data of the feature vector from the mark
  • $fv_s^c$: continuous data of the feature vector from the suspect
  • $fv_s^d$: discrete data of the feature vector from the suspect
  • $d(fv_s^c, fv_m^c)$ is the distance measured between the continuous data of the two feature vectors from the mark and the suspect
  • $H_p$ is the prosecution hypothesis, that is, the two feature vectors originate from the same source
  • the probability for the numerator in the likelihood ratio may involve conditioning on $H_p$ (that is, "the representations originate from the same source") and may further provide that $fv_s^c$ and $fv_m^c$ become information extracted from the same representation of the same source (for instance, the same finger of the same person).
  • the index of the summation is replaced by values of the information pieces that are not present.
  • the summation symbol may be removed from the formula when all the information pieces that are discrete in nature are present in the representation.
  • the probability for the denominator of the likelihood ratio may be stated as:
  • fv means feature vector
  • c means continuous
  • d means discrete
  • m means mark
  • s means suspect and therefore:
  • $fv_m^c$: continuous data of the feature vector from the mark
  • $fv_m^d$: discrete data of the feature vector from the mark
  • $fv_s^c$: continuous data of the feature vector from the suspect
  • $fv_s^d$: discrete data of the feature vector from the suspect
  • $H_d$ is the defence hypothesis, that is, the two feature vectors originate from different sources
  • the probabilities in the right-hand-side of this equation are added up.
  • the index of the summation is replaced by values of the information pieces that are not present.
  • the summation symbol is preferably removed when all the information pieces that are discrete in nature are present in the representation.
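The summation over discrete information pieces that are not present can be sketched as a marginalisation. The variable names, the joint table and its values are made up for illustration; in the application the probabilities would come from the probability distributions or Bayesian networks described, not from a hand-written table.

```python
from itertools import product
from typing import Callable, Dict, Sequence


def summed_probability(
    prob: Callable[[Dict[str, str]], float],
    observed: Dict[str, str],
    domains: Dict[str, Sequence[str]],
) -> float:
    """Add up prob over every value of the discrete pieces that are not present.

    When every piece is observed there is exactly one term, which corresponds
    to removing the summation symbol.
    """
    missing = [name for name in domains if name not in observed]
    total = 0.0
    for values in product(*(domains[name] for name in missing)):
        full = dict(observed)
        full.update(zip(missing, values))
        total += prob(full)
    return total


# Made-up joint probabilities over two discrete pieces.
domains = {
    "pattern": ("loop", "whorl", "arch"),
    "type": ("ridge_end", "bifurcation"),
}
table = {
    ("loop", "ridge_end"): 0.30, ("loop", "bifurcation"): 0.25,
    ("whorl", "ridge_end"): 0.20, ("whorl", "bifurcation"): 0.10,
    ("arch", "ridge_end"): 0.10, ("arch", "bifurcation"): 0.05,
}


def joint(full: Dict[str, str]) -> float:
    return table[(full["pattern"], full["type"])]


# General pattern not visible in the mark: sum over its possible values.
p_partial = summed_probability(joint, {"type": "ridge_end"}, domains)
# Everything present: a single term, no summation needed.
p_full = summed_probability(joint, {"pattern": "loop", "type": "bifurcation"}, domains)
```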
  • Bayesian networks are involved in the determination of the numerator and denominator of the likelihood ratio.
  • the Bayesian network(s) for the numerator are estimated using dedicated databases containing different representations obtained from the same source, ideally under several distortion and/or clarity conditions.
  • Bayesian network(s) for the denominator are estimated using dedicated databases containing representations from different sources, ideally different fingers and/or different people.
  • the method may include the use of Bayesian network(s) for providing information on general patterns within representations.
  • the calculation of the likelihood ratio may include consideration of the overall pattern of the representation and/or the region of the representation including the selected features.
  • the region may be the front and/or rear and/or side and/or middle of the representation.
  • the method includes repeating the method steps in respect of selections of different plurality of features.
  • Each repeat of the method may include selecting a plurality of features, preferably different in respect of at least one feature compared with other selections.
  • Each repeat may include linking each feature to one or more of the other features in that plurality of features.
  • Each repeat may include expressing information on the features and the link or links as a vector.
  • Each repeat may include comparing the vector with the probability distribution.
  • a series of feature and link data sets are expressed as vectors.
  • the plurality of vectors of the first representation are taken and compared with the probability distribution.
  • One or more of the vectors of the second representation may be formed according to the same method as the vectors for the first representation.
  • the same number of features are involved in each repeat of the method steps for the first representations and/or second representations.
  • the same number of features are involved for each representation compared according to the method.
  • the representation may be considered using a plurality of feature sets, preferably with three features in each case. Ideally the feature set in each case is a triangle.
  • the representation may be considered using at least 1 feature set, preferably at least 5 feature sets, more preferably at least 10 feature sets. Between 10 and 14 feature sets, ideally triangles, may be used.
  • the representation may be considered using a plurality of feature sets in which one or more of the features are included in two or more feature sets.
  • a feature may provide the apex of a plurality of triangles.
  • the method of the present invention may only be applied to some of those feature sets.
  • the number of feature sets to which the first aspect of the invention is actually applied is between 5 and 10, ideally between 5 and 14.
  • a plurality of vectors of the first representation are compared with a plurality of probability distributions.
  • the comparison may provide an indication of the likelihood of the first representation and second representation coming from the same source.
  • the method may include providing an indication as to whether the first representation matches the second representation based upon the comparison of a plurality of vectors of the first representation with a plurality of vectors of the second representation.
  • the indication as to whether the first representation matches the second representation may be a "matches" or "does not match" indication based upon the comparison of a plurality of vectors of the first representation with a plurality of vectors of the second representation.
  • the indication based upon the comparison of a plurality of vectors of the first representation with a plurality of vectors of the second representation, may provide a measure of the strength of a match, for instance a likelihood ratio.
  • the methods of the first and/or second and/or third aspects of the invention may be computer implemented methods.
  • Figure 1 is a schematic overview of the stages, and within them steps, involved in the comparison of a print from an unknown source with a print from a known source;
  • Figure 2a is a schematic illustration of a part of a basic skeletonised print
  • Figure 2b is a schematic illustration of the print of Figure 2a after cleaning and healing;
  • Figure 3 is a schematic illustration of the generation of representation data for the print of Figure 2b;
  • Figure 4 is a schematic illustration of a part of a print potentially requiring cleaning
  • Figure 5 is a schematic illustration of the neighborhood approach to cleaning according to the present invention
  • Figure 6 is a schematic illustration of a part of a print potentially requiring healing
  • Figure 7 is a schematic illustration of the neighborhood approach to direction determination, particularly useful in healing
  • Figure 8 is a schematic illustration of the application of a triangle to part of a print as part of the data extraction
  • Figure 9 is a schematic illustration of the application of a series of triangles to part of a print according to a further approach to the data extraction
  • Figure 10 is a schematic illustration of Delaunay triangulation applied to the same part of a print as considered in Figure 9;
  • Figure 11 is a representation of a probability distribution for variation in prints from the same finger and a probability distribution for variation in prints between different fingers;
  • Figure 12 shows the distributions of Figure 11 in use to provide a likelihood ratio for a match between known and unknown prints
  • Figure 13a illustrates minutia and direction information from a mark and a suspect
  • Figure 13b illustrates the presentation of the direction information in a format for comparison
  • Figure 13c illustrates the information of Figure 13b being compared
  • Figure 14 is a Bayesian network representation
  • a variety of situations call for the comparison of markers, including biometric markers.
  • Such situations include a fingerprint, palm print or other such marking, whose source is known, being compared with a fingerprint, palm print or other such marking, whose source is unknown. Improvements in this process to increase speed and/or reliability of operation are desirable.
  • the consideration of the unknown source fingerprint may require the consideration of a partial print or print produced in less than ideal conditions.
  • the pressure applied when making the mark, substrate and subsequent recovery process can all impact upon the amount and clarity of information available.
  • a representation of the fingerprint is captured. This may be achieved by the consideration of a photograph or other representation of a fingerprint which has been recovered.
  • the representation is enhanced.
  • the representation is processed to represent it as a purely black and white representation. Thus any colour or shading is removed. This makes subsequent steps easier to operate.
  • the preferred approach is to use Gabor filters for this purpose, but other possibilities exist.
  • This skeletonisation includes a number of steps.
  • the basic skeletonisation is readily achieved, for instance using a function within the Matlab software (available from The MathWorks Inc).
  • a section of the basic skeleton achieved in this way is illustrated in Figure 2a.
  • the problem with this basic skeleton is that the ridges 20 often feature relatively short side ridges 22, "hairs", which complicate the pattern and are not a true representation of the fingerprint. Breaks 24 and other features may also be present which are not a true representation of the fingerprint.
  • the basic skeleton is subjected to a cleaning step and a healing step as part of the skeletonisation. The operation of these steps is described in more detail below and gives a clean, healed representation, Figure 2b.
  • the data from it to be compared with the other print can be considered. Doing this first involves extracting representation data which accurately reflects the configuration of the fingerprint present, but which is suitable for use in the comparison process.
  • the extraction of representation data stage is explained in more detail below, but basically involves the use of one of a number of possible techniques.
  • the first of the possible techniques involves defining the position of features 30 (such as ridge ends 32 or bifurcation points 34), forming an array of triangles 36 with the features 30 defining the apex of those triangles 36 and using this and other representation data in the comparison stage.
  • the positions of features are defined and the positions of a group of these are considered to define a center.
  • the center defines one apex of the triangles, with adjoining features defining the other apexes.
  • the representation data extracted is formatted before it is used in the comparison stage. This basically involves presenting the information characteristic of the triangles, quadrilaterals or other polygons being considered when the data is extracted in a format mathematically coded for use in the comparison stage. Further details of the format are described below.
  • once the fingerprint has been expressed as representation data, it can be compared with the other fingerprint(s).
  • the comparison stage is based on different representation data being compared to that previously suggested. Additionally, in making the comparison, the technique goes further than indicating that the known and unknown source prints came from the same source or that they did not. Instead, an expression of the likelihood that they came from the same source is generated.
  • one or both of the two different models (a data driven approach and a model driven approach) both described in more detail below are used. Having provided an overview of the entire process, the stages and steps in them will now be discussed in more detail.
  • the existing interpretation considers the length of the ridge island 40. If the length is equal to or greater than a predetermined length value then it is deemed a true ridge island and is left. If the length is less than the predetermined length then the ridge island is discarded. In a similar manner, the length from the bifurcation point 43 to the ridge end 44 is considered. Again, if it is equal to or greater than the predetermined length it is kept as a ridge with its attendant features. If it is shorter than the predetermined length it is discarded. This approach is slow in terms of its processing, as the length in all cases is measured by starting at the feature and then advancing pixel by pixel until the end is reached. The speed is a major issue as there are a lot of such features that need to be considered within a print.
  • Feature 52 is part of a separate data set, data set B, extending between crossing 54 and feature 52. All data sets formed by a feature at both ends, with both features being within the neighborhood 50, are discarded as being too short to be true features. All data sets formed by a feature at one end and a crossing at the other are kept as far as the cleaning of that neighborhood is concerned. Thus feature 51 and its attendant data set are discarded (including the bifurcation feature 53) and feature 52 is kept by this cleaning for this neighborhood 50.
  • This approach can be used to address all ridge ends and attendant bifurcation features within the print to be cleaned.
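The neighborhood cleaning rule can be sketched as below. The `DataSet` type, its field names and the label "A" for the data set containing feature 51 are assumptions made for the example; only data set B and the keep/discard rule come from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataSet:
    """A ridge segment inside one neighborhood (hypothetical structure).

    Each end is either a "feature" (ridge end or bifurcation lying inside the
    neighborhood) or a "crossing" (point where the ridge meets the boundary).
    """
    name: str
    end_a: str
    end_b: str


def clean_neighborhood(data_sets: List[DataSet]) -> List[DataSet]:
    """Discard data sets bounded by features at both ends; keep the rest."""
    kept = []
    for ds in data_sets:
        both_features = ds.end_a == "feature" and ds.end_b == "feature"
        if not both_features:
            kept.append(ds)
    return kept


# Data set "A" (assumed label): ridge end 51 to bifurcation 53, both inside.
# Data set B: crossing 54 to feature 52.
data_sets = [
    DataSet("A", "feature", "feature"),
    DataSet("B", "crossing", "feature"),
]
kept = clean_neighborhood(data_sets)
kept_names = [ds.name for ds in kept]
```

No pixel-by-pixel length measurement is needed: the decision depends only on what kind of point sits at each end of the data set within the neighborhood.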
  • the present invention also addresses the type of situation illustrated in Figure 6 where the basic skeleton shows a first ridge end 60 and a second 61, generally opposing one another, but with a gap 62 between them. Is this a single ridge which needs healing by adding data to join the two ends together? Or is this truly two ridge ends?
  • a neighborhood 70 is defined relative to a part of the print. In this case, the part of the print includes a ridge end 71 and bifurcation 72. Also present are points where the ridges cross the boundaries of the neighborhood, crossings 73, 74, 75, 76. Again the crossings and features define a series of data sets. In this case, ridge end 71 and crossing 73 define data set W; bifurcation 72 and crossing 74 define data set X; bifurcation 72 and crossing 75 define data set Y; and bifurcation 72 and crossing 76 define data set Z.
  • the direction of data set W is defined by a line drawn between ridge end 71 and crossing 73. A similar determination can be made for the direction of the other data sets.
  • the type of situation shown in Figure 6 is addressed by considering the direction of the ridge ending in first ridge end 60 and the direction of the ridge ending in second ridge end 61. If the two directions are the same, within the bounds of a limited range, and the separation is small (for instance, the gap falls within the neighborhood) then the gap is healed and the two ridge ends 60, 61 disappear as features as far as further consideration is required. If the separation is too large and/or if the directions do not match, then no healing occurs and the ridge ends 60, 61 are accepted as genuine.
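The healing decision can be sketched as below, taking the direction of each ridge from the line between its ridge end and the point where it crosses the neighborhood boundary. The function names, the thresholds `max_gap` and `max_angle`, and the coordinates are assumptions made for illustration; the application does not give numerical limits.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def direction(start: Point, end: Point) -> float:
    """Angle in radians of the line drawn from start to end."""
    return math.atan2(end[1] - start[1], end[0] - start[0])


def angle_between(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def should_heal(
    end_1: Point, crossing_1: Point,
    end_2: Point, crossing_2: Point,
    max_gap: float, max_angle: float,
) -> bool:
    """Heal when the gap is small and both ridges run in the same direction."""
    gap = math.dist(end_1, end_2)
    # First ridge runs from its crossing towards its end; the opposing ridge
    # is followed from its end towards its crossing, so that a single ridge
    # broken by the gap gives two equal directions.
    dir_1 = direction(crossing_1, end_1)
    dir_2 = direction(end_2, crossing_2)
    return gap <= max_gap and angle_between(dir_1, dir_2) <= max_angle


tolerance = math.radians(15.0)
# Two collinear ridge ends separated by a short gap: healed.
aligned = should_heal((4.0, 0.0), (0.0, 0.0), (6.0, 0.0), (10.0, 0.0), 3.0, tolerance)
# Second ridge turns away at a right angle: left as two genuine ridge ends.
turned = should_heal((4.0, 0.0), (0.0, 0.0), (6.0, 0.0), (6.0, 4.0), 3.0, tolerance)
# Same direction but the separation is too large: not healed.
distant = should_heal((4.0, 0.0), (0.0, 0.0), (12.0, 0.0), (16.0, 0.0), 3.0, tolerance)
```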
  • the approach taken in the present invention allows faster processing of the cleaning and healing stage, in a manner which is accurate and is not to the detriment of subsequent stages and steps.
  • the necessary data from it to be compared with the other print can be extracted in a way which accurately reflects the configuration of the fingerprint present, but which is suitable for use in the comparison process.
  • first bifurcation feature 80, second 81 and ridge end 83 are present. These form nodes which are then joined to one another so that a triangle is formed. Extrapolation of this process to a larger number of minutia features gives a large number of triangles. A print can typically be represented by 50 to 70 such triangles. The Delaunay triangulation approach is preferred.
  • a series of features 120a through 120l are identified within a representation 122.
  • a number of approaches can be used to identify the features to include in a series. Firstly, it is possible to identify all features in the representation and join features together to form triangles (for instance, using Delaunay triangulation). Having done so, one of the triangles is selected and this provides the first three features of the series. One of the adjoining triangles to the first triangle is then selected at random and this provides a further feature for the series.
  • Another triangle adjoining the pair is then selected randomly and so on until the desired number of features are in the series. In a second approach, a feature is selected (for instance, at random) and all features within a given radius of the first feature are included in the series. The radius is gradually increased until the series includes the desired number of features.
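The second, radius-based selection can be sketched as below. The function name, the `(x, y)` feature representation and the fixed `step` by which the radius grows are assumptions for illustration; the source does not state how the radius is increased.

```python
import math

def select_by_growing_radius(features, seed_index, wanted, step=1.0):
    """Grow a circle around a seed feature until it holds `wanted` features.

    features: list of (x, y) positions. Returns (selected features, final radius).
    """
    seed = features[seed_index]
    radius = step
    while True:
        chosen = [f for f in features if math.dist(f, seed) <= radius]
        # Stop when enough features are inside, or when nothing is left to add.
        if len(chosen) >= wanted or len(chosen) == len(features):
            return chosen, radius
        radius += step
```

Note that the result can contain more than `wanted` features when several features enter the circle at the same step.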
  • the position of each of these features is considered and used to define a centre 124.
  • this is done by considering the X and Y position of each of the features and obtaining a mean for each.
  • the mean X position and mean Y position define the centre 124 for that group of features 120a through 120l.
  • Other approaches to the determination of the centre are perfectly useable.
  • the new approach uses the centre 124 as one of the apexes for each of the triangles.
  • the other two apexes for first triangle 126 are formed by features 120a and 120b.
  • the next triangle 128 is formed by centre 124, feature 120b and 120c.
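A minimal sketch of the radial triangulation described above: the centre is the mean X and mean Y of the features, and each triangle has the centre plus two neighbouring features as its apexes. Ordering the features by angle around the centre is an assumption made here so that "neighbouring" is well defined; the function names are likewise illustrative.

```python
import math

def centre_of(features):
    """Mean X and mean Y of a list of (x, y) feature positions."""
    xs = [x for x, _ in features]
    ys = [y for _, y in features]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def radial_triangles(features):
    """Return (centre, triangles); each triangle is (centre, feature_k, feature_k+1)."""
    cx, cy = centre = centre_of(features)
    # Assumption: neighbours are taken in angular order around the centre.
    ordered = sorted(features, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    n = len(ordered)
    triangles = [(centre, ordered[i], ordered[(i + 1) % n]) for i in range(n)]
    return centre, triangles
```

For N features this yields N triangles, the last one closing the polygon back to the first feature.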
  • FIG. 10 illustrates the Delaunay triangulation approach applied to the same set of features.
  • Either the first, Delaunay triangulation based, approach or the second, radial triangulation, approach extracts data which is suitable for formatting according to the preferred approach of the present process.
  • the data must be suitably mathematically coded to allow the comparison process and here a different approach is taken to that considered before.
  • the approach presents the extracted data in vector form, and so allows easy comparison between expressions of different representations.
  • a number of pieces of information are taken and used to form a feature vector.
  • the information is: the type of the minutia feature each node represents (three pieces of information in total); the relative direction of the minutia features (three pieces of information in total); and the distances between the nodes (three pieces of information in total).
  • the type of minutia can be either ridge end or bifurcation.
  • the direction, a number between 0 and 2π radians, is calculated relative to the orientation, a number between 0 and π radians, of the opposing segment of the triangle as reference and so the parameters of the triangle are independent from the image.
  • feature vector may be expressed as:
  • $FV = [GP, Reg, \{T_1, A_1, D_{12}, T_2, A_2, D_{23}, T_3, A_3, D_{31}\}]$
  • GP is the general pattern of the fingerprint
  • Reg is the region of the fingerprint the triangle is in
  • T 1 is the type of minutia 1;
  • A 1 is the direction of the minutia at location 1 relative to the direction of the opposing side of the triangle;
  • D 12 is the length of the triangle side between minutia 1 and minutia 2;
  • T 2 is the type of minutia 2;
  • A 2 is the direction of the minutia at location 2 relative to the direction of the opposing side of the triangle;
  • D 23 is the length of the triangle side between minutia 2 and minutia 3;
  • T 3 is the type of minutia 3;
  • A 3 is the direction of the minutia at location 3 relative to the direction of the opposing side of the triangle;
  • D 31 is the length of the triangle side between minutia 3 and minutia 1.
  • the features are recorded for all the triangles in the same order (either clockwise or anticlockwise).
  • a rule of starting with the furthest feature to the left is used, but other such rules could be applied.
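The triangle feature vector and the ordering rule can be sketched as follows. This is an illustration under stated assumptions: each minutia is a hypothetical `(x, y, type, direction)` tuple, the y axis points up, the order is anticlockwise from the leftmost minutia, and the function names are not from the source.

```python
import math

def side_orientation(p, q):
    """Orientation of the segment p-q as a number in [0, pi)."""
    return math.atan2(q[1] - p[1], q[0] - p[0]) % math.pi

def order_minutiae(minutiae):
    """Start with the leftmost minutia, then go anticlockwise (y axis up)."""
    start = min(range(3), key=lambda i: (minutiae[i][0], minutiae[i][1]))
    s, a, b = (minutiae[(start + j) % 3] for j in range(3))
    cross = (a[0] - s[0]) * (b[1] - s[1]) - (a[1] - s[1]) * (b[0] - s[0])
    return [s, a, b] if cross > 0 else [s, b, a]

def triangle_feature_vector(minutiae, general_pattern, region):
    """Build [GP, Reg, T1, A1, D12, T2, A2, D23, T3, A3, D31]."""
    m = order_minutiae(minutiae)
    fv = [general_pattern, region]
    for i in range(3):
        cur, nxt, third = m[i], m[(i + 1) % 3], m[(i + 2) % 3]
        opposite = side_orientation(nxt, third)           # side facing `cur`
        relative = (cur[3] - opposite) % (2 * math.pi)    # in [0, 2*pi)
        fv += [cur[2], relative, math.dist(cur[:2], nxt[:2])]
    return fv
```

Because every angle is measured against a side of the triangle and every length is a side length, the resulting values do not depend on where the triangle sits in the image or how the image is rotated.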
  • the second data extraction approach described above is also suited to be mathematically coded using the vector format and so allow comparison with data extracted from other representations.
  • the pieces of information used to form the feature vector in this case are: the general pattern of the fingerprint; the type of minutia; the direction of the minutia relative to the image; the radius of the minutia from the centre or centroid; the length of the polygon side between a minutia and the minutia next to it; the surface area of the triangle defined by the minutia, the minutia next to it and the centroid.
  • the vector may be expressed as: $FV = [GP, \{T_1, A_1, R_1, L_{1,2}, S_1\}, \ldots, \{T_k, A_k, R_k, L_{k,k+1}, S_k\}, \ldots, \{T_N, A_N, R_N, L_{N,1}, S_N\}]$ where
  • GP is the general pattern of the fingerprint
  • T k is the type of minutia k
  • A k is the direction of minutia k relative to the image
  • L k,k+1 is the length of the polygon side between minutia k and minutia k+1;
  • S k is the surface area of the triangle defined by minutia k, k+1 and the centroid;
  • R k is the radius between the centroid and the minutia k.
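A sketch of how the per-minutia entries of this vector could be computed. It assumes the minutiae are already ordered around the centroid and are given as hypothetical `(x, y, type, direction)` tuples; the function name and tuple layout are illustrative, not taken from the source.

```python
import math

def radial_feature_vector(minutiae, general_pattern):
    """Build [GP, (T_k, A_k, R_k, L_k,k+1, S_k) for each minutia k]."""
    n = len(minutiae)
    cx = sum(m[0] for m in minutiae) / n
    cy = sum(m[1] for m in minutiae) / n
    fv = [general_pattern]
    for k in range(n):
        x, y, kind, direction = minutiae[k]
        nx, ny = minutiae[(k + 1) % n][:2]       # minutia k+1, wrapping N -> 1
        radius = math.hypot(x - cx, y - cy)      # R_k
        side = math.hypot(nx - x, ny - y)        # L_k,k+1
        # S_k: area of triangle (minutia k, minutia k+1, centroid) via cross product
        area = abs((x - cx) * (ny - cy) - (nx - cx) * (y - cy)) / 2
        fv.append((kind, direction, radius, side, area))
    return fv
```

For a convex polygon that contains its centroid, the areas S k sum to the area of the polygon, which gives a quick sanity check on the extraction.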
  • region of the fingerprint is no longer considered.
  • the set of features can extend across region boundaries and so it is potentially not appropriate to consider one region in the vector.
  • the region could still be considered, however, and the expression set out below is a suitable one in that context, with the region designated Reg and the other symbols having the meanings outlined above. Note a separate region is possible for each minutia.
  • $FV = [GP, \{T_1, A_1, R_1, Reg_1, L_{1,2}, S_1\}, \ldots, \{T_k, A_k, R_k, Reg_k, L_{k,k+1}, S_k\}, \ldots, \{T_N, A_N, R_N, Reg_N, L_{N,1}, S_N\}]$
  • the likelihood ratio is the quotient of two probabilities, one being that of the two feature vectors conditioned on their being from the same source, the other being that of the two feature vectors conditioned on their being from different sources.
  • Feature vectors obtained according to the first data extraction approach and/or second extraction approach described above can be compared in this way, the differences being in the data represented in the feature vectors rather than in the comparison stage itself.
  • the feature vector fv contains the information extracted from the representation and formatted.
  • the addition of the subscript s to this abbreviation denotes that a feature vector comes from the suspect, and the addition of the subscript m denotes that a feature vector originates from the crime.
  • the symbol fv s then denotes a feature vector from the known source or suspect, and fv m denotes the feature vector originating from an unknown source from the crime scene.
  • discrete quantities which may include general pattern, region, type, and other data
  • continuous quantities which may include the distances between minutiae, relative directions and other data.
  • the data driven approach involves the consideration of a quotient defined by a numerator which considers the variation in the data which is extracted from different representations of the same fingerprint and by a denominator which considers the variation in the data which is extracted from representations of different fingerprints.
  • the output of the quotient is a likelihood ratio.
  • the feature vector for the first representation, the crime scene, and the feature vector for the second representation, the suspect are obtained, as described above.
  • the difference between the two vectors is effectively the distance between the two vectors. Once the distance has been obtained it is compared with two different probability distributions obtained from two different databases. In the first instance, the probability distribution for these distances is estimated from a database of prints taken from the same finger.
  • a large number of pairings of prints are taken from the database and the distance between them is obtained. This involves a similar approach to that described above.
  • Each of the prints has data extracted from it and that data is formatted as a feature vector. The differences between the two feature vectors give the distance between that pairing. Repeating this process for a large number of pairings gives a range of distances with different frequencies of occurrence. A probability distribution reflecting the variation between prints of the same finger is thus obtained.
  • the database would be obtained from a number of prints taken from the same finger of the suspect. However, the approach can still be applied where the prints are taken from the same finger, but that finger belongs to someone other than the suspect.
  • This database needs to reflect how a print (more particularly the resulting triangles and their respective feature vectors) from the same finger changes with pressure and substrate.
  • This database is formed from a significant number of sets of information, each set being a large number of prints taken from the same finger under the full range of conditions encountered in practice.
  • the database is populated by the identification, by an operator, of corresponding triangles in several applications of the same finger.
  • a smaller set of prints can be processed as described above, and distortion functions can then be calculated.
  • the preferred method is thin plate splines, but other methods exist.
  • the distortion function can then be applied to other prints to simulate further sets of data.
  • the probability distribution for these distances is estimated from a database of prints taken from different fingers. Again a large number of pairings of prints are taken from the database and the distance between them obtained.
  • the extraction of data, formatting as a feature vector, calculation of the distance using the two feature vectors and determination of the distribution is performed in the same way, but uses the different database.
  • This different database needs to reflect how a print (more particularly the resulting triangles and their respective feature vectors) from a number of different fingers varies between fingers and, potentially, with various pressures and substrates involved. Again, the database is populated by the identification, by an operator, of triangles in the various representations obtained from the different fingers of different persons.
  • the numerator may thus be thought of as considering a first representation obtained from a crime scene or an item linked to a crime, against a second representation from a suspect through an approach involving: taking and/or generating a number of example representations of the second representation; considering the example representations as a number of triangles; considering the value of the feature vector for a given triangle in respect of each of the example representations; obtaining the feature vector value of the first representation; forming a probability distribution of the frequency of the cross-differences of different feature vector values for a given triangle between example representations; comparing the difference of the feature vector value of the first representation and the feature vector value of the second representation with the probability distribution.
  • the denominator may thus be thought of as considering the second representation obtained from a suspect against a series of representations taken from a population through an approach involving: taking or generating a number of example representations of representations taken from a population; considering the example representations as a number of triangles; considering the values of the feature vectors in respect of each of the example representations; forming a probability distribution of the frequency of differences between the feature vector of the first representation and the different feature vector values from the example representations; obtaining the feature vector value of the second representation; comparing the difference between the feature vector value of the first representation and the feature vector value of the second representation with the probability distribution.
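The data-driven quotient can be illustrated with a short sketch: the observed distance between the mark and suspect feature vectors is evaluated against a distribution of same-finger distances (numerator) and a distribution of different-finger distances (denominator). The Gaussian kernel density estimate, the `bandwidth` value and the function names are assumptions chosen for the example; the source does not prescribe how the distributions are estimated, and the discrete quantities are left out here.

```python
import math

def kernel_density(samples, x, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at x."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

def likelihood_ratio(distance, same_finger_distances, different_finger_distances,
                     bandwidth=0.5):
    """Density of the distance under 'same source' over 'different sources'."""
    numerator = kernel_density(same_finger_distances, distance, bandwidth)
    denominator = kernel_density(different_finger_distances, distance, bandwidth)
    return numerator / denominator
```

A ratio above 1 supports the same-source proposition and a ratio below 1 supports the different-source proposition. With real data the denominator can underflow to zero for distances far from every sample, so a practical implementation would need to guard against that.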
  • fv m c : continuous data of the feature vector from the mark
  • fv m d : discrete data of the feature vector from the mark
  • fv s c : continuous data of the feature vector from the suspect
  • fv s d : discrete data of the feature vector from the suspect
  • d(fv s c , fv m c ) is the distance measured between the continuous data of the two feature vectors from the mark and the suspect
  • H p is the prosecution hypothesis, that is the two feature vectors originate from the same source.
  • d(fv s c , fv m c ) denotes a distance between the continuous quantities of the feature vectors for the prints.
  • the continuous quantities in a feature vector are the length of the triangle sides and minutia direction relative to the opposite side of the triangle.
  • This distance measure is computed by first subtracting term by term. The result is a vector containing nine quantities. This is then normalised to ensure that the length and angle are given equal weighting. By taking the sum of the squares of the distances from all the feature vectors considered in this way a single value is obtained.
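The distance computation described above (term-by-term subtraction, normalisation, sum of squares) can be sketched as follows. The normalisation constants `length_scale` and `angle_scale`, the wrapping of angular differences, and the `kinds` labels are assumptions for illustration; the source only states that lengths and angles are given equal weighting.

```python
import math

def continuous_distance(fv_s_c, fv_m_c, kinds, length_scale, angle_scale=math.pi):
    """Sum of squared, normalised term-by-term differences.

    kinds[i] is "length" or "angle" and selects the normalisation for term i.
    """
    total = 0.0
    for s, m, kind in zip(fv_s_c, fv_m_c, kinds):
        diff = s - m
        if kind == "angle":
            # Wrap to [-pi, pi) so that 0.1 and 2*pi - 0.1 count as close.
            diff = (diff + math.pi) % (2 * math.pi) - math.pi
            diff /= angle_scale
        else:
            diff /= length_scale
        total += diff * diff
    return total
```

Dividing each kind of term by its own scale keeps a difference of a few pixels in length from swamping, or being swamped by, a difference of a fraction of a radian in direction.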
  • fv means feature vector, c means continuous, d means discrete, m means mark and s means suspect, and therefore:
  • fv m c : continuous data of the feature vector from the mark
  • fv m d : discrete data of the feature vector from the mark
  • fv s c : continuous data of the feature vector from the suspect
  • fv s d : discrete data of the feature vector from the suspect
  • d(fv s c , fv m c ) is the distance measured between the continuous data of the two feature vectors from the mark and the suspect
  • H d is the defence hypothesis, that is the two feature vectors originate from different sources.
  • the subscript in the summation symbol means that the probabilities on the right-hand side of this equation are added up for all the cases where the values of the discrete quantities of the feature vectors coincide. On some occasions some or all of the discrete variables are present in the fingermark. For these cases the index of the summation is replaced by values of the quantities that are not present. The summation symbol is removed when all discrete quantities are present in the fingermark.
  • the probability distribution for distances d(fv s c , fv m c ) can be estimated from a reference database of fingerprints. This database needs to reflect how much variability there is in respect of all prints (again more particularly the resulting triangles and their feature vectors) between different sources. This database can readily be formed by taking existing records of different source fingerprints and analysing them in the above mentioned way.
  • Pr(fv m d | H d ) is a probability distribution of discrete variables including general pattern. A probability distribution for general pattern was computed based on frequencies compiled by the FBI for the National Crime Information Center in 1993.
  • a probability distribution for the remaining discrete variables can be estimated from a reference database using a number of methods.
  • a probability tree is preferred because it can more efficiently code the asymmetry of this distribution, for example, the number of regions depends on the general pattern.
  • a probability for the numerator of the likelihood ratio is computed using the following formula:
  • H p is the prosecution hypothesis, that is the two vectors originate from the same source.
  • the probability for the denominator is computed using the following formula:
  • H d is the defence hypothesis, that is the two vectors originate from different sources.
  • a feature vector is first considered against another feature vector in terms of only part of the information it contains. In particular, the information apart from the minutia direction can be compared.
  • the data set included in one of the vectors is fixed in orientation and the data set included in the other vector with which it is being compared is rotated. If the data set relates to three minutia then three rotations would be considered, if it related to twelve then twelve rotations would be used. The extent of the fit at each position is considered and the best fit rotation obtained. This leads to the association of minutiae pairs across both feature vectors. In respect of the best fit rotation, in each case, the process then goes on to compare the remaining data in each set, the minutia direction.
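The rotation search can be sketched as trying every cyclic shift of one data set against the fixed one and keeping the shift with the lowest mismatch. The `mismatch` function, its type penalty and the `(type, length)` entry layout in the usage note are hypothetical choices for illustration; the source does not specify how the extent of fit is scored.

```python
def best_rotation(fixed, other, mismatch):
    """Try each cyclic shift of `other` against `fixed`; return (shift, cost).

    For N minutiae exactly N rotations are evaluated.
    """
    best_shift, best_cost = 0, float("inf")
    for shift in range(len(fixed)):
        rotated = other[shift:] + other[:shift]
        cost = sum(mismatch(a, b) for a, b in zip(fixed, rotated))
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift, best_cost

def type_and_length_mismatch(a, b, type_penalty=10.0):
    """Illustrative score: penalise differing minutia type, add the length difference."""
    return (0.0 if a[0] == b[0] else type_penalty) + abs(a[1] - b[1])
```

The returned shift pairs minutia i of the fixed set with minutia (i + shift) mod N of the other set, and those pairs are then used when the minutia directions are compared.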
  • the minutiae directions are made independent of the orientation of the print on the image.
  • the approach taken on direction is described with reference to Figures 13a through 13c.
  • a mark set of minutia 200 and a suspect set of minutia 202 are being considered against one another.
  • Each set is formed of four minutia, 204a, 204b, 204c, 204d and 206a, 206b, 206c, 206d respectively.
  • the allocation of the minutia reference numerals reflects the suggested best match between the two sets arising from the consideration of the minutia type, length of the polygon sides between minutia, surface of the polygon defined by the minutia and centroid.
  • Each of the minutia has an associated direction 208a, 208b, 208c, 208d and 210a, 210b, 210c, 210d respectively.
  • a circle 212, 214 of radius one is taken.
  • To the mark circle 212 is added a radius 216 for each of the minutia directions, see Figure 13b.
  • To the suspect circle 214 is added a radius 218 from each of the minutia directions, Figure 13b.
  • Rotation of one of the circles relative to the other allows the orientation of the minutia to be brought into agreement, according to the set of the pairs of minutiae that were determined before, Figure 13c, and allows the extent of the match in terms of the minutiae directions for each pair of minutiae to be considered. In the illustrated case there is extensive agreement between the two circles and hence between the two marks in respect of the data being considered. In effect, the match between the polygons is being considered in terms of the minutia type, distance between minutia, radius between the minutia and the centroid, surface area of the triangle defined between the minutia and the centroid and minutia direction. All of these considerations serve to complement one another in the comparison process. One or more may be omitted, however, and a practical comparison be carried out.
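The circle rotation can be sketched numerically. Each minutia direction is a radius on a unit circle; one rotation angle is chosen for the whole suspect circle, and the leftover angular differences per pair measure how well the directions agree. Choosing the rotation as the circular mean of the pairwise differences is an assumption made here (it maximises the summed cosine of the residuals); the source does not say how the rotation is found.

```python
import math

def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def align_directions(mark_dirs, suspect_dirs):
    """Return (rotation, residuals) for already-paired minutia directions."""
    diffs = [m - s for m, s in zip(mark_dirs, suspect_dirs)]
    # Circular mean of the pairwise differences gives the common rotation.
    rotation = math.atan2(sum(math.sin(d) for d in diffs),
                          sum(math.cos(d) for d in diffs))
    residuals = [wrap(d - rotation) for d in diffs]
    return rotation, residuals
```

Small residuals for every pair correspond to the extensive agreement between the two circles described above, and because only differences between directions are used, the result does not depend on how either print is oriented in its image.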
  • the comparison provides a distance which can be considered against the two distributions in the manner previously described with reference to Figures 11 and 12 below.
  • Various means can be used for computing the distance, including algorithms (such as Euclidean, Pearson, Manhattan etc) or using neural networks.
  • the databases used to define the two probability distributions preferably reflect the number of minutia being considered in the process. Thus different databases are used where three minutia are being considered, than where twelve minutia are being considered.
  • the manner in which the databases are generated and applied is generally the same. Variations in the way the distances are calculated are possible without changing how the databases are set up and used. Equally, it is possible to form the various databases from a common set of data, but with that data being considered using a different number of minutia to form the database specific to that number of minutia.
  • the databases may be generated in advance in respect of the numbers of minutia expected to be considered in practice, for instance 3 to 12, with the relevant databases being used for the number of minutia being considered in a particular case, for instance 6. Pre-generation of the databases avoids any delays whilst the databases are generated.
  • a mark may be best considered using six minutia and the desire to consider this mark would lead to the database being generated for six minutia from the basic database of fingerprint representations by considering that using six minutia.
  • the data set size which needs to be stored would be reduced as a result.
  • fv means feature vector
  • c means continuous
  • d means discrete
  • m means mark and s means suspect
  • fv m c continuous data of the feature vector from the mark
  • fv m d discrete data of the feature vector from the mark
  • fv s c continuous data of the feature vector from the suspect
  • fv s d discrete data of the feature vector from the suspect
  • d(fv s c , fv m c ) is the distance measured between the continuous data of the two feature vectors from the mark and the suspect
  • H p is the prosecution hypothesis, that is the two feature vectors originate from the same source
  • the continuous quantities when conditioning onyi ⁇ eca ⁇ jv m c become measurement of the same finger and person.
  • the subscript in the summation symbol means that the probabilities on the right-hand side of the equation are added up for all the cases where the values of the discrete quantities of the feature vectors coincide. On some occasions some or all of the discrete variables are present in the fingermark. For these cases the index of the summation is replaced by values of the quantities that are not present. The summation symbol is removed when all discrete quantities are present in the fingermark.
  • the probability distribution for fv s c is computed using a Bayesian network estimated from a database of prints taken from the same finger as described above.
  • Many algorithms exist for estimating the graph and conditional probabilities of a Bayesian network. Among them are the NPC algorithm for estimating the acyclic directed graph, see Steck, H., Hofmann, R., and Tresp, V. (1999), and the EM algorithm for estimating the conditional probability distributions, see Lauritzen, S. L. (1995), The EM algorithm for graphical association models with missing data, Computational Statistics & Data Analysis, 19:191-201. The contents of both documents, particularly in relation to the algorithms they describe, are incorporated herein by reference.
  • fv s c : continuous data of the feature vector from the suspect
  • fv s d : discrete data of the feature vector from the suspect
  • d(fv s c , fv m c ) is the distance measured between the continuous data of the two feature vectors from the mark and the suspect
  • H d is the defence hypothesis, that is the two feature vectors originate from different sources.
  • the subscript in the summation symbol means that the probabilities on the right-hand side of the equation are added up for all the cases where the values of the discrete quantities of the feature vectors coincide. On some occasions some or all of the discrete variables are present in the fingermark. For these cases the index of the summation is replaced by values of the quantities that are not present. The summation symbol is removed when all discrete quantities are present in the fingermark.
  • the probability distribution in the first factor of the right hand side of equation above is computed with a Bayesian network estimated from a database of feature vectors extracted from different sources.
  • There are many methods for estimating Bayesian networks as noted above, but the preferred methods are the NPC-algorithm of Steck et al., 1999 for estimating an acyclic directed graph and/or the EM-algorithm of Lauritzen, 1995 for the conditional probability distributions.
  • the second factor Pr(fv m d | H d ) is estimated in the same manner as described for the data-driven approach above.
  • Given a feature vector from a known source fv s and one from an unknown source fv m , the numerator is given by the equation and is calculated with a Bayesian network dedicated to modelling distortion. The second factor in the denominator is calculated in the same manner as with the data-driven approach. The first factor is computed using Bayesian networks. A Bayesian network is selected for the combination of values of fv m d , which is then used for computing a probability Pr(fv m | fv m d , H d ). This process is repeated for all values in the index of the summation. The likelihood ratio is then obtained by computing the quotient of the numerator over the denominator.
  • Bayesian networks for calculating the likelihood ratio, but the invention is not limited to it.
  • Another example is estimating one Bayesian network per general pattern.
  • This invention can also be used for more than three minutiae by defining suitable feature vectors.
  • a Bayesian network representation to specify a probability distribution.
  • a Bayesian network is an acyclic directed graph together with conditional probabilities associated to the nodes of the graph. Each node in the graph represents a quantity and the arrows represent dependencies between the quantities.
  • Figure 14 displays an acyclic graph of a Bayesian network representation for the quantities X, Y and Z. This graph contains the information that the joint distribution of X, Y and Z is given by the equation
  • $p(x, y, z) = p(x)\,p(y \mid x)\,p(z \mid y)$ for all $x, y, z$
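As a small illustration of a chain-shaped Bayesian network X → Y → Z, in which the joint probability is the product of p(x), p(y | x) and p(z | y), the sketch below evaluates that product from conditional probability tables. The function name and table layout are assumptions for the example, and the table values in the usage note are made up and carry no forensic meaning.

```python
from itertools import product

def joint_probability(x, y, z, p_x, p_y_given_x, p_z_given_y):
    """p(x, y, z) = p(x) * p(y | x) * p(z | y) for the chain X -> Y -> Z."""
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

def total_probability(p_x, p_y_given_x, p_z_given_y, values=(0, 1)):
    """Sum the joint over every combination; a valid network sums to 1."""
    return sum(joint_probability(x, y, z, p_x, p_y_given_x, p_z_given_y)
               for x, y, z in product(values, repeat=3))
```

For example, with `p_x = {0: 0.6, 1: 0.4}`, `p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}` and `p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}`, the joint probability of x = 0, y = 0, z = 0 is 0.6 × 0.7 × 0.9 = 0.378. Only three small tables are needed rather than one table over every combination of X, Y and Z.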

Abstract

The invention relates to a method of comparing a first representation of an identifier with a second representation of an identifier. The method involves providing an expression of the first representation, such as a print, and considering the expression of the first representation against a probability distribution based on the variation of the expression between different example representations of the second representation, so as to obtain a first consideration. The method also involves considering the expression of the first representation against a probability distribution based on the variation of the expression between different population representations, so as to obtain a second consideration. Using the first and second considerations provides a measure of comparison between the first and second representations.
EP05801600A 2004-10-14 2005-10-14 Analyse statistique dans une reconnaissance de motifs, notamment dans la reconnaissance d'empreintes Withdrawn EP1800241A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB0422784A GB0422784D0 (en) 2004-10-14 2004-10-14 Improvements in and relating to identifier comparison
GB0502900A GB0502900D0 (en) 2005-02-11 2005-02-11 Improvements in and relating to identifier comparison
US11/083,579 US7369700B2 (en) 2004-10-14 2005-03-18 Identifier comparison
PCT/GB2005/003961 WO2006040573A1 (fr) 2004-10-14 2005-10-14 Analyse statistique dans une reconnaissance de motifs, notamment dans la reconnaissance d'empreintes

Publications (1)

Publication Number Publication Date
EP1800241A1 true EP1800241A1 (fr) 2007-06-27

Family

ID=35659000

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05801600A Withdrawn EP1800241A1 (fr) 2004-10-14 2005-10-14 Analyse statistique dans une reconnaissance de motifs, notamment dans la reconnaissance d'empreintes

Country Status (4)

Country Link
EP (1) EP1800241A1 (fr)
AU (1) AU2005293303A1 (fr)
CA (1) CA2583990A1 (fr)
WO (1) WO2006040573A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2626621A1 (fr) 2007-03-23 2008-09-23 Forensic Science Service Ltd. Ameliorations apportees a la modelisation
GB0819069D0 (en) 2008-10-17 2008-11-26 Forensic Science Service Ltd Improvements in and relating to methods and apparatus for comparison

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2006040573A1 *

Also Published As

Publication number Publication date
AU2005293303A1 (en) 2006-04-20
WO2006040573A1 (fr) 2006-04-20
CA2583990A1 (fr) 2006-04-20

Similar Documents

Publication Publication Date Title
US20150227818A1 (en) Methods for comparing a first marker, such as fingerprint, with a second marker of the same type to establish a match between the first marker and second marker
US20160104027A1 (en) Identifier investigation
US7369700B2 (en) Identifier comparison
US20100080425A1 (en) Minutiae-based template synthesis and matching
US20040199775A1 (en) Method and device for computer-based processing a template minutia set of a fingerprint and a computer readable storage medium
US11315358B1 (en) Method and system for detection of altered fingerprints
Soleymani et al. A hybrid fingerprint matching algorithm using Delaunay triangulation and Voronoi diagram
Fatehpuria et al. Acquiring a 2D rolled equivalent fingerprint image from a non-contact 3D finger scan
JP2014529816A (ja) 虹彩認識による識別
WO2006040564A1 (fr) Extraction et comparaison de caracteristiques pour la reconnaissance d’empreintes digitales et de paumes
WO2006040573A1 (fr) Analyse statistique dans une reconnaissance de motifs, notamment dans la reconnaissance d'empreintes
Rabasse et al. A new method for the synthesis of signature data with natural variability
WO2006085094A1 (fr) Ameliorations concernant la recherche d'identificateurs
EP1800239A1 (fr) Procede pour ameliorer la qualite de la squelettisation d'une image d'empreinte digitale
Szymkowski et al. A novel approach to fingerprint identification using method of sectorization
JP3110167B2 (ja) 階層型ニューラルネットワークを用いた物体認識方式
US20060083413A1 (en) Identifier investigation
Szczepanik et al. Fingerprint recognition based on minutes groups using directing attention algorithms
US8983153B2 (en) Methods and apparatus for comparison
US11074427B2 (en) Method for reconstructing an imprint image from image portions
Iwasokun et al. Fingerprint Individuality Model Based on Pattern Type and Singular Point Attributes
Gangapure et al. 2.5 D Palmprint Recognition using Signal level Fusion and Graph based Matching
Gavrilova Computational geometry and image processing in biometrics: On the path to convergence
Dommeti et al. Revolutionizing Fingerprint Forensics: Regeneration and Gender Prediction with Gabor Filters, Otsu's Technique, and Deep Learning
Nagaraj Offline Signature Authentication System Using Machine Learning and Android Interface

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20070411

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

17Q First examination report despatched

Effective date: 20070724

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20180501