WO2016131083A1

WO2016131083A1 - Identity verification. method and system for online users

Info

Publication number: WO2016131083A1
Application number: PCT/AU2016/000048
Authority: WO
Inventors: Kelly SPENDLOVE; Julie SPRINGETT
Original assignee: S2D Pty Ltd
Priority date: 2015-02-20
Filing date: 2016-02-19
Publication date: 2016-08-25

Abstract

A method of verifying the identity of a user in an online system is disclosed. An embodiment of the method includes capturing a current photo graphic image of a user via a user interface provided on a user device; receiving, via the user interface, a photographic image or digital copy of at least one independently issued identity verification document, wherein each of the at least one independently issued identity verification document includes user identification data items and at least one of the at least one independently issued identity verification document includes a photographic image of the user; extracting a plurality of user identification data items and a photographic image of the user from the at least one independently issued identity verification document,, calculating an identity Confidence score, wherein the calculation is based upon matching the extracted plurality of user identification data items and photographic image of the user with the captured photographic image of (he user; and generating a verification status for the user based upon the calculated identity confidence score wherein the user is assigned a verified status if the calculated identity confidence score is within a first predetermined verification range

Description

ί

IDENTITY VERIFICATION. METHOD AND SYSTEM FOR ONLINE USERS

This international patent application claims priorit front Australian provisional patent application ij ber 2015900601 ftled on 20 February 2015 the contents of which ate take to he incorporated herein by this reference.

TECHNICAL FIELD

[00011 The present invention relates to online computer systems and communities. In. a particular form the present in vention relates to a Method for verifyin the^' identity of a user in ail online system, such as an on-line dating system.

BACKGROUND

[0002:] In many online systems users are required to register identit details with tire system such as name, age. gender, address, along with an online alias or profile name and optionally an. image or avatar for display to other users of the system. In man cases the identity data may only be required for use by baefeend systems and thus some or eve all of the identity data is hidden from other users to protect the pri vacy or true identity of the user:. Typically the sy stem will allow users to decide when and to whom to reveal identit information, or alternatively only selectively reveal information based on certai scenarios. For example man online systems are subscription based services in which case the user may be required to supply credit card details which are only for use by backend payment systems. Similarly in. an online dating system a backend system may use identity and other private or -confidential .information provided, by the user to identify other users with whom the user may be in terested in dating, for example, if a user indicate a interest in contacting or dating another user, the system may then reveal email and phone numbers to facilitate communication, between the two users,

[0003] One issue that frequently arises in. online systems and online: communities is that a user has no way of telling if identi ty inforniation regarding a specific user is true. M ost online systems accept the identity information at face value and do not attempt to verify that the information is true or e ven that the identity corresponds to a real person. This is a particu la issue in social media and online datin environments, where users can deliberately enter incorrect identity information, in online dating systems this erode trust if user entered details are- later found to be incorrect (for example if an incorrect age or olde r pho to is added). More generally the inability'' of users to check, or even h ave some amount of confidence in, the identity of a user creates a ri sk if the intent of the user(s) who enter incorrect data or fictitious identities is illegal or exploitative in n ature. ?

[0004] There is th us a need to provide methods- and sy stems to veri fy the identity of a user in an online system or at least provide a degree of confidence that a user is who they claim to be.

SUMMARY

[OOOS] According to a first aspect of an embodiment of fee-disclosure, there is provided method of verifying the ^'identity of a user in an online system, the method i ncluding:

capturing a current photographic image of a user v ia a user interface provided on a user device;

receiving, via the user interface, a photographic image or digital copy of at least one independently issued i dentity verification document, wherein each of the at least one

independently issued identity verification document includes user identification data; items and at least one of the at least one independently issued identify verification document includes a photographic image of the user;

extracting a pluralit of user identification data items and a photographic image of the user .from the at least one independently issued identit verificatio document;

calculating an identity confidence score, wherein the calculation is based upon matching the extracted, pluralit of user identification data items and photographic image of the user wit the captured photographi c image of the u ser ; and.

generating a verification : "status for the user based upo the calculated identity confidence score wherein the user is assigned a verified status if the calculated identify confidence score is within a first predetermined verification range

[0006] in a further form, the user interface is configured to guide the use to capture the current photograpliie image within a: predefined region of a display of the user device,

[0007] in a further form, the method further includes entering, via a user interface, a pluralit of user identification data items by a user, and wherein eaicttlation of the identit confidence score is based upon matching the extrac ted plurality of user identi fication data items with, the user entered pluralit of user identification^' data items, and matching the extracted photographic image of the user and the received current photographic image of the user.

[0008] in a further form, the at least one independently issued identity verification; document includes at least one Government issued identity verification document including a photographic linage of the user.

10009] In a further form, the user interface is configured to receive the at least one independently issued identity verification document by capturing at least one image of the at least one independently issued identity^••verification document, and extracting a plurality of user identification data items is pertlmned by performing optical character recogni tor! on the at least one image of the at least one independently issued identity verification document.

(0010}^' In a further form, the user interface is configured to receive a document type- of the at least one independently issued identity verification document to be captured, and die user is guided to capture the document within a container region of the captured image, and the extraction step is performed using a predefined pixel map tor the received document step wherein the predefined pixel map defines pixel regions where individual identificatio data items and the photographic^' image o f the user are located within the image,

[00_.11] hi a further form, the least one independently issued identity verification document includes a credit card issued to the user, and extracting a plurality of user identification data i tems includes ex tracting credit card billing informatio from the at least one image of the credit card and providing the extracted credit card billing information to a billing system for the online system.

[0012] In a further form, if the ealcuiated identity confidence score is within a second predefined verification range, the user interface is configured to prompt the user for additional independently issued identity-verification documents, and the steps of extracting, calculating: and verifying are re-performed incorporating the additional independently issued identity verification documents.

[00.13] In a further form, a facial detection system detects the location of a face within the captured and/or received photographic image.

10014] In a further form,, the facial detection system is based upon the Viola-Jones object detection frame work..

[0015 j In a further form, matching the extracted photographic image of the user and the captured current photographic image of the user includes detecting the location and size of a face in each photographic image and performing facial recognition o each detected face to estimate a.

plurality of facial features for each face, and a match score is obtained based o the correlation between the facial features in each face.

(001 ] In a further form, performing facial recognition is based on performing a principal component analysis (PCA) to extract facial features, and the features are then classified to determine if the face extracted from the photographie image of the user and face extracted from the received current photographic image of the user belong to the same class. [0017] In a further form, the user identification data items includes age and gender, and a classifier is used to estimate the age and gender of the user from the captured and/or received photographic image for comparison with the age and gender entered by the user.

(0 1 &} In a further form, performing facial recognition, includes generating a nodal map of the face and comparing the nodal maps for each face.

[001 j I a further form, the user identification data items includes age and gender, and the nodal map is used to estimate the age and gender of the user which is compared with the age and gender entered by the user.

[0020] In a further form, the user identification data items includes a user address, and calculating an identity confidence: score further includes obtaining an IP address of the user and estimating an approximate location based upon the IP addres and cosrtparing with an address entered b the user.

1 0 11 In a further form, generating a verification status includes generating a digital identity verification waterma k for: indicating to other users of the online system the verification status of the user, in one emhodiruent, a digital, identity verification wittermark is ^issued" to a use who has been verified,

[0022] According to a second aspect of the present invention, there is provided a registration system for an online system wherein the registration system, rises: the method of the first aspect to verify the identity of a new user,

[00231 According to third aspect of the present in vention, there is provided computer readable medium including instructions for causing .a computer to perform the method of the first aspect,

| 002 [ According to a fourth aspect of the present in vention, there is provided a user device configured for use in .system, fo verifying the identity of a user i a online system, the device including a camera, a cormmuiications interface, a memory and a processor, wherein the processor is configured to:

provide a user interface to capture a current photographic image of a user;

receive^ via the user interface, a photographic image or digital copy of at least one independently issued identity verification document, wherein each of the at least one

independently issued identity verification document includes user identification data items and at least one of the at least one independently issued identity verification document inc ludes a photographic image of the user; send the current photographic image of the user and the photographic image or digital copy of at least one i ndependen tl y issued identity verification documen t to an identity verification server via the communications interface, wherein the identity verification server is configured to: extract a plurality of user identification data items and a photographic image of the user from the at least one independently issued identity verification document;

calculate an identity confidence score, wherein the calculation is based upon matching the extracted plurality of user identification data i tems and photographic image of the user wit the captured photographic image of the user; and

generate a verification status for the user based upon the .calculated, identity confidence score wherein the user is assigned a verifi ed status if the calculated identity confidence score is within a first predetermined verification range.

[0025] According to a fifth aspect of the: present: invention, there is provided an identity verification, server for an online system including a communications interface, memory and a processor, wherein the processor is configured to:

receive via the communications interface, a current photographic image of a user and a photographic image, or digital cop of at least one independently issued identit verification document from a user device;

extract a pl urali ty of user identification data items and a photographic image of the user from the at least one independently issued identity verification document;

calculate an identity confidence, score, wherein the calculation is. based upon matching the ex trac ted plurality of user identification dam i tems and photographic image of the user wi th the captured photographic image of the user; and

generate a verification status for the user based upon the calculated identity confidence score wherein me user is assigned a veri fied status if the calculated identity confidence score is within first predetermined verification, range.

1 0261 According yet another aspect of the present invention, there is provided a method of registering a user for an online membership system the method including:

activating an imaging device during a user registration process to obtain a first image file including information encoding an image of the user;

recei ing a second image file including -information encoding an image of at least one independently issued identity verification document;

processin the first and second! i mage files to determine a verification status for the user according to a correlation between at least one attribute of the first image and at least one associated attribute of the second image; and

assigning the verification status to a registration account for the- user. BRIEF BESCRIPTiQS OF DRAWINGS

[0027] Embodiments of the present invention will be discussed with reference to the

accompanying drawings wherein:

[0028] Figure 1 is a flow chart of a method to verify the identity of a user in an online system according to an embodiment;

[0029] Figure 2 is Ά flow chart illustrating the steps in retrieving and completing an enrolment (or registration) shown in Figure 2;

1 030] igure 3 is a flow chart illustrating the exchange of data, between different databases during the enrolment process;

[00311 Figure 4 il lustrates an output range of the identity confidence score according to aa embodmicDt with three predefined ranges;

[ 00321 Figure 5 is: a flow chart of a method of preparing a captured image for facial matching according to an embodiment;

[0033] Figure 6 is a flow chart of a method for extracting user identification data items from an independently issued identity verification document;

[0034] Figure 7 is a flow chart of a method for capturing an image of an independently issued verification document according to an embodiment;

[0035] Figure 8 is a. flow chart of a method for capturing a current image of^" a user according to an embodiment;

[0036] Figure 9 is a flow chart of a method for capturing an image of a credit card using the OCR system according to an embodiment;

[0037] Figure 10 is a flow chart of a method for processing a payment from the use to a bank merchant API according to an embodiment;

[0038] Figure 11 illustrates a series of images showing the steps to identify nodal points of a face for facial matching; [003 ] Figure 12 is an illustration, showin three identified images and results fern the facial matching database;

[0040] Figure 13 is a plot comparin PCA and ICA used far feature extraction for race classification. according to an embodiment;

[0041 ] Figure 14 i flow chart of a cascaded SVM classifier structure according to an etnhodiment;

[0042] Figure 15 illustrates extraction of features from a face and .subsequent representation in a column matrix according to an embodiment;

[0043] Figure 16 A is a flowchart of the OCR process applied to a captured image according to an embodiment;

[0044] Figure 16B is an original captured image of a driver's licence according to an embodiment;

[0045] Figure 16G illustrates the aligned (rotated) image of Figure I6B;

[00461 Figure 16D illustrates segmentation erf the aligned image of Figure 1 C;

[0047] Figure 16E illustrates background removal of the segmented image of Figure 16D; and

[0048] Figure 17 is a schematic diagram of a computing apparatus,

[0049] In the following .description, like reference characters designate like or corresponding parts throughout the figures.

DESCRIPTION OF EMBODIMENTS

1005 )] Embodiments of a method and system to verify the identity of a user in an online system will now be described. Embodiments of the method and system perform verification by capturing a current photographic image of a user via a. user interface provided on a user device, and matching this against a photograph and identity information included in one or more

independently issued identity verification documents such as a driver's licence, a passport, a bank statement or bill, or some other document such as an employer issued photo ID, In a preferred arrangement, the "'current" photographic image of the user is cap tured during a registration process. [0051 J The independently issued identity verification documents ma be provided in the form of a digital copy of a document or as images, such as an image file encoding an image of the document captured by the user device or from a scanner. An independently issued identity verification document may include a photograph of the user and identit information, or more specifically user identification data items, which are extracted from the image file or digital copy, such as by using various optical character recognition (OCR), pattern and textual matching, fecial detection and facial feature analysis systems. The user identification data items may include identity data such as name, address, gender, date of birth, etc, and this extracted information may be used to register the user into the online system. As will be explained in mor detail below, an independently issued identity verification document may include a driver's licence, passport student identification card or the like.

[0052] The captured current, photograph of the user can then be matched or compared with the extracted photograph and user data items to verif that the user is a "real" perso . This matching may be performed solely by comparing the captured photograph with the information extracted from the independently issued identity verification documents). Facial analysis of the use supplied image (that is, the captured cia ent photographic image of the user) can be performed to estimate user identity information such as current age and gender, and these can be compared to identity data items extracted from the identit verification dc itment(s}_s and the extracted photo therefrom, to enable calc lation of an. identit confidence score. Additionally,_, the user may be asked, via the user interface, to supply user i dentification data, items, in which case the .matching process can compare the user entered identity information, with the extracted identity information, and this can.be included in the calculation of the identity confidence score.

[0053] A verification status for the user can then be generated based upon the calculated identity confidence score which can then be used to display or provide verification status informatio to other users in the online system. For example the user may be assigned a verifi ed status if the calculated, identity confidenc score exceeds a threshold value (or. is within a predetermined verification range, which may be an open ended range). This verification status could be pro vided as, for example, a digital watermark or icon that is associated with the user's aliases and profiles, and displayed to other users so that the user can continue to use their aliases or profile whilst retaining privacy of their true identi ty to others.

[0054] To further illustrate the invention a detailed embodiment will now be described, However it is to be understood that this embodiment may be varied and modified to suit system

requirements,. [0055 ] Figure 1 is a. flow chart of a method to verify the identit of a user in an online system according to an embodiment In this embodiment the method is implemented as a cloud-hosted product i which a user of a user device 101 communicates with a server over the internet using a browser based user interface 102. The user interface: 102 ma be an "app" on a smart phone or tablet, or a clien portal or website accessed on a PC or laptop. The user interface 102 allows the user to enrol 103 or register with the online system and set up an alias (or user profile) on a user account. As part of the enrolment process the user interface is configured to use the user device 101 to capture a current photographic image of the user. In this embodiment the user interface 102 is also configured to collect inputs 104 from the user via the user device 101. These inputs include a range of user entered user-identification data items such as name, age, date of birth, gender, address, etc by filling out fields or forms, along -with any additional information such as an alias (ie user of profile name). The user interface 1 2 is also configured to capture a current photographic image of the user, or otherwise allow the user to capture or upload a current photo of the user, such as one taken by a camera in communication with the user device 101.

Additionally the user interface 102 is configured to direct the user to provide one or more independently issued identity verification documents including at least one document with a photograph of the user, ie some form of photo ID. These independent documents may be provided by the_: user capturing (ie taking a photo) of the: document using the user device 1 1, or otherwi se providing a digital copy, or scanned copy of the document, or providing a link or access details for such a document. These independently issued documents ma include government issued photo identification documents such as a driver's licence: or a passport, or other Government issued documents such as a Medicare card. Alternatively the documents ma be issued by reputable third parties such as financial institutions or utility/service providers which bill the user.

[0056] The one or more independently issued ideality verification documents are processed to extract identi ty information which is evaluated and matched 105 against the user captured photograph and information to produce- n identity confidence score. Thi identity confidence score is evaluated or compared against some predefined criteria 106, such as havin a value with first predefined verification range: (eg exceeding some threshold value), if the identity is verified then the user is assigned a verified status and the user is enrolled and registered into the System. The completed user alias (including the verification status) is then sent 107 or stored in a client database of the system. H owever if the identity confide nce score is insuffic ient to verif the identit of the user, then further inputs may be collected from the user 104, and an- update identity confidence score calculated. [0057] As noted above the above system is implemented as a cloud hosted product allowing easy integration into any website, mobile application., cloud portal or desktop software. The collected data is uploaded and stored on the server-side, and server-side scripts perform the task of matching the user inputted intonatio against: the captured documents. High levels of security encryption can be used throughout client-side and server side scripts to ensure information collected into database i s not access ible or readabl e in times of a system breach. For a

combination of security and performance reasons the enrolment data is processed and stored in a separate database to registered users,

[0058] To ensure an carolment i not lost in the event of loss of connection to the hosted server a fail- safe method may be used to save ail incomplete enrolment data. Figure 2 is a flow chart illustrating the steps in retrieving and completing an enrolment (or registration) and Figure 3 is flow chart illustrating the exchange of data between different databases during the enrolment process, Referring to Figure 2, the user at a client. ortal (or site or app) 201 opens tiie enrolment screen 202, The system then checks whether this is a new enrolment or completion of an earlier incomplete enrolment 203. This may be determined through an input o the enrolment screen (eg ne enrolment and continue enrolment button), or fay testi ng for the existence of a cookie on the use de vice indicating an incomplete enrolment, if the user indicates or the system, detects that the user has an incomplete enrolment, then the system locates the record 204 and opens the located record 205. The user then inputs missing data 206. Alternati vely if the user is a new user then the system selects a new record 207 and allows the user to input data 209. In each ease, the system continues as outlined in Figure 1 by evaluating and matching input data 209, If the evaluation is successful and the identity⁷ of the user is verified 210 then the completed alias is sent to th client database 211 . If ho wever the evaluatio is unsuccessful and the identity cannot be verified then the user is prompted to input additional data 210. Once an enrolment process is complete the system then awaits a further enrolment or completion of a incomplete enrolment,

[0059] Turning now to Figure 3, all input data 301 is provided to an alias enrolment database 302 which stores the collected data so that an alia can be retrieved and completed at a later date i the connection is lost or the user suspends the enrolment process. To increase- the speed and. response of the facial matching system i t is desirable to reduce high quantities of data being held on the alias database 302. Thus to reduce lag and response times the alias database 302 backs up the individual records and deletes them every fifteen minutes. Once a user has input sufficien data to allow the alias to be completed and scored an alias has completed enrolment 304, the individual's data will be sent to the nominated operator database 305 and deleted from the alias database 303. [0060} Wi h reference now to Figure .1 , the data extraction and matching step 105 is performed using OCR, pattern and. text matching, facial detection, and facial matching algorithms or systems. The OCR system is configured to use captured or scanned images of .documents to recognise and translate characters fro an image to plain text. Extracted text can be parsed to extract, user identification data items which can be compared or matched with corresponding user entered data items to generate a textual match score. In the present ease, a facial detection module is used to identify^' the locations and sizes of human faces in the captured images and to ignore anything else i the image such as buildings, trees and bodies. The facial detection algorithm used may be based upon the Viola- Jones object detection frame work as will be outlined below. The facial detection module can also be used to align the face with reference marks to ensure any^■discrepancies between images are not artefacts of the image capture process. Once a face is: detected, the facial matching module analyses the face t identif and extract facial features. In one embodiment features are extracted using a Principal Components Analysis (PC A) based method to extract eigenvectors. To match the user supplied image with the image extracted from the image extracted from the identity docume t, each image is projected int face space using the eigenvectors from PC A and this information is used to determine if the faces match (or not). Race and gende estimatio is performed using a supervised feature selection method by combining of principal component analysis (PGA) and Component Analysis (1CA) algorithms. Using these features a support vector machine (SVM) classifie was built and was trained and tested on a set of 9000 images. A. SVM race classifier was: developed to classify an image into one of four races (Asian, Caucasian, African and Middle Eastern). Similarly, a. SVM gender classifier was developed to classify an image as either male or female. Age estimation, was performed similarly in which an image was mapped to one of 11 classes, each with a five year range spanning 15 to 6 years. These methods are discussed in further detail below.

[0061] In an alternative embodiment, the facial detection^' algorithm may identify approximately 80 nodal points on a human face. Some or all of these nodal points are used to measure variables of a person's face, such as the length or width of the nose, the depth of the eye sockets and the shape of the cheekbones. The facial feature or nodal map can also be used to estimate age, gender and race of the user. The nodal map generated from the user supplied, image can then fee compared and matched with the nodal ma generated from the official or government issued image to generate a facial match score,

[0062] in this embodiment, the identity confidence score is based upon a textual mapping score obtained from matching the extracted user identification data items with the corresponding user entered data items, and a facial matching score obtained b matching the extracted photographic image of the user and the capt ured c uitent photographic linage of the user. However in an alternative embodiment, the identity confidence score is -calculated based upon comparing the captured curren t photographic image of the user with the extracted photographic image, and by comparin features estimated b analysing the captured current photographic image of the user with the user data items extracted from the documents. For example identity dat items or user parameters such as age arid gender and eye colour are extracted from, the identity documents, and the captured image is analysed to estimate these data items ( or parameters or facial features). The identity confidence score may be based on the correlation between the extracted and estimated parameters. In some embodiments the identity confidence score may be also be based upo additional data, such as whether the IP address of the user device correlates wi th the physical address, and the payment confirmation from credit card held in the user's name. Weighting factors may be applied to different components used to calculate the overall identity confidence score. For example the facial matching- score may carry greate weight than the textual mapping score, o specific data items such as age or gender may be weighted more highly than othe data items such as address. Additionally extracted data items or facial images can be cross referenced with a database of current enrolees or registered users and with chosen databases such as those- listing convicts, sexual offenders or other databases that the operator see fit.

[0063 Due to the difficulties faced with evaluating a perfect textual and facial match through the use of user device 101 for capturin the images, several predefined verification ranges arc defined. ^"In some embodiments the verification ranges may be open ended ranges (eg all values, above or- below a threshold value). Figure 4 illustrates an ou tput range 40.1 of the identity confidence score according to an embodiment with three predefined ranges. In this embodiment the output range is from 0 to 1000 although in. other embodiments a different o open ended range may be used, m this embodiment tire first predefined verification range 402 is 750 and above (ie 750 to 1000) indicating successful evaluation (ie verified status) in which ease the enrolment is performed automatically in real-time and the completed alias is sent to the client database 1.07 (ref Figure 1 ). It will of course be appreciated that other suitable- output ranges may be used.

1 064] The second predefined range is from 500 to 750 indicating a marginal verification status. In this ease the user may be asked to supply additional documentation to prove their identit and/or the submitted data (and images) ma be provided to the hosted admin for manual review and processin by human administrator (ie a semi-superv ised approach). The third predefined range is below 500 (ie 0 to 500) indicating an unveri fied (or not verified) status. Such users will also be asked to supply additional documentation to prove their identity. If, after some fixed number of attempts or requests for addi tional documentation^' (eg three requests) the identity confidence score is still within the unverified, (third predefined) range then the enrolment process is stopped and the. user will be added as a black listed alias and rejected, in some embodiments the black listed user: is given the opportunity to contact the hosted administrator to resolve the matter,

[0065] Figure 5 is a flow chart of a method of preparing a captured i mage for facial matching such as by identifying; .cropping and adjusting the captured image. A current or real-time image 502 of the user is captured 501 by the user device 101. Similarly an image 506 is captured 505 ^■from, one of the independently issued identity verification documents, such as a Driver's Licence or Passport. As wi l be explained in more detail below, capturing of images may be assisted by the user interface 102 displaying guides or a container in flic displa of the user device i Oi , to assist the user in capturing a good quality image of appropriate size and orientation. For each of the images 502, 506,. the facial detection module locates the face, and the face is then cropped and the angle adjusted 503_* 507 in each of the images 502, 506 to obtain standardised facial images 504 and 508.. The cropped images are: then sent for facial matching and analysis 508: and stored in a iacial match database 509. Captured, images from the user, such current image of the user 502, can he run through a compression engine to redoce the size of file needed to be stored o the database. The original compressed image 502 of the user is stored for the indi vidual for use an: avatar in the online system.

[0066 [ The independently issued identity verification documents may be received by or provided to the system in a variety of ways. In this embodiment the user is guided to capture an image (eg photograph) of the document using a camera (such as a web-cam) o the user de vice 101, or possibl on a device in communicatio with the user device 101. However in other embodiments the user may scan the document using a deskto or flatbed scanner, and then upload the scanned document, or they may provide a hyperlink to a digital version of the document, or authorise the online system to obtain a digital copy of the document from the document issuer .

[0067] Figure 6 is a flow chart of a method for extracting user identification data from an independently issued identity verification document according to an embodiment. In the present case, the illustrated method uses an OCR system to extract identification data items from the verification documents. Preferably, the data to be extracted corresponds to or is otherwise associated with the data items supplied by the user. The System starts 60! when an image or scan of an independently issued identity verification document is captured or uploaded. The OCR system will scan or read the captured test 602 to obtain extracted dat items using pattern matching or a pixel ma based approach. Captured text may be parsed and the system may estimate or identify one or more data items based upon this parsing process. For example the parsing may recognise a string such as "Date of Birth" or "DOB⁷' indicating this data item is present, and m adjacent string having a date format, (eg dd mm yyyy} will be recognised as the extracted value of this data item, in some embodiments the system is configured with a list of search patterns corresponding to data items (eg "Name", "Address", "Date of issue", "^'Expiry Date", etc). The system matches these patterns and then searches for the value of the data, stem in an adjacent strings . In some embodiments the system is preconfigured with a range of document types and stores a pixel map for each document type that indicates where in the image- specific data items are located. When uploading a document the user can specify the document type to assist the system in extracting data from the scanned document based upon the known, pixel map.

[0068 The OCR system processes the captured document and scans for data items which corresponds with or are otherwise associated with some or all of the entered or input data items. If identity data items can be successfully extracted and matched with user entered or input identity data items 604, then the system, perform check to determine if all required data items have been extracted and matched against a correspondin user entered value 605. hi the event that all required data couldn't be matched against the user entered, or input data items, the user ma be asked to capture or provide a. second or fintber verification document 606. If the OCR system wa unable to determine data items then the OCR system may reattempt to re-read the document 601 using different scanning or OC parameters Once all the required identificatio data items have been evaluated and matched 607 the system calculate a textual matching score to allow generation of the identity confidence score.

[0069 ] To ensure that each image captured , whethe it be the current image of the use or the capture of a independentl is sued v erification doc ument , is withi focus and al igned correctly, the system use interface will present instructions and containers or guides tor the user to follow (^'the term guide and container will be used interchangeably). This ensures that the focal point of the image is a constant from: one indi vidual to another and to facilitate the use of a pixel ma for a known independentl issued verification document. Additionally the use of guides or containers ensures a degree of consistency for captured images that facil itates classifier based facial matching and analysis systems as described below.

[0070 j Figure 7 is a flow chart of a method for capturing an image of an independently issued verificatio document illustrating the steps in capturing, breaking and distributing content for OCR system and facial detection. The user interface 102 provides a con tainer that is displayed in the vie w finder of an image, and the user is requested to adjust the focus or location of the camera until the corners of the independentl issued verification document fit : the corners of the guide o container 701 , When, the corners Use up 702, an image 704 of the verification document fit is captured 703, If the document type is known (eg Passport, or Driver's Licence) the the pixel map associated with the document type can be- used to identify regions where specific data items and a image of the user may be found. If text regions can be identified in the image 705 then the ex tracted text is passed to the OCR system for recognition 708 and matching. Similarl if an image of the user is detected 706, then the image is sent to the facial detection and facial matching module 707. if text or image data (assuming it is present in the identity document) is unable to be extracted then the user ma be prompted to recapture the image 701.

(0071] Similarly Figure 8 ts a flow chart of a method for capturing a current image of a user. According to the illustrated embodiment, onee the user is ready to capture an image of their face 801 , the use interface 102 provides a set of user guides 806 (eg vertical 8 8 and horizontal 81 lines) and/or a container 814 is displayed in the view finder of an image 802, and the user is requested to adjust the focus or location of the camera until the image of the user matches the guides 806 or is located within the container 814. For example a vertical line 80S in the form of a centreline 812 may be displayed along with horizontal lines 810 for the eye nose and month locations. Once the user has lined up the image of their face to the guides, a current image is captured. If a face (or face like: object) is found in the. image it is. provided to the facia! detection and facial matching module 804, otherwise the user is prompted to recapture an image 801.

[0072] One of the independently issued identity verification documents may be a credit card. In this case the system may be configured to take an image of the credit card and extract the relevant credit card details such as the card number, expiry date, name and a security code. If the online system is a subscription based system or othen-vi.se requir a payment from the user then the extracted credit card details can be used to process a payment. Figure 9 is a flow chart of a method for capturing a image of a credit card using the OCR. system according to an

embodiment. This helps to prevent credit card fraud for individuals and ensure the name on the card can be matched to the individual enrolling in the system.

[0073] The illustrated method begins 901 with the user interface 1.02 prompting the user to pro vide a credit card, and the user interface 1 2 guides the user to capture a front and rear image of the credit c ard 902. The OC system then uses pattern matching or a pi el map. approach to read the captured credit card details 903. If all required characters are recognised 904 and the details match the user's name then the system will display the card details 905. The user may then confirm the credit card details and allow a payment to be made 907. If characters are not obtainable from, the card, then the user will be required to recapture the credit card again 908.

[0074] Figure 10. is a flow chart of a method for processing a payment from the user to a bank merchant API according to an embodiment. The credit card is OCR'ed 1001 as outlined in Figure 9 and the input credit card details are obtained and confirmed by the user 1002, The user then selects a payment 1003 and authorises payment 1004. The payment is then processed by the bank merchant 1005 and the system checks if a successful payment was performed 1006, If payment was successful then, the process ends 1009. If payment was unsuccessful then the user is asked to confirm the credit card details 1007 and if the details are correct then the payment is again processed by the bank merchant 1005, If the details are incorrect then an image of the credit card is reohtained and processed. The success (or failure) of the payment process and .matching of credit card details to the user entered details ma also be used determine the identity confidence score. The above system can. be used whenever a payment is made or a set of recurring payments are; established..

10075] Referring now to Figures 7 and 8, the facial detection algorithm 706, 803 may be based on the Viola-Jones object detection frame work. This is a computationally time efficient framework for detecting faces within an image. Once a face is identified., feature extraction, age., race and gender estimation can be performed, and the face extracted from the captured user image can be compared with the face extracted from rise identity document.

10076] Figure 1 1 illustrates a series of images showi ng th steps to identify nodal points of a. face for fecial matching. A image is captured 1101 using capture guides and is cropped to the facial region 11 2. The image is converted to grayscale and adjusted (eg balanced) to produce a. more defined image of facial features 11 3 , in the illustrated embodiment, the system identifies a. minimum of 22 nodal points 1104 of the 80 generally present on a face to generate nodal map 1 105. Howe er, it will of course be appreciated that a different number of nodal points: may be used. The nodal map 1105 is then analysed to measure facial features and estimate parameters such as the gender, age and race of the face.

[0077] Figure 12 is an illustration, showing two identified images 1201 1203 and respective results 1202 1204 from the facial matching database, hi the first image 1201 the estimated parameters 102 provide an age of 25 wit a max age of 28 and minimum age of 24. The image is identified as a Caucasian male and the eye colour and skin colour is also estimated,

[0078] As the image obtained from the independently issued identity verification document may be several years old or more (eg passports have a life of 10 years) the facial comparison or facial matching with, the current image may need to be adjusted to compensate for the age difference. This may be achieved by estimating the age of the face extracted from the independently issued identit verification document, and adding the number of years since the document was issue to tiiis age. This adjusted age can then be compared with the estimate age of the current image.

[0079] Further detail on the facial detection application programming interface- (API) will now be outlined. The facial detection API includes four components: face detection, image processing, facial feature extraction; and faciai matching of features. [0080] Face detection, is the first step in obtaining a facial match, between two images . The performance of the face recognition system is influenced, b the reliability of the face detection component. When given an image, the face detection will be able to identify and locate facial features -regardless of their position, scale, orientation, age or expression, To further enhance the reliability of the face detection component the user interface provides guides for a user to align their face within.

100811 The first process is to detect the edges of the facial, features. This includes reducing the image data significantly by using smoothing and noise reduction filters, while preserving the- structural properties to be used for further image processing so that noise in an. image is not mistaken for edges. Noise reduction can. be achieved by applying a Gaussian filter. In. this embodiment, the process of applying the Gaussian filter is realised from the kernel of a Gaussian filter with a standard deviation of σ = 1.4. in this embodiment a 5x5 Gaussian filter was applied to an. image although other size or types of filters can be used.

[0082] The algorithm to find edges defines where the grayscale intensity of the image changes the most. These areas are found by determi n ing gradients of the image. Grad ients at each pixel from the smoothed image arc realised by applying a Sobel operator (or Sobel Filter). First we approximate the gradient in the x (right) direction and y (down) direction by applying the kernels (ix and K₍ y . The magnitude of the edge strengths is then {} - ^'(7,■+ (j with a direction q - atan2(G_y,G_x) . 1» some embodiments the Manhattan distance G - G_% + G_T measure is used in place of the calculating the magnitude in order to reduce the computational complexity.

[00 3] The next part of the process is to convert the "blurred" edges in the image to "sharp" edges. In an embodiment, this part of the process is performed by preserving all local maxima in the gradient image, and deleting everything else. The algorithm applied to each pixel in the gradient image includes:

1) Round the gradient direction 0 to nearest 45°, corresponding to the use of an 8- conneeted neighbourhood,

2) Compare the edge strength of t he current pixel with the edge s trength of the pixel in. the positi ve and negative g dient direction. example, if the gradient direction is norm (theta - 90*), compare with the pixels to the north and south.

3) if the edge strength of the current pixel is largest, keep the value of the edge strength. If not, remove the value.

[0084] The edge pixels remaining after the non-maximum suppression step are marked with their strength in a pixel-squared .format Most of the edges found within the image will be true. however some may not be due to additional nois or colour variations due to rough surfaces: in the background or facial hair. To remove these we use a double threshold, so that only the edges with a stronger presence than 200 on threshold are marked strong and edges with a weake presence than 28 are removed. The edges that tall in-between the two thresholds will be marked as weak AU edges classed strong are immediately included it) the final edge image. Weak edges arc included if and only if they are connected to stron g edges. The final output will then be outputted with increased exposure.

[0085] This applies well to images thai are inputted as an avatar image where a face is the main focus of an image.. However in order for the system to find a face from the independently (eg government) issued identification we have applied a facial detection and tracking feature. The goal of this part of the facial detection component is to determi ne w hether or not there are any faces in the image arid, if present, return the image location. The processes associated with this can be attributed to many variations in scale, location, orientation, pose, facial ..expression, lighting conditions and occlusions.

100861 The procedure is based upon the Viola- Jones object detection framework. This framework calculates scalar features included of sums of image pixels within rectangular areas defined by Haar wavelet basis functions. Pour Haar wavelet features are used with each feature includin a set of white and black rectangles, with the first two Haa wavelet features including two identical size rectangles (one set arranged verticall and the other set arranged horizontally), and the next two including three rectangles (a vertical black rectangle bounded by white rectangles) and a four rectangular feature (checkerboard arrangement). Scalar features are computed in an image region by summing u the pixels from the white region in the Haar wavelet features and subtracting features in the dark region of the Haar wavelet features. By first calculating an: integral i mage these scalar features can. be rapidly calculated in. real time for the entire image as the calculations only require a few operations pe pixel. The second part includes constructing a classifier for selecting a small number of important: .features^' of the face using an Adaptive Boosting (or

AdaBoost) learning algorithm. The total number of Haar f atures in an image sub-window is much larger than the number of pixels. However the large majori ty of the possible features are unimportant and using AdaBoost, the focus is pointed on small set of important features so that at every stage, the boosting process further narrows the feature selection process. In the third part, classifiers are united in a cascade structure of increasing complexity so that each successi ve classifier is trained onl on those selected samples which pass through the preceding classifiers. This allows fast evaluation of strong classifiers. [0087] The iniegra! image at location (x, y) is the sum of all the pixels above and to the left of (x. y) inclusive: I(x,y) - Ύ. $ where l(x, y) is the summed image and i{x, y) is the original image. The integral image can be computed efficiently in a single pass over the image, using the fact that th value at (x, y) is I(x, y) - i(x, y) + I(x - 1, y ) + I (x, y - 1) - /(!¾. 1, y- - 1).

Once the integral image has been computed, the ta sk of e valuating any rectangle can he accomplished in constant time with just tour array references. That is for a rectangular with clockwise comers: A, B,€,. D with A in the top left, the integral image sum over the rectangle i(x, y) = 1(D) + 1(A) - 1(B) - 1(C) .

[0088] The second step is to apply the AdaBoost learning algorithm to determine the optimal threshold classification function that selects the single rectangle which best separates the positive and negative examples. We first define:

0 otherwise where Κ₍(χ) is a weak classifier, f_t (x) is a feature, Pi is the parity that indicates the direction of the inequality, q is a threshold and x is 1 0x100 window of the ori ginal image. The AdaBoost algorithm may be represented in the folio wing way. A training set of images (the input) is provided: T - {(X, ₅ v₍ ), , .. (¾ ,}'„) where y, is 1 for positive examples and 0 for negative examples. Initial weights are defined as w ^a) - l/2k for the positi ve examples (¾ - 1 ) and w * - l/2 J for the negative examples (y, =0), where k+p^~n. Then for r^~L. T the weights are normalized if'.'' - w j ^ w and a cl^sifief^ is framed for c di featarc , and is restricted to use only single feature. According to w_r the error is e_f - T ff_£ j t,(x_:- ) - y_t. j . The classifier wit the lowest errors is chosen and the weights are updated:

Where Q =·0 if example is classified correctly, otnerwise _: = 1 , an J¾_:. = -≤? ) . The output is the final (strong) classifier:

where a_y = log(l/ .£>.) ,

[0089] The next step of the facial detection component is to discriminate input signals into several classes, considering different lightning conditions, pose, expression and hair. However, the input signals are not 100% random and even more, there are patterns presented in each input signal. When observing the inputs common features like: eyes, month, and nose. ith the relative distances between these features we uncover the eigenfaces from the principal components. They can be extracted out of the original image data through Principal Component Analysts fPCA).

[0090] Given a face image /fx, v), it can be considered as a two-dimensional AW array of intensity values. Also, an image ca be normalised to a N vector. Therefore a 256 x 256 image is a vector of dimension 65,536 or alternatively a 6.5,536-dimensiona] space. Due to similarities of faces, a set of face images will map to a low-dimensional subspace of his huge possible space. Using PC A, the vectors which have the largest corresponding eigenvalue are found. Next we compose a subspace within the entire image space with the eigenvectors of the eovariaaee matri correspondin to the eigenfaces.

[0091 If the training set of face images is G, , G₂ ,, . , , G_:¾r then the average set is defined by y G,. ~y . This vector

set is the subject of PC A which: finds a set of M. orthogonal vectors v , which shows the distribution of face images within the image space. The k-t vector, v, is chosen so that:

is a maximum subject to:

[0092] The vectors v_k and the sealars l_k are the eigenvectors and eigenvalues, respective of the co'variance matrix:

where A = [7,J₇ ... j ] . Therefore the covariance matrix C is N^'xN" and there should he N² eigenvectors an eigenvalues. Consider,

x, of A¹ A. such at Α' Αχ_}

Multiply both side by A results in A A¹ Ax] = /?¾..4x,.. Mow x, are the eigenvectors of covariance matrix = . Following this analysis we construct MxM matrix /, = A? A where

L_mi - j^' j^",, and X_t - M are eigenvectors of matrix L, These vectors determine linear combina tions of the M training set. face images to form the eigenfaces 13, where

M

n,

----1

At this point the calculations are reduced from the number of pixel in images _V to the number of images in the training set M (and in practice M « N).

[0093 J To obtain a facial match, between the current image of the user and the image obtained from the independentl issued identity verification document, a small number of eigenfaces will be sufficient fo matching the images. This can be performed by classifying using only vectors with the largest associated eigenvalues. The process of classification, of an independently issued image G₍,_ei„ to the current image proceeds in two steps. Let be M' the significant eigenvectors of the T matrix. The first step is to transform the independently issued image into its eigenface components and then to project the independentl issued image into "face space". The resulting weights form the weight vector ^W' ^^J"^{sw '}) v^theⁱe ϊ ^ ^ _m^ "^M J describes the contribution of each eigenface in representing the input image. Now we determine which face class provides the best description of an input face image fey finding the class that minimises the Euclidian, distance. The Euclidian distance between a new face image, Q and a

Mi face class is J ,. = ^|w - W ^₍.,| [0094] A face is classified as belonging to face class k if the minimum^ is below a chosen threshold {⁾. Only four possible outeoraes are equated for the independently issued imag and it's eigenvector. These are:

1) Near face space and near a face class - the subject is recognized and identified;

2) Near face space bur not near a known face class - the subject is an on known individual;

3) Distant from face space and neat' a face class - it is »ot a face image;, or

4) Distant from face space and distant from face class - it is not a face image.

[0095] Using its features vector and the eigeniaees obtained from. the user set of images an image is approximately rebuilt:

G«= Y + F when

is the projected image.

1009 1 From equation 8 we can consider that the input face is rebuilt by adding each eigenfe.ee with a contribution ¾-y in equation .9, t the average in the current user set of images. The rebuild error can be realised by means of the Euclidian distance between the original and the

reconstructed face image as given in equation 10:

rebuild error ratio—

[0097] The rebuild error ratio increases when the current user set of images and the

independently issued set of images differ heavily from each other, This is due to the accumulation of the- average face image. When the members differ from, each other the average face image becomes messier and this increases the rebuild error ratio. Generally, the eigenface method of recognition requires multiple images to make the set of images, in this- embodiment the system just uses two - the first bein the current user image and second is the independently issued image (ie government ID image). In order to increase efficiency and reduce the computational task, the user interface provides on screen guides to guide the user when taking the image so that images presented are as similar as possible. Whilst there are four possibilities discussed above- for an input imag and its pattern vec tor, in this ease we only require processing of two as die third and fourth possibility only occur when a face image is not identified, i.e. an image of a hand, rather than a face. That is the two options a (1), near face space and near' a. face class, or (2) near race space but not near a known; face class, in the first case (!}, an individual is recognised and identified. In the second ease (2), a unknown individual is presente and the image sets do not match. In this case the face set is saved to database and further photo identification is requested from the user,

10098) We now turn to race and gender estimation. Feature generation and selection are a important consideration in race classification as poorly selected features, such as features of gender, identity , glasses and so on, may dimmish the efficienc of the performance, in an embodiment of the system, a supervised feature selection, method is applied by combining of principal component analysis (PCA) and Component Analysis (TCA) algorithms.

[0099] In an embodiment PCA is used for feature generation. Using two sets of training samples: A and B. The number of training samples in each set is N. Φ, represents each eigenvector produced by PCA. Each of the training samples, including positive samples and negative samples, can be projected into an axis; extended, by the corresponding eigenvector. By analysing t e distribution of the proj ected IN points, we can ro ughly select the eigenvectors, which have more race irrformation. The following is a detailed descriptio of the process:

1.) For a certain, eigenvector Φ,, compute its mapping result according to the two sets of training samples. Th.e result ean. be described as ;_;ι„ (I <i <-M, 1 j 2N).

2) Train a classifier f_t using a simple method such as Perception, or Neural Network which can separate into four groups: Asian, Caucasian, African and Middle Eastern with a minimum error / ^'( . ).

3) If E^'(f_t) <g, then, we delete this eigenvector from the original set of eigenvectors.

[00100] M is die number of eigenvectors, 2-N is the total -number of training samples and 0 is the predefined threshold. The left few eigenvectors, are selected. The eigenvectors can also represent back to face: images, and are referred to as eigenfae.es.

[0 101] In some events, too few useful or nil eigenvectors are found in the single PCA process. In this event we- propose the following approach to identify and solve: this problem. Assume that the number of training samples 2M is large enough. The system will randoml select training samples from the two sets. The number of selected training samples in each set is les than M'^:2, Then, we perform the supervised PCA analysis with them. By repeatin the previous process, we can. collect, a number of good features. The main idea of this approach is that it may emphasize some good features b reassembling data and then make the features stand out easily.

[00102] Tire system preferably maps the eigenvectors produced by PCA to another space using ICA, where the mapped values are statistically independent. In this way, the system can provide the classifiers a clearer description of features for race information. The basic i dea of the ICA is to take a set of observations: and to find a group of independent components that explain the data. PCA. considers the 2nd order monrenis. only and it un-eorrelates data while ICA accounts for higher order statistics and thus provides a more powerful dat expression than PCA.

I GO 1031 Figure .13 presents a comparison between PCA and ICA, used for feature extraction for race classification task. In tile chart, X-axis stands for the hides of a certain feature and Y-axis stands for the corresponding discriramabiiity evaluation performance- For example, a performed ICA for 33 good features produced by PCA, There are 33 ICA features produced. They arc statistically independent. The distinguishing capabilities of both PC A features and ICA features are computed and displayed i the chart below. From the chart, the system can see that even though several ICA features wit lower distinguishing rate are generated, ICA features with higher distinguishing ra e are also produced.. Then, several ICA features with higher

distinguishing abilit are selected as the features used in classification. By this way, we can further reduce the number of good features, and collect better features.

10 104] Preferably, embodiments of the system separate facial images into four classes:

Asian, Caucasian,. African, and Middle Eas tern, according to a group of features. Using the classifier: method of Support Vector Machines (SVM) the system may be extended to regression estimati on solutions of the type :

given by its measurements _f with noise at some (usually random) vector x_f

[001 5 j In SVM, the basic idea is to map the data X into a high-dimensional feature space

/via a nonlinear mapping Φ, and to do linear regression in this space.

= F (xJ) +b with F : fi F, w , F, where b is a threshold. Therefore., linear regression in a high dimensional spac corresponds to nonlinear regression in. the low dimensional input space ¾*. ' ote that the dot product in between.ø and F (3c) would have to be computed in this high dimensional space, if we are not able to use the kernel that eventually leaves us with dot products that can be iMplicitly expressed in the low dimensional input space R" Since Φ is fixed, we determine to from the data by minimizing the sura of the empirical risk R_e!!lp[f] and eornplsxit terra W , which enforces flatness in feature space

where / denotes the sample size (.¾ , ..., A^*, ), C(.) is a loss taction and ^ is a reguiarization constant. For a large set of loss function, can be minimized, by solving a quadratic programming: problem, which is uniquel solvable,: !t can. he shown that the vector & can he written in terms of the data points

with a_p a_f being the solution of the quadratic programming problem. a_f> a, have an intuitive interpretation as forces pushing and pullin the estimate /(A^*- ) towards the -measurements y\.

_ =∑' (a, - a^* )f F (xt) F ix ⁾) - b ∑ .(a, ^~ a* )K(x^. _h x) + k

where _¾,. -≤f. are Lagrangian multipliers, and ¾ are support vectors.

1001 6] in the above formula ( 1 ), we introduce a kernel function

K {Xt , Xf ) = P ( X; ) F (Xj. }.. As explained below, any symmetric kernel function K satisfying the condition correspoods to a dot product in some feature space. There are many kernels that satisfy the condition as described in below. In one embodiment of the system, we take a simple polynomial kernel in the above formula: where d is user defined.

100107] After the off-line training process, we obtain the values for Lagrangian multipliers and support vectors of SVIvL Let x, =.[x_t?X_2f · ··, X^] ^' (x_t is an element of x and x, a sample data of x . By expanding (4) according to (5), we know that (x) is nonho ogeneous form of degree din x „ R"

/(¾ = ∑

fOOl S] After the PC A and ICA feature .extraction and the selection process, we have obtained some sets of good feature groups. With different combination among these feature groups, we can obtain a large number- K of SV classifiers by learning from, a set of training data. Each of the SVM classifiers has different classification rate and different features inside. Since the number of inputs fo a SVM mode! cannot be very large, any i dividual SVM classifier has limi ted discriminating information,. For this issue, we created an algorithm to addiess this as follows:^'

wrisr oat the classification results ¾- for all of the available SVM classifiers in all the labeled sample facial images into the set {], -1} ; and

comb ne all three classifiers to form a ne classifier fay adding the three SVM classifiers results and decide the final result for the classifier^' according to the sign of the result for each labeled sample image:

where R_i . R₂ and .·¾, are result for a sample face image of three different classifiers and R_H is classified result for that face image for a ne classifier, which is produced by fusion of the three different classifiers. [00109] This method will touch all the combinations of three classifiers in the current l evel, which can be qu ite time consumi ng. Thus to reduce the time consumption of the seleetioii process_., we define a suitable threshold p b compromise between avoiding the time computation and producing good fusion classifiers with better performance. Figure 1 is a flow chart of a cascaded SVM classifier structure according to an -embodiment.

[00] 10J To implement one embodiment of the- system, we collected images of human feces from different racial backgrounds. The faces, were detected automatically by the frontal face detection system, Then, we manually labeled the results using a large database, which contains more than 9000 face images of different race, gender and age. In the present case, each of the face images were resized, into a standard image with 24 s 24 pixels. We divide all the images in the database into two folds, each fold co taining Asian, Caucasian, African and Middle Eastern face images. The first fold of face database was used for feature extraction, SVM .classifier training, evaluation, and testing of race classifiers in the first step, while the second fold of face database was used for training and testing of the fusion classifiers in the second step.

[001 II] Gender selection was performed in a similar maimer. I this case tile process is as described abcWe with the change that the classifier f is trained to separate ¾- into two groups:

Male and Female with a minimum error /i(^). As before the 9000 face image database was ^'used which includes 4500 male- and 4500 Female images of different races and ages. Again all of the face images are resized into the standard image with 24 x 24 pixels. We divide all the images in the database into two folds, each fold, containing Male and Female face images. The first fold of face database was used for feature: extraction, SVM classifier training, evaluation and testin of gender classifiers in the first step, while the second fold e face database was used for training and testing of the fu sion cl assifiers in the second s tep..

[0011.2] Age group estimatio was similar, but. sought to reduce the computational task by using classifiers already in place from other components, along with the ability of our new method being able to complete a higher succes rate in less time. Working in similar^' ways to the gender and race estimatio components, the age group estimation projects a new face image into a face space, comparing its position in the face space with a. database of pre-set face images. The age group method preserves^' the identity of the subject, while enforcing a realistic recognition effects on subjects. This was achieved by defining 1 1 age grou classes each with a constants year age range to allows age estimation between 15 and 60 years. However in other embodiments, the age range could be extended, or the number of -classes or range of an age class varied

(increased o decreased), or non itniform age ranges conld be used for the classes, for example older age classe (55-65) may be wider than mid-range classes (30-35). [00113] Features extraction deals with extractin features that are basic for differentiating one class of object from another. First, the fast and accurate facial features extraction, algorithm, is developed. The training positions of the specific face region are applied. Figure 15 illustrates extracti n of features from a. face in a database and subsequent representation in a column matrix A according to an embodiment.:

[ 00.1 14] The face space is computed from the Euc lidean distance of feature points of two faces. The fundamental matrix A is constructed by the difference face space among the input and each face. Then, rive average face features of the thirteen, age groups matrix W can be formed.

[001 15] We calculate the Covartanoe Matrix Gov- W ' and then build Matrix L= WW ¹ to reduce dimension. We then find the eigen vector of Co v as the ^"Eigenvector represents the variation in faces. Finally, age is determined through the minimize face space.

[001 16] The PCA. can then do prediction, redundancy removal, feature extraction and data compression. Now let us consider the PCA procedure as in previous, components the training set of M face images. Let a face image be represented as a two dimensional N x N array of intensity values, or a vector of dimension .N*^'. The PCA then finds a . -dimensionai subspace whose basis vectors correspond to the maximum variance direction in the original image space. This new subspace is normally of a lower dimension (M« N^A). Mew basis vectors define subspace of face images called face space. All images of known faces are projected onto the face space to find sets of weights that describe the contribution of each vector. By comparing a set of weights for the avatar face to sets of weights of the database face images, the face can be identified. PCA basis vectors are defined as eigenvectors of the scatter matrix. S defined as ,'

S - T C» - ).(xl - m) '

where m is the mean of all images in the training set an xi is the i^!k face image represented as a sector i. The eigenvector associated with the largest eigenvalue is one that reflects the greatest variance in the image, That is, the smallest eigenvalue is associa ted with the eigenvector that finds the least variance, A facial image can be projected onto M '_ « M) dimensions by eompirting

[001 1 7] The eigenfaces can be viewed as images. The face space forms a cluster in an image space and PCA gives suitable representation. Using a Diagonal PCA that is developed from PCA approach. Diagonal PCA can. be subdivided into two components - PCA subspaee {raining and PCA projection. During the PCA subspace training, rows of the pixels of an NlxN2 image are concatenated into a one-dimensiorial ' image vector' and only a subset of the eigenfaces (k - 1,..., M^r) is retained t farm a transformation matrix, then used in the PCA projection stage. Only the principal eigenfaces accounting for the- most^' significant variation are used in the construction. A new face image vector is multiplied by the transformation m trix and projected to a point in a high dimensional Diagonal PCA sitbspace. The projected image is then saved as the face template of the corresponding user for future matchi ng.

[001 181 Using a non-parametric technique, the Nearest Neighbor classification (NNC) asymptotic or infinite sample size error is less than twice of the Bayes error. NNC gi ves a tradeoff between- the distributions of the training data wi th a priori probabilit of the classes in vo lved. th nearest nei hbor classifier (j NN) handles an image distortions (e.g. rotation, lighting, noise). So this system produces results: by combining DiaPCA and KNN. The Euclidian distance then .determines- whether the input face is near a known face. The automatic face recognition is a composite task that involves detection and location of faces in a cluttered background, normalization, recognition and verification,

[00119] The face database contains the .1 1 individual groups. Within the database, all weight vectors of the persons within the same age group are averaged together. A range of an age estimation result is 15 to 60 years old, and divided into 1 1 classes with 5 years old range. The age prediction of the input individual is performed firstly. Then the matched individual is examined from the corresponding age group in face database based on the diagonal PC Method. Finally, the record of the matched person is extracted.

[00120 j The Optical Character Recognition (OCR) system recognizes characters such as name, date of birth, expiry date of license, license number and place of issue. Applying a matching method for recognition and data entry of a driver license or other photo identification issued b independent authorities such as government authorities, was performed by classifying the comiectivity, shape and strokes of letters. Figure 16 A is a flowchart of the OCR process applied to a captured image according to art embodiment. This process includes capturin data. pre-processing, foregroimd extraction, segmentation,, feature extraction, rendering to text character, recognition using a database of characters (A-z. 0-9) from which data is the entered.

[00121 ] The recognition algorithm is developed using a Gaussian elimination. This includes:

applying angle adjustments, resizing and nonnaiization within pre-processing;

foreground extraction to extract the characters in the document (eg license); and the connections of pixels are realised and then converted to the printed characters A to Z and digits 0 to 9.

[0 122] An embodiment of the system first pre-processes the captured images for evaluation. Each capture applies an. aftlne rotation; i* = A- cos a + y sin a to ensure the edges are aligned square, Figure 16 B illustrates the original captured image of a driver's iieenee, and Figure 16C illustrates, the aligned (rotated) image. The captured image is. then resized to reduce the processing time an then a noise filte is applied. Next, we extract the foreground blemishes by emoving noise and interfering strokes. Once -this is reduced from the image a threshold filter is applied as such from the facial detection component.

[0 123] To extract the personal data from an independently issued (eg gowmraeat) identification document segmentation is performed by the mean positions of training sets. Figure 16D below shows the registered regions of characters sets. Mean position of each word or inputs can be derived from the following relation;

Total of register position

Mean registered position =

Nitmher of test images

1 0124 J Once, data segmentation is realised relevant words are extracted. The background is removed from the capture and the foreground remains, using colour range segmentation. Figure 16D illustrates segmentation of the aligned image of Figure 16C and Figure 16E illustrates background removal of the segmented image of Figure 16D, In one embodiment this includes: if the RG B pixel value i s less than or equal 100 then eac colour compone n t value is set to ;

else the RGB pixel values are greater than 10 and so each colour component value is se to 255;

next apply a grey gradient to extract, the black pixels. [00125] The characters arc then extracted with local pixel connection. Then the characters are segmented using this recursive algorithm.

- ..

v

[00126] Connection, for each, pixel is examined in its eigh t local regions . To find known character from the database, pattern matching is required. Corresponding regions are then realised b using Gaussian elrinitiation. Let / and J be two iniages, containing m features - 1.../») and H features Jj(J = !„.») , respectively..

Where = - J₍J is the Euclidean distance. Then, decompose the matrix into the multiple of matrices 2 U and diagonal matri D : [TDU¹ ] - svd(G) , Diagonal elements D_fl are sorted into descending numerical order. Then, new matrix E can be obtained by replacing ever Diagonal element i with i to obtain P - TEL/ ¹

[00127] in the event of bad points that may cause equally good matching possibilities / the space of parin Λ' and the proximity with ( , we recognise W * W , centred on / and Jj as multiple W * W arrays of pixel intensities A and B , the normalized correlation is defined as

where A(B) are the average and s (A)(s(B}) and the standard delation of elements J{B)C_it varies from -1 for u icorretated patches to 1 for identical patches. Including this correlation information into the proximity matrix is to trans form the elements ( as follows;

[00128] The OCR system is used to extract data from government: (or other) identification documents as well as from the user's credi t card , The credit card paymen t processor realises the credit card numbers, credit card expiry and the name on. the credit card to pull information from a capture of the client's credit card. The metho applied is exactly the same as the OCR component, which re-processes, classifies and stores the pixel-mapped data into specific database tor processing. The basic process of the payment system Is to obtain the credit card details from the card capture and encrypt within the profile database. Based on the users payment selection the monies are processed through the merchants chosen financial institution, suc as PayPal, Eway, custom local bank API, etc.

[00129 ] The operator can locate, export and view al l alias records wi th the use of the adrrhnistration portal . An operator may also use the portal to brand and style the enrolment forms. The administration, portal can also be used by the operator to add packages,_, memberships, recurring charges, once-off invoices, etc. These options will be offered to the user and define the payment type that the individual will pay at the end of enrolment for access to the operator's services. The operator may select their chosen payment merchant through the administration portal; this will define the link between the operator's enrolment sales and from the. individuals and the operator's chosen bank,

|00130] To summarise, the user is asked to enter identity data including a current photograph, and to provide documents including a photograph which can be used to verify that data. Additionally the above system can be implemented as a cloud hosted product allowing easy integration into any website, mobile application, eloud portal or desktop software. The collected data, is uploaded and stored on the server-side, and server-side scripts will perform the task of matching the user inputted information agains t the captured documents. High levels of securi ty enewption can be used throughout client-side and server side scripts to ensure information collected into database is not accessible or readable in times of a system breach.

[00131] The verification system, may be modified and varied from the above embodiment

For example an embodiment of the system may verify the identity of th e user on mul tiple on line systems of communities. In this embodiment the user could provide information on all the aliases and user profiles used by the individual and the above system method could be used to generate an identity verification status. This verification status can then be provided to each of the online systems used by the user in the form of a verification token, digital certificate or watermark which can be displayed or associated with the user's alias or profile in each of these online systems. For example the watermark applied to an image could be a red/amber gree traffic light symbol or cross, question mark, or a tick symbol ^"These could be applied to in a corner of the image or to the base of the image,

[00132] The methods and system, described herein allow verification of the identity of a user in an online system, Λ current image can.be captured along with (optioBa^'l) user entered identity data, and this can be compared with information and an image ex tracted from

independent or official identity documentation allowing the identity of the user to be verified. An identity confidence score can be calculated based upon the comparison, and a verification status generated that can the be displayed to other users. The use of verification, status (eg a digital watermark to the user's avatar) allows verification without requiring the user to reveal their underlying identity information (although they can elect to do so). The methods and systems may be used to register or verit users in a variety of online environments including online dating sites, oo social media sites, or other online communities. In particular the capture of a current photograph of the user at the time of registration, and comparison with an independent or official identity documents assists in preventing or reducing fraud in online systems.

[00133 ] Those of skill in the art would understand that information and signals may be represented using any of a variety of technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips; may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic: fields or particles, optical fields or particles, or any combination thereof.

[ 001341 Those: of skill in the art w ould fitrther appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software or instructions, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrati e components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends: upon the particular application and design

constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

[0 135] The steps of a method or algorithm described in connection with the

embodiments disclosed herein ma be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For a hardware impleineiitatjoii, processing may be implemented within one or more application specific integrated, circuits (ASICs),: digital signal processors (DSPs), digital signal processing devices (DSPBs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors,, controllers, miero-coiitroilers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof. Software modules, also known as computer programs, computer codes, or iostrudioHs, may contain a number, a number of source code or ject' code segments or instructions, and may reside in any computer readable medium such as a RAM memory, flash memory, ROM memory, EPROM memory, registers, hard disk, a removable disk, a CD-ROM, a DVD-ROM, a Blu-ray disc, or any other form of computer readable medium. In some aspects the computer-readable media may include non-transitory computer-readable media (e.g., tangible media), in addition, for other aspects .computer-readable media may include transitory computer- readable media (e.g., a signal). Combinations of the above should also be included within the scope of computer-readable media, in another aspect, the computer readable medium may be integral to the processor. The processor and the computer readable medium may reside in a ASIC or related device. The software codes may be stored in a memory unit and the processor may be configured to execute them. The memor unit may be implemented within the processor or external to the processor., in. which: case it can be communicatively coupled to the processor via various means as is known in the art.

[0013(5] Further, it should be appreciated that modules and/or other appropriate means for performing the methods and techniques described herein can be downloaded and/or otherwise obtained by comp ting device. For example, such a device can. be coupled to a server to facilitate the transfer of means for performing the methods described herein. Alternatively, various methods described herein can be provided via storage means (e.g., RAM, ROM, a physical storage medium such, as a compact disc (CD) or flopp disk, etc.), such that a computing device can. obtain the various methods upon coupling or providing the storag mea s to the device. Moreover, any other suitable technique for providing the methods and techniques described herein t a device can be utilized.

I Q0137] in one form the invention may include a computer program product for performing the method or operations presented herein. For example, such computer program product may include a computer (or processor) readable medium having instructions stored

(and/or encoded) thereon, the instructions being executable by one or more processors t perform the operations described herein. For certain aspects, the computer program product may include packaging material.

[00138] The methods disclosed herein include one or more steps or actions for achieving th described method. The method steps and or actions ma be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the: order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims,

[00139] As used herei n, t he term "^'determiiring" encompasses a wide variety of acti ons.

For example, "determining" may inclu de calc ulating, computing, processing, deriving, invesligating, looking u (c.g.. .looking up i a table, a database or another data structure}, ascertaining and the like. Also, "determining" may include, recei ving (e.g.., receiving information), accessing (c.g,, accessin data in a memory) and the like. Also, "determining" ma include resolving, selecting, choosing, establishing and the like.

[00140 J Aspects of the system are computer implemented. A. suitable computing systemncl des a display device, a processor arid a memory and an input device. The memor may include instructions to cause the processor to execute a: method described herein. The processor memory and display device may be included in a standard computing device, such as a desktop computer, a portable computing device such as a la top computer or tablet, or they may be included in a customised device or system. The computing device may he a unitary computing or programmable device, or a. distributed device including several components: operatively (or functionally) connected via wired o wireless connections. An embodiment of a. computing apparatus 1700 is illustrated in Figure 1 7 and includes a central processing unit (CPU) 1.710, a memory 1720, a display apparatus 1730, and may include an input device 1740 such as keyboard, mouse, touch screen, etc. The CPU 1710 includes an Input Output Interface 1712, an Arittanetic and Logic Unit (ALU) 1714 and a Control Unit and Program Counter element ! 716^' which is in communication with input and output devices (eg input device 1 40 and display apparatus 1730) through the Input/Out t Interface. he Input/Output Interface may include a network interface and/or communications module for communicating with an equivalent communications module in another device using a predefined communications protocol (e.g. Bluetooth, Zigbee, IEEE: 802.15, IEEE 802.1 1 , TCP/IP, TJDP, etc). A graphical processing unit (GPU) may also be included. The display apparatus may include^' a flat screen display (eg LCD, LED, plasma, touch screen, etc), a projector, CRT, etc. The computing device may include a single CPU (core) or multiple CPU' s (multiple core), or multiple processors., The computing device may use a parallel processor, a vector processor,, or be a distributed computing device. The memory is operatively coupled to the processor^) and may include RAM and RO components, and may be provided within or external to the device. The memory may be used to store the operating system and additional software modules: or instructions. The procsssor(s) may be configured to toad and executed the software modules or instructions stored in the memory.

[00141 ] Throughout the specification and the claims that follow, unless the context requires otherwise,; tire words "include" and "include" and variations such as "including" and "Including" will be understood to imply the inclusion of a stated integer or group of integers, but not the exclusio of any other integer or group of integers,

[00142] The reference to any prior art i this specification is not, and should not be taken as, an acknowledgement of any form of suggestion, that such prior art forms part of the common general knowledge.

[00143] & will be appreciated by those skilled in the art that the invention is not restricted in its use to the particular application described. Neither is the present invention restricted in its preferred embodiment with, regard to the particular elements and/or features described o depicted herein, it will be appreciated that the invention is not limited to the embodiment or embodiments disclosed, but is capable of numerous rearrangements _v m difications and substitutions without departing from the scope of the invention as set forth and defined by the following claims.

[00144] Please note that the following claims arc provisional claims only , and are provided as examples of possible claims, and are not intended to limit the scope of what may be claimed in. any future patent applications based on the present application,: integers may be added to or omitted from the example claims at a later date so as to further define or re-define the invention.

Claims

1. A me thod of verify ing the ident ity of a user in a» online system, the me thod including:

capturing- a current photographic^' im ge of a user via a- user interface provided on: a user device; recei ving, via the user interface, photographic image or digital copy of at least one

independently issued identity verification document, wherein each of the at least one independently issued identity verification document includes- user identification data items and at least one of the at least one independently issued identity verification document includes a photographic image of the user; extracting a plurality of user identification data items and a photographic image of the user from the at least one independently issued identity verification, document;

calculating an identity confidence score, wherein the calculation is based upon matching the extracted plurali y of user identification data items and photographic image of the user with the captured photographic image of the user; and

generating a verification status for the user based upon the calculated identity confidence score wherein the user is assigned a verified status if the calculated identity confidence- score is within, a first predetermined ^'verification range

2. The method as claimed in claim 1 , wherein the user interface is- configured to guide the user to capture the current photographic- image within a predefined regio of the display.

3. The method as claimed in claim 1 or 2, further including entering, via a user interface, a pluralit of user identification data items- by a user, and wherein calculation of the identi ty confidence score is based upon matching the extracted plurality of user identification data items with the user entered plurality- of user identificatio data items, and matching the extracted photographic image of the user and the recei ved current, photographic image of the user.

4. The method as claimed .in an one of claims 1 to 3, wherein, the least one independently issued identity verification doenment includes at least one Government issued identity verificatio document including a photographic image of the user.

5. The: method as claimed in any one of claims 1 to 4, wherein, the user interface is configured to receive the at. least one independently issued identity verification, document by capturing at least one image of the at least one independently issued identity verification document, and -extracting a plurality of user identification data i tems is performed by performing optical character recognition on the at least one image of the at least one independently issued identity verification document

6. The: method as claimed in claim 4, wherein the user interface is configured to receive the documeiit type of the at least one inde endently issued identity verification document to be captured, and the user is guided to capture the document within a container region of the captured image:, and the extraction step is performed using a predefined pixel ma for the received document step wherein the predefined pise! map defines pixel regioijs where individual identity data items and the photographic image of the user are located within the image.

7. The method as claimed in claim 4 or 5, wherein the least one independently issued identity verification document incl udes a credit card issued to the user, and extracting a plurality of user

identification, data items includes extractin credit card billing information from the at feast one image of the eredit card and providing the extracted eredit card billing information to a billing system for the online system.

8. The method as el aimed in an one of claims 1 to 7, wherein if the calculated identit confidence score is within, a second predefined verification range, the user interface is configured to prompt the user for additional independently issued, identity verification, documents, and the steps of extracting, calculating and verifying are re-performed incorporating the additional independently issued identity verifieati on documents .

9. The method as claimed in any one of claims 1 to 8, wherein a facial detection system detects the location of a face within an image.

10. The method as claimed in claim 9, wherein the facial detection system i based upon the Viola- Jones object detection frame work.

1 1 . The method as claimed in any one of claims 1 to 10, wherein nmtc ing. the extracted

photographic image of the user and the captured current photographic image of the user includes

detecting the location and size of a face in each photographic imag and performing facial recognition on eac detected lace to estimate a plurality- of fecial features for each face, and a matc score is obtained based on the: correlation between the facial features in. each face.

1.2. The: method as claimed in claim 11, wherein performing facial, recognition is based on performing, a principal. component analysis . PGA) to extract, fecial features, and the features are then classified to detennine if the face extracted from the photographic image of the user and face extracted from the received current photographic image of the use belong to the same class.

13. The: method as claimed in claim 1.2, wherein the user identification data items, includes age and gender, and a classifier estimates the age and gender of the user for comparison with the age and gender entered by the user.

14. The method as claimed in claim 1 1 , wherein performing facial recognition includes generating a nodal map of the face and comparing the nodal maps for each face.

15. The method as claimed in claim 14, wherein: the user identification data items includes age and gender, and the nodal map is processed to estimate the age and gender of the user which, is compared wit the age and gender entered by the user.

1 . The method as claimed in any one of claims 2 to 15, wherei the user identificatio data items includes a user address, and calculating an. identity confidence score further- includes obtaining art IP address of the user and estimating an approximate location based upon the IP address and comparing with an address entered by the user.

17. The method a claimed in any one of claims 1 to 16, wherein generating a verification status includes generating a digital identity verification watermark for indicating to other users of the online system the veri fication Statu s; of the user .

18. A. registration system for an online system wherein the registration, system uses the method of any one of claims 1 to 17 to verifying the identity of a new user.

19. A computer readable medium including instructions- for causing a computer to perform the method of any one of claims 1 to 17.

20. A user device configured for use i system for verifying the identity of a user in an onli e system, the device including a camera, a communications interface, a memory and a processor, wherei the processor is configured to:

provide a user interface to capture a current photographic image of a user;

receive, via the user interface, a photographic image or digital copy of at: least: one independentl issued identity verification document, wherein each of the .at least one independently issued identity' verification document includes user identification data items and at least one of the at least one independently issued identity verification document includes a photographic image of the user;

send the current photograph ic image of the user and the photographic- image or d igital copy of at least one independently issued identity verification document to an identity verification server via the communications interface, wherein the identity verification server is configure to: extract a plurality of user identification data items and a photographic image of the user from the at. least one independently issued identity verification document,

calculate an identity confidence score, wherein the calculation is based upon matching the extracted plurality of user identification data items and photograp hic image o f the user with the captured photographic image of the user;

generate a. verification status for the user based upon the calculated identity confidence score wherein the user is assigned a verified status if the calculated identity confidence score is within a first predetermined verification range.

21 , An identity verification server for an online system including a communications interface, memory and a processor, wherein the processor is configured to:

recei ve via the communications interface, a current photographic image of a user and a photographic image or digital copy of at least one independently issued identity verification document from a user device;

extrac a plurality of user identification data items and a photographic- image of the user from the at least one independently issued identity verification document;

calculate an identity confidence score, wherein the calculation is based upon matching the extracted plurality of user identification data items and photographic image of the user with the captured photographic image of the user; and

generate a verification status for the user based upon the calculated identity confidence score wherein the user is assigned a verified status if the calculated identity confidence score is withi a first predetermined verification range,

22. A method of registering a user for ait online membership system, the method including;

activating an imaging de vice during a user registration process to obtain a first: image file including information encoding an image ^'including facial features of the user;

receiving a second image file including information encoding an image of at least one independently issued identity verification document;

processing the first and second image files to determine a verification status for the user according to a correlation between at least one attribute of the first image and at least one associated attribute of the second image; and

assigning the verification status to a registration account for the user.