CN112819486B - Method and system for identity certification - Google Patents


Info

Publication number
CN112819486B
CN112819486B (application CN202110162884.0A)
Authority
CN
China
Prior art keywords
identity
attribute
new
new identity
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110162884.0A
Other languages
Chinese (zh)
Other versions
CN112819486A (en)
Inventor
王海
李若愚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Labs Singapore Pte Ltd
Original Assignee
Alipay Labs Singapore Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Labs Singapore Pte Ltd filed Critical Alipay Labs Singapore Pte Ltd
Publication of CN112819486A
Application granted
Publication of CN112819486B
Legal status: Active
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Evolutionary Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed herein are computer-implemented methods, systems, and apparatus, including computer programs encoded on computer storage media, for identity attestation. One of the methods comprises: training a spatial segmentation model associated with historical data of identities; and receiving data corresponding to a new identity, the data including attributes of the identity, a value for each attribute, and timestamps associated with the identity's behavior. The method further comprises: calculating the number of occurrences of each attribute value between a first timestamp and a second timestamp; segmenting the new identity from the historical data of identities using the spatial segmentation model based on the number of occurrences; and proving the authenticity of the identity based on the number of segmentations.

Description

Method and system for identity certification
Technical Field
This document relates broadly, but not exclusively, to methods and systems for identity attestation.
Background
In conducting business, financial institutions need to meet regulatory requirements, such as anti-money laundering (AML) regulations. In particular, financial services provided through electronic wallets are strictly regulated worldwide. To mitigate financial risks such as money laundering and fraud, regulatory agencies in many regions may require that natural persons provide government-approved Identification Document (ID) pictures when opening electronic wallet accounts to prove their identity.
Currently, user information and pictures such as ID cards are mainly collected through online channels such as web pages and mobile phone applications. For the operator of an electronic wallet, it is necessary during account opening to check the uploaded ID picture to ensure the authenticity of the ID and to prevent fraudsters from using false IDs to access the system. The process of verifying the identity of a user is called Know Your Customer (KYC). In some countries, governments maintain official databases and open query interfaces, and a merchant may connect to such an official data source to verify ID information. However, an official database of this kind is not available in every country.
Government-issued ID cards typically carry security features such as highlights, watermarks, and the like. Existing methods for detecting false IDs using Computer Vision (CV) techniques learn the security features of the ID card and apply a CV algorithm to decide whether an uploaded ID picture is real or false based on whether those security features are present. However, fraudsters can now forge ID pictures so well that the material, security features, and content layout closely resemble those of real ID cards. As a result, these false IDs are difficult to distinguish even by the human eye, and CV algorithms alone become insufficient to verify the authenticity of an ID and detect false IDs.
Disclosure of Invention
The described embodiments provide a method and system for identity verification by detecting false Identity Documents (IDs). In some embodiments, the method uses a spatial segmentation technique to determine the authenticity of a received new identity. The identity may include different attributes and values for those attributes, and the number of occurrences of each value within a particular time period may be calculated. Segmentation is performed on these occurrence counts under different conditions (e.g., by selecting different attributes of the identity) until the new identity is separated from the historical data of identities. The number of segmentations indicates the degree of abnormality, based on which the new identity can be determined to be a normal identity or a false identity. In some implementations, the number of occurrences may be adjusted based on a text frequency dictionary before the segmentation is performed. Advantageously, this adjustment accounts for the inherent differences in how frequently different texts or values occur in practice.
In some embodiments, the data for the identity includes several attributes, such as name, date of birth, ID number, weight, height, eye color, ID expiration date, and address. In some implementations, when the ID is received in the form of an image (e.g., a picture of the ID is uploaded through a web page or mobile phone application), an Optical Character Recognition (OCR) algorithm can be used to extract the text of the attributes and their values. The number of occurrences of the values of the different attributes within a time period is then calculated and represented as a vector, referred to as a velocity vector, and segmentation is performed to separate the velocity vector of the new identity from the vector space of velocity vectors of identities in the historical data. The time period may take the form of a sliding window extending backwards in time from the present. Advantageously, values that occur frequently over a recent period (e.g., the last five minutes) are captured and reflected by velocity vectors with high values, and since the number of segmentations required to separate a high-value velocity vector is small, identities with frequently occurring values can be identified as anomalous.
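By way of illustration only, the following Python sketch shows one way such a velocity vector could be assembled from the attribute values observed inside a sliding window; the event-log structure, the attribute names, the five-minute window, and the function name are assumptions made for this sketch rather than features prescribed by this disclosure.

from collections import Counter
from datetime import datetime, timedelta

# Hypothetical event log: one record per identity-related behavior (e.g., a KYC attempt),
# holding the attribute values extracted from the submitted ID and a timestamp.
events = [
    {"name": "James", "dob": "8/1/1990", "address": "Kansas", "id_number": "123456",
     "timestamp": datetime(2021, 2, 5, 10, 0, 0)},
    # ... further records ...
]

def velocity_vector(new_identity, events, window=timedelta(minutes=5), now=None):
    # Count, per attribute, how often the new identity's value occurred inside the sliding window.
    now = now or datetime.now()
    first_timestamp = now - window                      # window start (first timestamp)
    recent = [e for e in events if first_timestamp <= e["timestamp"] <= now]
    vector = []
    for attr in ("name", "dob", "address", "id_number"):
        counts = Counter(e[attr] for e in recent)
        vector.append(counts[new_identity[attr]])       # V(attr) for this identity
    return vector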
According to one embodiment, a method for identity attestation is provided. The method comprises: training a spatial segmentation model associated with historical data of an identity, wherein the historical data comprises attributes of the identity, values of the attributes, and timestamps related to behavior of the identity; receiving data corresponding to a new identity, wherein the data comprises attributes of the new identity, values of each of the attributes, and timestamps associated with behaviors of the new identity; calculating, for the new identity and the historical data of the identity, a number of occurrences of the value of the attribute between a first timestamp and a second timestamp; segmenting the new identity from the historical data of identities based on the number of occurrences using the spatial segmentation model, wherein the new identity is separated after a number of segmentations; and proving authenticity of the new identity based on the number of segmentations.
In some implementations, the number of occurrences of the value of the attribute between the first timestamp and the second timestamp can be represented as a vector. In some implementations, the number of occurrences may be adjusted based on a text frequency, and the interval between the first timestamp and the second timestamp may take the form of a sliding time window. In some implementations, the method can further include determining whether the received data includes data corresponding to an image of an identity document, and extracting the attributes and their values from the image. In some embodiments, the historical data of identities may be updated to include the new identity.
According to other embodiments, one or more of these general and specific embodiments may be implemented using an apparatus comprising a plurality of modules, systems, methods, or computer-readable media, or any combination of apparatus, systems, methods, and computer-readable media. The foregoing and other described embodiments may each optionally include some, none, or all of the following embodiments.
Drawings
The examples and embodiments, which are provided by way of example only, will be better understood and readily apparent to those of ordinary skill in the art from the following written description when read in conjunction with the accompanying drawings, wherein:
fig. 1 is a flow chart illustrating an example of a method for identity attestation, according to an embodiment.
Fig. 2 is a flowchart illustrating an example of an implementation of the method for identity attestation in fig. 1, according to an embodiment.
Fig. 3A is a schematic diagram of an example of normal point segmentation in a two-dimensional plane according to an embodiment.
Fig. 3B is a schematic diagram of an example of outlier segmentation in a two-dimensional plane, according to an embodiment.
Fig. 4 is a schematic diagram of an example of modules of a system for identity attestation, according to an embodiment.
Fig. 5 is a block diagram of an example of a computer system suitable for performing at least some of the steps of the example methods shown in fig. 1 and 2, according to an embodiment.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures, block diagrams, or flowcharts may be exaggerated relative to other elements to help improve understanding of the present embodiments.
Detailed Description
Embodiments will now be described, by way of example only, with reference to the accompanying drawings. Like reference numbers and designations in the drawings indicate like elements or equivalents.
Some portions of the description that follows are presented explicitly or implicitly in terms of algorithms and functional or symbolic representations of operations on data within a computer memory. These algorithmic descriptions and functional or symbolic representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, considered to be a self-consistent sequence of steps leading to a desired result. These steps are those requiring physical manipulations of physical quantities such as electrical, magnetic or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated.
Unless specifically stated otherwise, and as apparent from the following, it is appreciated that throughout the description, terms such as "receiving," "obtaining," "training," "determining," "partitioning," "computing," "generating," "detecting," "indicating," "converting," "adding," "adjusting," "comparing," "updating," "extracting," "representing," "proving," "authenticating," "outputting," or the like, are used to refer to the action and processes of a computer system, or similar electronic device, that manipulates and transforms data represented as physical quantities within the computer system into other data similarly represented as physical quantities within the computer system or other information storage, transmission, or display device.
The present specification also discloses an apparatus for performing the operations of the method. Such apparatus may be specially constructed for the required purposes, or may comprise a computer or other device selectively activated or reconfigured by a computer program stored in the computer. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various machines may be used with programs in accordance with the teachings herein. Alternatively, the construction of more specialized apparatus to perform the required method steps may be appropriate. The structure of a computer adapted to perform the various methods/processes described herein will appear from the description below.
Further, the present specification implicitly discloses a computer program, as it is apparent to a person skilled in the art that each step of the method described herein may be implemented by computer code. The computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and code therefor may be used to implement the teachings of the specification contained herein. Moreover, the computer program is not intended to be limited to any particular control flow. There are many other variations of computer programs and different control flows may be used without departing from the scope of the invention.
Furthermore, one or more steps of a computer program may be executed in parallel rather than sequentially. Such a computer program may be stored on any computer readable medium. The computer readable medium may include, for example, a magnetic or optical disk, a memory chip, or other storage device suitable for interfacing with a computer. The computer readable media may also include hardwired media such as those illustrated in the internet system, or wireless media such as those illustrated in the GSM mobile phone system. When the computer program is loaded and executed on such a computer, it effectively creates means for implementing the steps of the preferred method.
This description may also be implemented as a hardware module. More specifically, in a hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it may form part of an overall electronic circuit, such as an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA). There are many other possibilities. Those skilled in the art will appreciate that the system may also be implemented as a combination of hardware and software modules.
Proof of identity is an act or process of verifying the authenticity of an identity and may be considered a form of fraud detection or false identity detection in which the user's legitimacy is verified and a potential fraudster may be detected before the fraudulent act is performed. A valid identification may enhance the data security of the system by allowing only authenticated users to access protected resources of the system. Embodiments seek to provide methods and systems for identity verification that detect fraudulent ID information or images uploaded by fraudsters. Advantageously, financial risks such as money laundering and fraud may be effectively reduced or eliminated.
The techniques described herein produce one or more technical effects. In some embodiments, a spatial segmentation technique based on the values of identity attributes and on identity behavior is used for false ID detection and identity attestation, which is effective in detecting pictures of false IDs that closely mimic real IDs. In some embodiments, the frequency of occurrence of the values of different identity attributes over a period of time is calculated and converted into vectors for performing the spatial segmentation. The vectors reflect the degree of abnormality of the ID, which helps to identify values that occur frequently within the time period and to determine the authenticity of the identity. In some embodiments, this time period may take the form of a sliding window extending backwards in time from the present, which helps identify fake ID pictures frequently uploaded by fraudsters over a recent period.
Fig. 1 is a flow chart 100 illustrating a method for identity attestation, comprising the steps of:
-110: training a spatial segmentation model associated with historical data of an identity, wherein the historical data comprises attributes of the identity, values of the attributes, and timestamps related to behavior of the identity;
-120: receiving data corresponding to a new identity, wherein the data comprises attributes of the new identity, values of each of the attributes, and timestamps associated with behaviors of the new identity;
-125: calculating, for the new identity and historical data of the identity, a number of occurrences of the value of the attribute between the first timestamp and the second timestamp;
-130: segmenting the new identity from the historical data of identities based on the number of occurrences using the spatial segmentation model, wherein the new identity is separated after a number of segmentations;
-140: proving authenticity of the new identity based on the number of splits; and
-150: updating the historical data of the identity by including the new identity.
At step 110, a spatial segmentation model associated with historical data of an identity is trained. In some embodiments, the trained spatial segmentation model may be a tree model. The historical data of the identity may include attributes of the identity and values of the attributes. Attributes of an identity may include some or all of the following: name, date of birth, address, height, weight, country/region, Identification Document (ID) number, etc. Depending on the attribute, the value of the attribute may be in the form of text (e.g., if the attribute is a name) or a number (e.g., if the attribute is a birth date). The historical data of the identity may also include data related to identity behavior, such as timestamps related to user behavior. These actions may include opening an account, registering, updating identity information, logging in, logging out, and proof of identity such as KYC. Such identity behavior data is important in identification and detection of false IDs, as fraudsters may behave differently than normal users (e.g., uploading an identity with repeated information or performing multiple KYC attempts in a short time).
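The disclosure states that the trained spatial segmentation model may be a tree model but does not fix a training procedure. Purely as an assumption, the Python sketch below builds one such tree over historical velocity vectors by recursively choosing a random split attribute and a random split value, in the spirit of the random splits described later with reference to the SpaceSegment algorithm; the Node class, the train_tree function, and the max_depth limit are illustrative choices, not features required by this disclosure.

import random

class Node:
    # One node of the trained tree model T: an internal split, or a leaf when split_attr is None.
    def __init__(self, split_attr=None, split_value=None, left=None, right=None):
        self.split_attr = split_attr
        self.split_value = split_value
        self.left = left
        self.right = right

def train_tree(vectors, max_depth=16):
    # Recursively segment the historical velocity vectors with random attribute/value splits.
    if len(vectors) <= 1 or max_depth == 0:
        return Node()                                    # leaf: this subspace is not split further
    candidates = [a for a in range(len(vectors[0]))
                  if min(v[a] for v in vectors) < max(v[a] for v in vectors)]
    if not candidates:
        return Node()                                    # all remaining vectors are identical
    attr = random.choice(candidates)                     # randomly selected split attribute
    lo = min(v[attr] for v in vectors)
    hi = max(v[attr] for v in vectors)
    split = random.uniform(lo, hi)                       # randomly selected split value
    left = [v for v in vectors if v[attr] < split]
    right = [v for v in vectors if v[attr] >= split]
    return Node(attr, split, train_tree(left, max_depth - 1), train_tree(right, max_depth - 1))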
At step 120, data corresponding to the new identity is received. The data may include attributes of the new identity, values of the attributes of the new identity, and identity behavior data (e.g., timestamps) related to the behavior of the new identity. In some embodiments, the data corresponding to the new identity may be received as image data, for example, a picture of an ID card or an ID page. The picture may be uploaded through a web page or a mobile application. In some embodiments, first, a picture of an ID card or ID page is examined to determine whether the picture includes an image of an ID. For example, if a picture of an animal is uploaded, the picture will be determined to be non-conforming and processing will terminate. In some embodiments, an ID classification algorithm may be used to determine whether the image data is acceptable.
In some embodiments, once the image data has been examined and determined to contain an acceptable ID image, an algorithm may be used to extract the attributes of the new identity and their values from the image data. In some implementations, the algorithm can include an OCR algorithm.
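The disclosure does not name a particular OCR engine. As one possible implementation, assumed here only for illustration, the attribute fields could be extracted from the uploaded ID image with the open-source Tesseract engine via the pytesseract package, using a naive "field: value" line-parsing rule; real ID layouts would normally require template- or region-based parsing instead.

from PIL import Image
import pytesseract  # assumes the Tesseract OCR engine is installed locally

def extract_attributes(image_path):
    # Run OCR on the ID image and parse lines of the form "Field: value" into a dictionary.
    text = pytesseract.image_to_string(Image.open(image_path))
    attributes = {}
    for line in text.splitlines():
        if ":" in line:                                  # naive parsing rule, purely illustrative
            field, value = line.split(":", 1)
            attributes[field.strip().lower()] = value.strip()
    return attributes

# Example result (hypothetical): {"name": "James", "date of birth": "8/1/1990", "id number": "123456"}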
At step 125, the number of occurrences of the value of the attribute between the first timestamp and the second timestamp is calculated for the new identity and the historical data of identities. In an embodiment, the calculation may be performed for all attributes in the data or for the values of selected attributes. The first and second timestamps may be set according to a time period of interest. In some embodiments, the time period of interest may take the form of a sliding window extending backwards in time from the present, e.g., the last five minutes. In this case, the second timestamp may be set to the current time and the first timestamp to five minutes before the current time. In some implementations, the number of occurrences of the value of each attribute of each identity between the two timestamps can be calculated and expressed as a vector. Advantageously, the vector may reflect the degree of abnormality of the identity based on how frequently the values of its attributes occur over the time period.
At step 130, the new identity is segmented from the historical data of identities using the spatial segmentation model trained at step 110 and based on the number of occurrences calculated at step 125. By inputting the new identity (e.g., the vector representing the number of occurrences of its attribute values) into the algorithm of the spatial segmentation model and performing segmentation on different attributes, the new identity can be isolated after a number of segmentations. At step 140, the authenticity of the new identity may be proven based on the number of segmentations. Some examples of separating a new identity by spatial segmentation are described in more detail with reference to fig. 3A and 3B.
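The disclosure leaves open how the number of segmentations is turned into a decision at step 140. One simple rule, assumed here purely as a sketch, is to compare the new identity's segmentation count with the counts observed for identities in the historical data and flag the identity when it is isolated much faster than is typical; the function name and threshold factor are hypothetical.

def prove_authenticity(num_segmentations, historical_counts, ratio=0.5):
    # historical_counts: segmentation counts measured for identities in the historical data.
    # ratio: illustrative threshold factor; how to choose it is outside the scope of this sketch.
    baseline = sum(historical_counts) / len(historical_counts)
    if num_segmentations >= ratio * baseline:
        return "normal identity"
    return "suspected false identity"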
Optionally, at step 150, the new identity may be added to the historical data of identities, and the historical data may be updated to include the received data corresponding to the new identity. Advantageously, the historical data then includes the new identity, its attributes, the values of those attributes, and the behavioral data of the new identity, which may enhance future model training and improve accuracy.
Fig. 2 is a flow chart 200 illustrating an embodiment of the method for identity attestation described above with reference to fig. 1. At the beginning of the process, a picture 210 of an ID card or ID page is received. An ID classification algorithm may be used to determine whether the picture 210 is an ID (i.e., not a photograph of an animal or some other unacceptable picture). If it is determined that the picture 210 is an ID card or ID page, processing continues. If not, the process terminates.
In the next step, OCR algorithms may be used to extract text related to the personal information on the ID card or ID page, such as name, ID number, address, date of birth, etc. Such attributes of an identity are shown as "fields" and the values of the attributes are shown as "values". Subsequently, the frequency of occurrence of each value over a predetermined time period (i.e., between two predetermined timestamps) is calculated, and the numbers of occurrences are assembled into a vector, referred to as a velocity vector. In some embodiments, the velocity vector is constructed as D(velocity vector) = [V(name), V(date of birth), V(address), V(ID number), ...], combining the velocity of each text item to form a velocity vector specific to this identity. As shown in fig. 2, if the name "James" occurs 3 times, the date of birth "8/1/1990" occurs once, the address (or state) "Kansas" occurs once, and the ID number "123456" occurs 7 times within the predetermined time period, the generated velocity vector will be [3, 1, 1, 7]. In some embodiments, the predetermined time period may take the form of a sliding window, a streaming pattern that matches events within a time period extending backwards from the present. For example, a two-minute sliding window includes all events that occurred within the past two minutes. In practice, identity information has a low likelihood of duplication, and an identity can be used only once when registering an account. Thus, if a particular identity field value occurs too frequently, it may be considered anomalous. The occurrence-frequency and sliding-window logic can therefore advantageously detect fraudster behavior in real life, as fraudsters tend to upload a batch of false IDs within a certain period of time.
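To make the numbers in this example concrete, the short self-contained Python snippet below reproduces the velocity vector [3, 1, 1, 7]; the toy value streams are invented purely to match the counts quoted above.

from collections import Counter

# Toy attribute values observed within the sliding window (invented for illustration).
names      = ["James", "James", "James", "Mary", "Ana"]
birthdates = ["8/1/1990", "3/2/1985", "7/7/1992", "1/1/1980", "5/5/1995"]
addresses  = ["Kansas", "Ohio", "Texas", "Iowa", "Utah"]
id_numbers = ["123456"] * 7 + ["654321", "111111"]

new_identity = {"name": "James", "dob": "8/1/1990", "address": "Kansas", "id": "123456"}

velocity = [
    Counter(names)[new_identity["name"]],            # V(name) = 3
    Counter(birthdates)[new_identity["dob"]],        # V(date of birth) = 1
    Counter(addresses)[new_identity["address"]],     # V(address) = 1
    Counter(id_numbers)[new_identity["id"]],         # V(ID number) = 7
]
print(velocity)  # [3, 1, 1, 7]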
After generating the velocity vector representing the new identity received in the ID card picture 210, the velocity vector is input into the spatial-segmentation-based algorithm. The algorithm performs segmentation under different conditions until the velocity vector is spatially separated from the rest of the vector space. The algorithm outputs a number of segmentations, which indicates the degree of anomaly of the velocity vector corresponding to the new identity. In some implementations, the values in the velocity vector may be adjusted based on a text frequency dictionary. Since the frequencies of occurrence of different texts are inherently different in practice, this adjustment advantageously takes that factor into account and makes the output number of segmentations more reliable. For example, if a name such as "Smith" appears frequently in practice, the corresponding value in the velocity vector is inherently higher than for other names, which may cause the velocity vector to be wrongly identified as an outlier. Such erroneous output can be reduced if a text frequency dictionary is used to adjust the value V(name) in the velocity vector. Finally, once the adjusted velocity vector has been separated, the received new identity is determined to be normal or abnormal based on the number of segmentations.
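The disclosure does not specify the adjustment formula for the text frequency dictionary. One natural choice, assumed here for illustration, is to divide the raw count by the value's baseline frequency taken from the dictionary, so that inherently common names such as "Smith" are not mistaken for anomalies; the dictionary contents and the position of V(name) in the vector are hypothetical.

# Hypothetical text frequency dictionary: baseline relative frequency of each name in the population.
name_frequency = {"smith": 5.0, "james": 1.0}

def adjust_velocity(velocity, new_identity):
    # Scale V(name) by the name's baseline frequency so that common names do not look anomalous.
    adjusted = list(velocity)
    baseline = name_frequency.get(new_identity["name"].lower(), 1.0)
    adjusted[0] = velocity[0] / baseline              # index 0 holds V(name) in this sketch
    return adjusted

# A raw count of 5 for "Smith" becomes 1.0 after adjustment, while 5 for an uncommon name stays 5.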
An exemplary spatial segmentation algorithm, or separation algorithm, is described below. After the velocity vector for the new identity has been generated, it is mapped to a point in a high-dimensional space by a suitable method, and the space is then segmented using different hyperplanes (straight lines in a two-dimensional space) until a subspace contains only that point.
The algorithm is as follows: SpaceSegment(x, T, e)
Input: x - the instance to be segmented (a velocity vector), T - the trained tree model, e - the current number of segmentations
Output: the number of segmentations needed to isolate x

def space_segment(x, T, e=0):
    if T.split_attr is None:                  # x has reached a leaf, i.e., x is isolated
        return e
    a = T.split_attr                          # attribute selected for this split
    if x[a] < T.split_value:
        return space_segment(x, T.left, e + 1)
    else:                                     # x[a] >= T.split_value
        return space_segment(x, T.right, e + 1)
As shown in the algorithm, for a new input x corresponding to a new identity, the identity may include different attributes [attribute 1, attribute 2, attribute 3, ...]. The same attributes exist within the tree model T trained on the historical data of identities, and x is segmented based on these attributes. For a tree model, the segmentation process can be viewed as placing x at the top of the tree T and then descending layer by layer to the bottom of the tree. For each split, an attribute is randomly selected from [attribute 1, attribute 2, attribute 3, ...] as the split attribute, and a value is randomly selected within the range of occurrence counts of that attribute as the split value. The split value is compared with the corresponding attribute value in the velocity vector of x. Depending on the comparison, x falls into either the left or the right subtree of that layer and then proceeds to the next split, until x reaches a leaf node, which means the segmentation process is complete. The number of segmentations is returned as the output.
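As a usage sketch only, the snippet below scores the two example points discussed with reference to figs. 3A and 3B against a small ensemble of trees. It reuses the space_segment function listed above together with the illustrative Node/train_tree helpers sketched earlier for step 110; the synthetic historical data, the random seed, and the ensemble size are assumptions, so the printed averages are indicative rather than the exact counts shown in the figures.

import random
import statistics

random.seed(0)
# Synthetic history (illustrative): V(ID number) is almost always 1; only a few identities repeat it.
history = [[random.randint(1, 3), 1] for _ in range(195)] + \
          [[random.randint(1, 3), random.randint(2, 8)] for _ in range(5)]

forest = [train_tree(history) for _ in range(100)]        # a small ensemble gives a stabler estimate

def average_segmentations(x):
    return statistics.mean(space_segment(x, tree) for tree in forest)

print(average_segmentations([2, 1]))   # X_normal: buried among similar points, needs more segmentations
print(average_segmentations([3, 7]))   # X_anomaly: isolated after fewer segmentations on average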
Fig. 3A is a schematic diagram 300 of the segmentation of a normal point in a two-dimensional plane, and fig. 3B is a schematic diagram 350 of the segmentation of an outlier in a two-dimensional plane. In this example, the two-dimensional plane represents the numbers of occurrences (i.e., velocities) of the values of two selected attributes, name and ID number, between the two timestamps. Referring to fig. 3A, point X_normal has velocity vector value [2, 1], which is a common value in this group because V(name) is low for most identities while V(ID number) is 1. As a result, point X_normal is mixed in among many neighboring points with similar values. When performing spatial segmentation, 11 segmentations are required to separate point X_normal. Referring to fig. 3B and using the previous example in fig. 2, point X_anomaly has velocity vector value [3, 7], which is a rare value in the set. As a result, point X_anomaly is far from most points, and only 4 segmentations are needed to separate it. It will be appreciated that although an example of a two-dimensional plane is given, the spatial segmentation technique may be applied to a multi-dimensional space, in which case multiple attributes are selected and a point may be separated by a plurality of hyperplanes.
According to the above embodiments, it can be understood that the method converts the picture of an ID card or ID page into a velocity feature reflecting the degree of identity abnormality, and recasts the problem of detecting false IDs, conventionally tackled with computer vision, as a spatial segmentation problem, which is an entirely different angle of attack.
FIG. 4 is a schematic diagram of an exemplary system 400 that includes modules for identity attestation. The system 400 includes at least a processor module 402 and a memory module 404. The processor module 402 and the memory module 404 are interconnected. The memory module 404 stores historical data 420 and computer program code (not shown in FIG. 4). The memory module 404 and the computer program code are configured to, with the processor module 402, cause the system 400 to perform the steps for identity attestation described herein. The system 400 may include a receiver module 406, a training module 408, a generator module 410, a calculation module 412, an output module 416, and an update module 418. Referring to fig. 1 and 2, the receiver module 406 may be configured to receive data corresponding to an identity. The training module 408 may be configured to train a spatial segmentation model, an ID classification model, and an OCR model. The calculation module 412 may be configured to calculate the number of occurrences of the attribute values of an identity within a predetermined period of time. The generator module 410 may be configured to generate a vector and a vector space based on the calculated number of occurrences. The output module 416 may be configured to output the number of segmentations and/or the result if the new identity is determined to be a fraudulent identity. The update module 418 may be configured to update the historical data 420 to include the new identity received by the receiver module 406. One or more of these modules, or any combination of them, may be part of an apparatus for identity attestation.
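As a rough sketch of how the modules of system 400 might be composed in software (the class name, method names, and call order are assumptions made for illustration, not the patent's definition of the modules), the receiver, training, calculation/generator, output, and update modules can be wired around a shared store of historical data 420 as follows.

class IdentityProofingSystem:
    # Illustrative composition of the modules of system 400 around shared historical data 420.
    def __init__(self, historical_data):
        self.historical_data = historical_data    # held by the memory module 404
        self.model = None

    def train(self, train_fn):                    # training module 408
        self.model = train_fn(self.historical_data)

    def prove(self, new_identity, vectorize, segment):
        # receiver module 406 supplies new_identity; calculation/generator modules build the vector.
        vector = vectorize(new_identity, self.historical_data)
        segmentations = segment(vector, self.model)
        return segmentations                      # output module 416 reports this count

    def update(self, new_identity):               # update module 418
        self.historical_data.append(new_identity)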
The systems, apparatuses, modules or units shown in the previous embodiments may be implemented by using a computer chip or an entity, or may be implemented by using an article having a specific function. The device of a typical embodiment is a computer (and the computer may be a personal computer), a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email receiving and sending device, a gaming console, a tablet computer, a wearable device, or any combination of these devices. Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, may be located in one location, or may be distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution herein. Those of ordinary skill in the art will understand and appreciate the embodiments of the present application without inventive effort.
Fig. 5 is a block diagram of an example computer system 500 suitable for performing at least some of the steps of the example methods shown in fig. 1 and 2. The following description of computer system/computing device 500 is provided by way of example only and is not intended to be limiting.
As shown in fig. 5, the exemplary computing device 500 includes a processor 502 for executing software routines. Although single processors are shown for clarity, computing device 500 may also comprise a multi-processor system. The processor 502 is connected to a communication facility 506 for communicating with other components of the computing device 500. The communication facilities 506 may include, for example, a communication bus, cross-bar, or network.
Computing device 500 also includes a main memory 508, such as Random Access Memory (RAM), and a secondary memory 510. The secondary memory 510 may include: for example, storage drive 512, which may be a hard disk drive, a solid state drive, or a hybrid drive; and/or a removable storage drive 514 that may comprise a magnetic tape drive, an optical disk drive, a solid state storage drive (e.g., a USB flash drive, a flash memory device, a solid state drive, or a memory card), etc. Removable storage drive 514 reads from and/or writes to a removable storage medium 518 in a well known manner. Removable storage media 518 may include magnetic tape, optical disk, non-volatile memory storage media, etc. which is read by and written to by removable storage drive 514. As will be appreciated by one skilled in the relevant art, removable storage medium 518 includes a computer-readable storage medium having stored therein computer-executable program code instructions and/or data.
In alternative embodiments, secondary memory 510 may additionally or alternatively include other similar means for allowing computer programs or other instructions to be loaded into computing device 500. Such means may include, for example, a removable storage unit 522 and an interface 520. Examples of removable storage units 522 and interfaces 520 include a program cartridge and cartridge interface (e.g., a cartridge interface built into video game machine devices), a removable memory chip (e.g., an EPROM, or PROM) and associated socket, a removable solid state memory drive (e.g., a USB flash drive, flash memory device, solid state drive, or memory card), and other removable storage units 522 and interfaces 520 that allow software and data to be transferred from removable storage unit 522 to computer system 500.
Computing device 500 also includes at least one communication interface 524. Communications interface 524 allows software and data to be transferred between computing device 500 and external devices via communications path 526. In various embodiments herein, communication interface 524 allows data to be transferred between computing device 500 and a data communication network, such as a public or private data communication network. The communication interface 524 may be used to exchange data between different computing devices 500, which computing devices 500 form part of an interconnected computer network. Examples of communication interface 524 may include a modem, a network interface (e.g., an ethernet card), a communication port (e.g., serial, parallel, printer, GPIB, IEEE 1394, RJ45, USB), an antenna with associated circuitry, and so forth. Communication interface 524 may be wired or wireless. Software and data transferred via communications interface 524 are in the form of signals which may be electronic, electromagnetic, optical or other signals capable of being received by communications interface 524. These signals are provided to the communications interface via communications path 526.
As shown in fig. 5, computing device 500 also includes a display interface 528 that performs operations for rendering images to an associated display 530, and an audio interface 532 that performs operations for playing audio content via associated speakers 534.
As used herein, the term "computer program product" may refer, in part, to removable storage media 518, removable storage units 522, a hard disk installed in hard disk drive 512, or a carrier wave that carries software to communication interface 524 through communication path 526 (wireless link or cable). Computer-readable storage media refers to any non-transitory, non-volatile tangible storage medium that provides recorded instructions and/or data to computing device 500 for execution and/or processing. Examples of such storage media include magnetic tape, CD-ROM, DVD, Blu-ray™ disc, a hard drive, ROM or integrated circuit, solid state storage drive (e.g., USB flash drive, flash device, solid state drive, or memory card), hybrid drive, magneto-optical disk, or a computer readable card such as a PCMCIA card, whether or not such devices are internal or external to computing device 500. Examples of transitory or non-tangible computer-readable transmission media that may also participate in providing software, applications, instructions, and/or data to computing device 500 include radio or infrared transmission channels, network connections to another computer or networked device, and the Internet or intranets, including email transmissions and information recorded on websites and the like.
Computer programs (also called computer program code) are stored in main memory 508 and/or secondary memory 510. Computer programs may also be received via communications interface 524. Such computer programs, when executed, enable computing device 500 to perform one or more features of the embodiments discussed herein. In various embodiments, the computer programs, when executed, enable the processor 502 to perform the features of the embodiments described above. Accordingly, such computer programs represent controllers of the computer system 500.
The software may be stored in a computer program product and loaded into computing device 500 using removable storage drive 514, storage drive 512, or interface 520. The computer program product may be a non-transitory computer readable medium. Alternatively, the computer program product may be downloaded to computer system 500 via communications path 526. The software, when executed by the processor 502, causes the computing device 500 to perform the necessary operations to perform the methods as illustrated in fig. 1 and 2.
It should be understood that the embodiment of fig. 5 illustrates, by way of example only, the operation and structure of the system 500. Thus, in some embodiments, one or more features of computing device 500 may be omitted. Furthermore, in some embodiments, one or more features of computing device 500 may be combined together. Additionally, in some embodiments, one or more features of computing device 500 may be separated into one or more component parts.
It should be understood that the elements shown in fig. 5 are intended to provide means for performing the various functions and operations of the system described in the embodiments above.
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the present description, as shown in the specific embodiments, without departing from the scope of the present description as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.

Claims (14)

1. A computer-implemented method for identity attestation, the method comprising:
training a spatial segmentation model associated with historical data of an identity, wherein the historical data comprises attributes of the identity, values of the attributes of the identity, and timestamps related to behavior of the identity;
receiving data corresponding to a new identity, wherein the data comprises an attribute of the new identity, a value of the attribute of the new identity, and a timestamp relating to a behavior of the new identity;
calculating, for the historical data of the identity and the new identity, a number of occurrences of the value of the attribute between a first timestamp and a second timestamp;
segmenting the new identity from the historical data of the identity based on the number of occurrences using the spatial segmentation model, wherein the new identity is isolated after a number of segmentations; and
proving authenticity of the new identity based on the number of segmentations.
2. The method of claim 1, wherein a number of occurrences of the value of the attribute between the first timestamp and the second timestamp is represented as a vector.
3. The method of claim 1, wherein the number of occurrences is adjusted based on a text frequency.
4. A method according to any one of the preceding claims 1 to 3, wherein the interval between the first timestamp and the second timestamp takes the form of a sliding time window.
5. The method of any preceding claim 1 to 3, further comprising:
determining whether the received data includes data corresponding to an image of an identity document; and
in response to the received data being determined to include data corresponding to an image of the identity document, extracting from the image an attribute of the new identity and a value of the attribute of the new identity.
6. The method of claim 5, wherein determining whether the received data includes data corresponding to an image of the identity document is based on a classification algorithm.
7. The method of claim 5, wherein the attribute of the new identity and the value of the attribute of the new identity are extracted from the image based on an Optical Character Recognition (OCR) algorithm.
8. The method of any preceding claim 1 to 3, wherein the spatial segmentation model comprises a tree model.
9. The method of any preceding claim 1 to 3, further comprising:
updating the historical data of the identity by including the new identity.
10. The method according to any of the preceding claims 1 to 3, wherein the attributes of the identity are selected from the group comprising name, date of birth, place of birth, height, weight, identification card number and address.
11. The method according to any of the preceding claims 1 to 3, wherein the behavior of the identity is selected from the group comprising registration, updating information, login, logout and proof of identity.
12. The method of claim 11, wherein the proof of identity includes Know Your Customer (KYC) processing.
13. A system for identity authentication, the system comprising:
one or more processors; and
one or more computer-readable memories coupled to the one or more processors and having instructions stored thereon that are executable by the one or more processors to perform the method of any of claims 1-12.
14. An apparatus for identity authentication, the apparatus comprising a plurality of modules for performing the method of any one of claims 1 to 12.
CN202110162884.0A 2020-02-20 2021-02-05 Method and system for identity certification Active CN112819486B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10202001528T 2020-02-20
SG10202001528TA SG10202001528TA (en) 2020-02-20 2020-02-20 Methods and systems for identity proofing

Publications (2)

Publication Number Publication Date
CN112819486A CN112819486A (en) 2021-05-18
CN112819486B true CN112819486B (en) 2021-12-21

Family

ID=72355640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110162884.0A Active CN112819486B (en) 2020-02-20 2021-02-05 Method and system for identity certification

Country Status (2)

Country Link
CN (1) CN112819486B (en)
SG (1) SG10202001528TA (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215225B (en) * 2020-10-22 2024-03-15 北京通付盾人工智能技术有限公司 KYC certificate verification method based on computer vision technology

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9210156B1 (en) * 2014-06-16 2015-12-08 Lexisnexis Risk Solutions Inc. Systems and methods for multi-stage identity authentication
CN105610768A (en) * 2014-11-25 2016-05-25 阿里巴巴集团控股有限公司 Method and device for processing network operation
CN108595923A (en) * 2018-04-20 2018-09-28 北京元心科技有限公司 Identity identifying method, device and terminal device
CN109522304B (en) * 2018-11-23 2021-05-18 中国联合网络通信集团有限公司 Abnormal object identification method and device and storage medium
CN109918279B (en) * 2019-01-24 2022-09-27 平安科技(深圳)有限公司 Electronic device, method for identifying abnormal operation of user based on log data and storage medium
CN110348189A (en) * 2019-06-17 2019-10-18 五邑大学 A kind of identity spoofing detection method and its system, device, storage medium

Also Published As

Publication number Publication date
SG10202001528TA (en) 2020-07-29
CN112819486A (en) 2021-05-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant