WO2019143360A1 - Data security using graph communities - Google Patents

Data security using graph communities

Info

Publication number
WO2019143360A1
WO2019143360A1 PCT/US2018/014550 US2018014550W WO2019143360A1 WO 2019143360 A1 WO2019143360 A1 WO 2019143360A1 US 2018014550 W US2018014550 W US 2018014550W WO 2019143360 A1 WO2019143360 A1 WO 2019143360A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
community
nodes
communities
authentication
Application number
PCT/US2018/014550
Other languages
French (fr)
Inventor
Theodore Harris
Tatiana KOROLEVSKAYA
Yue Li
Original Assignee
Visa International Service Association
Application filed by Visa International Service Association
Priority to PCT/US2018/014550
Publication of WO2019143360A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3226 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
    • H04L9/3231 Biological data, e.g. fingerprint, voice or retina
    • H04L9/321 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving a third party or a trusted authority

Definitions

  • Client devices (e.g., personal computers, smartphones, etc.) can be used to access a variety of secure resources such as user accounts, information databases, file storage, website logins, and digital wallets.
  • resource managers may require authentication information of the user or the client device to be validated before granting access to the resource.
  • a resource manager of an online storage system may require a device’s network address to be validated, in addition to verifying the user’s account name and password.
  • Client devices can send access request messages including authentication information to request access to a resource.
  • the resource may be managed by a resource management computer and the authentication information may be validated by an authentication data processing server.
  • A data security hub (e.g., an authentication hub) can determine a risk score for a particular access request and use that risk score in an authentication process.
  • the data security hub can determine the risk score based on whether interaction data for the current access request matches expected interaction data using a community predictive model. For example, the data security hub can determine whether the IP address used to send the authentication request falls within an expected community of IP addresses.
  • the data security hub can also use the community predictive model to determine a subset of registered accounts that a current authentication request might match.
  • the data security hub can receive an access request including biometric data and can apply interaction data for the access request to the community predictive model to determine a subset of registered accounts that the received biometric data might match. Accordingly, the received biometric data does not need to be compared against the entire set of biometric data for registered accounts (which could potentially lead to a false positive due to the large set for comparison), but only to the reduced set of accounts.
  • Some embodiments of the disclosure provide methods for processing access request messages through a data security hub.
  • the method includes storing interaction data for a plurality of previous access requests.
  • the previous access requests can also be stored.
  • the interaction data can include information about the geolocation, date, and time of day that the previous access requests were made as well as the network address and network route used to send the access request.
  • the interaction data can also include data or parameters included in the access request, such as an identifier of the resource manager that the request is for, an identifier of the resource, and an amount of the resource.
  • the method further includes creating a topological graph based on the interaction data.
  • the topological graph includes nodes and edges connecting the nodes.
  • the topological graph can be created using an unsupervised clustering algorithm. For example, a first node associated with a particular Internet Protocol (IP) address can be connected by an edge to a second node associated with a geolocation (e.g., a zip code or Global Positioning System coordinates) based on previous access requests made in that geolocation being sent using the particular IP address.
  • the method further includes determining a plurality of communities from the topological graph to form a predictive model.
  • Each community of the plurality of communities includes a subset of the nodes in the topological graph. A portion of one community can overlap a portion of another community. That is, the same node can be included in multiple communities.
  • Interaction data for a new access request received by the data security hub in real-time can be applied to the predictive model in order to determine a set of communities that the new access request fits within. This can be done by vectorizing each of the communities and the interaction data for the new access request and then determining whether a vector distance between the vector for the new access request and a vector for a community is within a predetermined range.
  • the method further includes receiving, from a client device, an access request.
  • the access request can be received in real-time by the data security hub.
  • the method includes determining request interaction data for the access request.
  • the request interaction data can indicate one or more of the geolocation of the client device, the time of day of the access request (e.g., a specific time or a time range), a date of the access request, an identifier of a resource, a resource amount, an identifier of the resource manager of the resource or a resource management computer, an IP address of the client device, and a network route or path used for sending the access request to the data security hub.
  • the method further includes determining a predicted subset of the plurality of communities by applying the request interaction data to the predictive model.
  • the predicted communities in the subset can be determined by comparing the request interaction data to the communities in the topological graph. For example, an IP address of the client device can be used to predict communities that include nodes associated with that IP address.
  • the method further includes initiating an authentication process for the access request using the predicted subset of the plurality of communities.
  • the method further includes providing an authentication response to the client device where the authentication response is based on the authentication process for the access request.
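The vector-distance comparison described in the method above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the binary encoding, the Euclidean metric, the node labels (such as `ip:203.0.113.7`), the function names, and the `max_distance` threshold are all hypothetical choices made for the example.

```python
from math import sqrt

def vectorize(labels, vocabulary):
    """Binary vector: 1.0 where a vocabulary label is present, else 0.0 (assumed encoding)."""
    present = set(labels)
    return [1.0 if label in present else 0.0 for label in vocabulary]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_communities(request_labels, communities, vocabulary, max_distance):
    """Return names of communities whose vector lies within max_distance of the request."""
    request_vec = vectorize(request_labels, vocabulary)
    predicted = []
    for name, members in communities.items():
        if euclidean(request_vec, vectorize(members, vocabulary)) <= max_distance:
            predicted.append(name)
    return predicted

# Hypothetical communities; the same node (an IP address) appears in both A and B.
communities = {
    "A": {"ip:203.0.113.7", "zip:94105", "hour:09"},
    "B": {"ip:203.0.113.7", "zip:10001", "hour:22"},
    "C": {"ip:198.51.100.4", "zip:60601", "hour:14"},
}
vocabulary = sorted(set().union(*communities.values()))

# Interaction data for a new access request; labels outside the vocabulary are ignored.
request = ["ip:203.0.113.7", "zip:94105", "hour:10"]
predicted = predict_communities(request, communities, vocabulary, max_distance=1.5)
print(predicted)
```

In this toy data only community A falls within the threshold, so an authentication process could be restricted to that community. A production model would likely use weighted or learned features rather than a binary encoding.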
  • FIG. 1 shows a system diagram of an authentication hub in communication with client devices, data processing servers, and resource management computers, in accordance with some embodiments.
  • FIG. 2 shows a functional block diagram of an authentication hub, in accordance with some embodiments.
  • FIG. 3 shows an information flow diagram of an authentication process using community prediction, in accordance with some embodiments.
  • FIG. 4 shows an exemplary topological graph and communities, in accordance with some embodiments.
  • FIG. 5 shows a flow chart of an exemplary method for processing access request messages through a data security hub, in accordance with some embodiments.
  • Client devices such as personal computers, smartphones, tablets, and wearable devices can be used to request access to resources.
  • a client device can request access to login to a local or remote user account, gain permissions to files or settings, retrieve or store information in a database, make payment transactions, or gain access to a physical structure.
  • a client device can send an access request message to the resource management computer associated with the requested resource.
  • resource management computers may require client devices to provide authentication information of the user or the client device (e.g., personal or sensitive information). Accordingly, client devices can send access request messages including authentication information to request access to a resource.
  • A data security hub (e.g., an authentication hub) can create a predictive model based on interaction data for previous access requests.
  • the predictive model can be used to determine a set of predicted communities for an incoming access request.
  • the authentication hub can determine a risk score for the incoming access request based on whether predicted communities based on interaction data for that access request match expected communities based on historical data.
  • the data security hub can determine whether the IP address used to send the authentication request falls within an expected community of IP addresses normally used to request access to a particular resource. Accordingly, the authentication hub may be able to determine that the access request should not be authenticated, even without sending the authentication information to a data processor to be verified.
  • the data security hub can also use the community predictive model to determine a subset of registered accounts that a current authentication request might match. For instance, the data security hub can receive an access request including biometric data and can apply interaction data for the access request to the community predictive model to determine a subset of registered accounts that the received biometric data might match. Accordingly, the received biometric data does not need to be compared against the entire set of biometric data for registered accounts (which could potentially lead to a false positive), but only to the reduced set of accounts.
  • the authentication hub can maintain resource security while speeding up the authentication process and using fewer computing resources.
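The reduced biometric comparison described above can be illustrated with a short sketch. Everything here is an assumption for illustration: the mapping from communities to registered accounts, the toy similarity measure, the template format, and the threshold are hypothetical and are not specified by the source.

```python
def candidate_accounts(predicted_communities, community_accounts):
    """Union of the registered accounts linked to the predicted communities."""
    candidates = set()
    for community in predicted_communities:
        candidates |= community_accounts.get(community, set())
    return candidates

def toy_similarity(sample, template):
    """Fraction of matching positions (a stand-in for a real biometric matcher)."""
    return sum(1 for s, t in zip(sample, template) if s == t) / len(template)

def match_biometric(sample, templates, candidates, threshold):
    """Compare the sample only against candidate accounts; return the best match or None."""
    best_account, best_score = None, threshold
    for account in sorted(candidates):
        score = toy_similarity(sample, templates[account])
        if score >= best_score:
            best_account, best_score = account, score
    return best_account

# Hypothetical registered templates and community membership.
templates = {"alice": (1, 0, 1, 1), "bob": (0, 1, 1, 0), "carol": (1, 0, 1, 0)}
community_accounts = {"A": {"alice", "bob"}, "B": {"carol"}}

candidates = candidate_accounts(["A"], community_accounts)
matched = match_biometric((1, 0, 1, 1), templates, candidates, threshold=0.9)
print(sorted(candidates), matched)
```

Because only the accounts in the predicted communities are compared, "carol" is never evaluated, which reduces both the work done and the pool in which a false positive could occur.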
  • A “client device” or “user device” may include any device that can be operated by a user.
  • a client device or user device can provide electronic communication with one or more computers.
  • a communication device can be referred to as a mobile device if the mobile device has the ability to communicate data portably.
  • A “mobile device” may comprise any suitable electronic device that may be transported and operated by a user, which may also provide remote communication capabilities over a network.
  • Examples of remote communication capabilities include using a mobile phone (wireless) network, wireless data network (e.g. 3G, 4G or similar networks), Wi-Fi, Wi-Max, or any other communication medium that may provide access to a network such as the Internet or a private network.
  • Examples of mobile devices include mobile phones (e.g. cellular phones), PDAs, tablet computers, net books, laptop computers, personal music players, hand-held specialized readers, etc. Further examples of mobile devices include wearable devices, such as smart watches, fitness bands, ankle bracelets, etc., as well as automobiles with remote communication capabilities.
  • a mobile device may comprise any suitable hardware and software for performing such functions, and may also include multiple devices or components (e.g. when a device has remote access to a network by tethering to another device - i.e. using the other device as a modem - both devices taken together may be considered a single mobile device).
  • a mobile device may further comprise means for determining/generating location data.
  • a mobile device may comprise means for communicating with a global positioning system (e.g. GPS).
  • An “application” may be computer code or other data stored on a computer readable medium (e.g. memory element or secure element) that may be executable by a processor to complete a task.
  • An “access request message” refers to a message sent by a client device to request access to a resource.
  • the resource may be managed by a particular resource management computer.
  • the client device may send the access request message to an authentication hub, which may authenticate the access request message prior to sending it to the corresponding resource management computer.
  • the access request message can include authentication information that may be validated by either the authentication hub or a data processing server.
  • Different types of client devices may generate access requests according to different APIs or protocols.
  • Authentication information may be information that can be used to authenticate a user or a client device. That is, the authentication information may be used to verify the identity of the user or the client device. In some embodiments, the user may input the authentication information into a device during an authentication process.
  • authentication information examples include biometric data (e.g., fingerprint data, facial recognition data, 3-D body structure data, deoxyribonucleic acid (DNA) data, palm print data, hand geometry data, retinal recognition data, iris recognition data, voice recognition data, etc.), passwords, passcodes, personal identifiers (e.g., government issued licenses or identifying documents), personal information (e.g., address, birthdate, mother’s maiden name, or phone number), and other secret information (e.g., answers to security questions).
  • Authentication information can also include data provided by the device itself, such as hardware identifiers (e.g., an International Mobile Equipment Identity (IMEI) number or a serial number), a network address (e.g., internet protocol (IP) address), interaction information, and Global Positioning System (GPS) location information.
  • A “limited set of authentication information” or a “restricted set of authentication information” refers to a set of authentication information that has been limited or restricted from the set of authentication information received in an access request.
  • the authentication information may be limited in that certain types of authentication information are removed, portions of a certain piece of authentication information are removed or obscured, or some or all of a certain piece of authentication information is obfuscated, while still being able to be validated.
  • more sensitive authentication information may be obfuscated (e.g., using secure multi-party computation techniques).
  • A “linguistic parser” refers to an artificial intelligence algorithm for processing natural language to determine the elements, relationships, and grammatical structure of sentences (e.g., which characters are words, whether a word is a noun or verb, or which word is the subject or object of a particular verb).
  • An authentication hub may use a linguistic parsing algorithm to build data structures that represent the API of a particular access request message, similar to how linguistic parsing algorithms may be used to represent the structure of a sentence in a natural language. In parsing an access request message, the authentication hub may first create a sequence of symbols or tokens corresponding to the API, protocol, or format of the access request message.
  • the symbols/tokens can correspond to the data fields of the access request (e.g., routing information, authentication information, metadata, etc.). Then the authentication hub can create a data structure (e.g., a parse tree or a syntax tree) that represents the API of the access request.
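The two parsing steps described above (tokens first, then a tree) can be sketched with a deliberately simple example. This is not a linguistic parser in the artificial intelligence sense; it is a plain tokenizer used to show the shape of the data structures. The `key=value;` message format, the field names, and the `FIELD_CATEGORIES` mapping are hypothetical.

```python
# Hypothetical mapping from data-field names to the part of the request they belong to.
FIELD_CATEGORIES = {
    "resource_id": "routing",
    "manager_id": "routing",
    "password": "authentication",
    "fingerprint": "authentication",
    "timestamp": "metadata",
}

def tokenize(message):
    """Turn 'name=value;name=value' into a flat sequence of FIELD and VALUE tokens."""
    tokens = []
    for pair in message.split(";"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        tokens.append(("FIELD", name.strip()))
        tokens.append(("VALUE", value.strip()))
    return tokens

def build_tree(tokens):
    """Group FIELD/VALUE token pairs under category nodes to form a small parse tree."""
    tree = {"access_request": {}}
    for (_, name), (_, value) in zip(tokens[0::2], tokens[1::2]):
        category = FIELD_CATEGORIES.get(name, "metadata")  # unknown fields: assumed metadata
        tree["access_request"].setdefault(category, {})[name] = value
    return tree

message = "resource_id=acct-42;password=hunter2;timestamp=2018-01-19T10:00"
tokens = tokenize(message)
tree = build_tree(tokens)
print(tree)
```

A tree of this kind gives the hub one uniform structure to work with even when client devices format their requests differently.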
  • A “resource manager” can be any entity that provides resources. Examples of resource managers include a website operator, a data storage provider, an internet service provider, a merchant, a bank, a building owner, a governmental entity, etc. Any entity that maintains accounts for users or that can provide information, data, or physical objects to users may be considered a “resource manager.”
  • the resource manager may operate a resource management computer to perform functions for maintaining accounts and controlling access to resources.
  • An “access device” may be any suitable device that provides access to a remote system.
  • An access device may also be used for communicating with a resource
  • An access device may generally be located in any suitable location, such as at the location of a merchant.
  • An access device may be in any suitable form.
  • Some examples of access devices include POS or point of sale devices (e.g., POS terminals), cellular phones, PDAs, personal computers (PCs), tablet PCs, hand-held specialized readers, set-top boxes, electronic cash registers (ECRs), automated teller machines (ATMs), virtual cash registers (VCRs), kiosks, security systems, access systems, and the like.
  • An access device may use any suitable contact or contactless mode of operation to send or receive data from, or associated with, a user mobile device.
  • an access device may comprise a POS terminal
  • any suitable POS terminal may be used and may include a reader, a processor, and a computer-readable medium.
  • a reader may include any suitable contact or contactless mode of operation.
  • exemplary card readers can include radio frequency (RF) antennas, optical scanners, bar code readers, or magnetic stripe readers to interact with a payment device and/or mobile device.
  • a cellular phone, tablet, or other dedicated wireless device used as a POS terminal may be referred to as a mobile point of sale or an “mPOS” terminal.
  • An “application programming interface” (API) refers to a set of routines and protocols defining how software components should communicate and interact.
  • an API for requesting access to resources can define the format and protocol of access request messages, including the required data fields.
  • An API for requesting access to resources can also define what types of authentication information should, or should not, be included in the access request message.
  • Interaction information may include data on the type of interaction being conducted by the user of a client device (e.g., tracking the user’s use of the client device).
  • Interaction information can include, for example, the type and amount of resources requested in an access request (specific amount or a range of amounts), the date of the access request, the time of day (specific or within a range) that the access request was made, the geo-location of the client device making the access request, the network address of the client device, a network route or path used to send the access request, and an identifier of the resource management computer that manages the requested resource.
  • the interaction information may be used as part of a risk evaluation of the client device, where different interactions may have different levels of risk associated with them.
  • the interaction information may be tracked in a log file that is provided in an access request.
  • A “topological graph” may refer to a representation of a graph in a plane of distinct vertices connected by edges.
  • the distinct vertices in a topological graph may be referred to as “nodes.”
  • Each node may represent specific information for an event or may represent specific information for a profile of an entity or object.
  • the nodes may be related to one another by a set of edges, E.
  • An edge may be associated with a numerical value, referred to as a “weight”, that may be assigned to the pairwise connection between the two nodes.
  • the edge weight may be identified as a strength of connectivity between two nodes and/or may be related to a cost or distance, as it often represents a quantity that is required to move from one node to the next.
  • A “community” may refer to a group/collection of nodes in a graph that are densely connected within the group.
  • a community may be a subgraph or a portion/derivative thereof and a subgraph may or may not be a community and/or comprise one or more communities.
  • a community may be identified from a graph using a graph learning algorithm, such as a graph learning algorithm for mapping protein complexes.
  • communities identified using historical data can be used to classify new data for making predictions. For example, identifying communities can be used as part of a machine learning process, in which predictions about information elements can be made based on their relation to one another.
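The graph and community definitions above can be made concrete with a small sketch. The source does not name a specific community-detection algorithm, so a simple stand-in is used here and named plainly: edge weights are co-occurrence counts from historical interaction records, and communities are the connected components that remain after dropping edges below a weight threshold. Unlike the communities described in the source, this stand-in cannot produce overlapping communities. The record format, node labels, and `min_weight` value are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(records):
    """Edge weight = number of historical records in which two node labels co-occur."""
    weights = defaultdict(int)
    for record in records:
        for u, v in combinations(sorted(set(record)), 2):
            weights[(u, v)] += 1
    return dict(weights)

def find_communities(weights, min_weight):
    """Stand-in detection: connected components over edges with weight >= min_weight."""
    adjacency = defaultdict(set)
    for (u, v), weight in weights.items():
        if weight >= min_weight:
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # iterative depth-first traversal of one component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical interaction data: each record pairs an IP address with a geolocation.
records = [
    ("ip:203.0.113.7", "zip:94105"),
    ("ip:203.0.113.7", "zip:94105"),
    ("ip:198.51.100.4", "zip:60601"),
    ("ip:198.51.100.4", "zip:60601"),
    ("ip:203.0.113.7", "zip:60601"),
]
graph = build_graph(records)
found = find_communities(graph, min_weight=2)
print(found)
```

The single weak edge between `ip:203.0.113.7` and `zip:60601` is filtered out, leaving two densely connected groups that can serve as communities for classifying new requests.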
  • A “data set” may refer to a collection of related sets of information composed of separate elements that can be manipulated as a unit by a computer.
  • a data set may comprise known data, which may be seen as past data or “historical data.” Data that is yet to be collected may be referred to as future data or “unknown data.” When future data is received at a later point in time and recorded, it can be referred to as “new known data” or “recently known” data, and can be combined with initial known data to form a larger history.
  • “Unsupervised learning” may refer to a type of learning algorithm used to classify information in a dataset by labeling inputs and/or groups of inputs.
  • One method of unsupervised learning can be cluster analysis, which can be used to find hidden patterns or grouping in data. The clusters may be modeled using a measure of similarity, which can be defined using one or more metrics, such as Euclidean distance.
  • Machine learning may refer to an artificial intelligence process in which software applications may be trained to make accurate predictions through learning.
  • the predictions can be generated by applying input data to a predictive model formed from performing statistical analysis on aggregated data.
  • a clustering algorithm is an example of a machine learning algorithm.
  • a predictive model can be trained using training data, such that the model may be used to make accurate predictions.
  • the prediction can be, for example, a classification of an image (e.g. identifying images of cats on the Internet) or as another example, a recommendation (e.g. a movie that a user may like or a restaurant that a consumer might enjoy).
  • Training data may be collected as existing records. Existing records can be any data from which patterns can be determined.
  • Existing records may be, for example, user data collected over a network, such as user browser history or user spending history.
  • training may be performed through a learning module.
  • the learning module may comprise a learning algorithm, which may be used to build a model.
  • the model may be a statistical model, which can be used to predict unknown information from known information.
  • the learning module may be a set of instructions for generating a regression line from training data (supervised learning) or a set of instructions for grouping data into clusters of different classifications of data based on similarity, connectivity, and/or distance between data points (unsupervised learning).
  • the regression line or data clusters can then be used as a model for predicting unknown information from known information.
  • the model may be used to generate a predicted output from a new request.
  • A new request may be a request for a prediction associated with presented data.
  • a new request may comprise the data that a prediction is requested for.
  • a new request may comprise pixel data for an image that is to be classified or may comprise user information (e.g. name, location, user history, etc.) that can be used to determine an appropriate recommendation.
  • the data included in a new request can be compared against the model. For example, the position of data received in a new request on a graph can be compared against a regression line to predict its next state (i.e. according to a trend). In another example, the position of the data as plotted on a topological graph can be used to determine its classification (e.g. predicting tastes and preferences of a user based on his or her online interactions).
  • A “computing device” may be any suitable electronic device that can process and communicate information to other electronic devices.
  • the computing device may include a processor and a computer readable medium coupled to the processor, the computer readable medium comprising code, executable by the processor.
  • the computing device may also include input devices and output devices that are operatively coupled to the processor, as well as an external communication interface for communicating with other computing devices or other entities.
  • the computing device can provide remote communication capabilities to a network. Examples of these remote communication capabilities include using a mobile phone (wireless) network, wireless data network (e.g. 3G, 4G or similar networks), Wi-Fi, Bluetooth, Bluetooth Low Energy (BLE), Wi-Max, or any other communication medium that may provide access to a network such as the Internet or a private network.
  • Some exemplary types of computing device may include a mobile device, a cardholder device, a user device, a consumer device, a server computing device, an appliance, and any computer.
  • mobile devices include mobile phones (e.g., cellular phones), keychain devices, personal digital assistants (PDAs), pagers, notebooks, laptops, notepads, net books, tablet computers, wearable devices (e.g., smart watches, fitness bands, jewelry, etc.), automobiles or motorcycles with remote
  • A “server computer” may include any suitable computer that can provide communications to other computers and receive communications from other computers.
  • a server computer may include a computer or cluster of computers.
  • a server computer can be a mainframe, a minicomputer cluster, or a group of servers functioning as a unit.
  • a server computer may be a database server coupled to a Web server.
  • a server computer may be coupled to a database and may include any hardware, software, other logic, or combination of the preceding for servicing the requests from one or more client computers.
  • a server computer may comprise one or more computational apparatuses and may use any of a variety of computing structures, arrangements, and compilations for servicing the requests from one or more client computers. Data transfer and other communications between components such as computers may occur via any suitable wired or wireless network, such as the Internet or private networks.
  • Messages communicated between any of the computers, networks, and devices described herein may be transmitted using secure communications protocols such as, but not limited to, File Transfer Protocol (FTP), HyperText Transfer Protocol (HTTP), Secure Hypertext Transfer Protocol (HTTPS), Secure Socket Layer (SSL), ISO (e.g., ISO 8583), and/or the like.
  • FIG. 1 shows a system diagram of an authentication hub 110 in communication with client devices 120, data processing servers 130, and resource management computers 140, in accordance with some embodiments.
  • the client devices 120 can include any device that requests access to a resource being managed by one of the resource management computers 140.
  • a client device could be a point of sale terminal 121, a personal computer 122, a mobile device 123, a wearable device 124, a smart card 125 (e.g., a biometric card or payment card), or a vehicle 126.
  • Each of the client devices 120 can communicate with the authentication hub over a first network 152.
  • the client devices 120 may communicate with the network 152 using a wired network connection (e.g., ethernet) or a wireless network connection (e.g., Wi-Fi, cellular, or near field communications).
  • the client devices 120 can send access requests that include different types of authentication information and that are formatted differently.
  • the authentication hub 110 can include an automated client interface that automatically adapts the access requests for processing.
  • the client interface can be used for receiving access requests from the client devices 120 and for sending access responses to the client devices 120 over the first network 152.
  • the authentication hub 110 can also communicate with a plurality of data processing servers 130.
  • Each of the data processing servers 130 may be capable of processing different types of authentication information.
  • a first data processing server 131 can evaluate one or more hardware identifiers of a client device in order to determine whether a particular client device is a security risk.
  • a second data processing server 132 can use the network identifier (e.g., IP address) of the client device to determine whether a particular client device is a security risk.
  • a third data processing server 133 can analyze biometric data (e.g., a fingerprint scan or a retina scan) of a user of a client device to determine whether it is associated with a registered user.
  • a fourth data processing server 134 can analyze personal information of the user to determine whether it matches stored account information.
  • the four data processing servers 130 described above are merely examples of the various data processing servers that could be in communication with the authentication hub 110.
  • the authentication hub 110 may communicate with other data processing servers to process other types of authentication information.
  • the authentication hub 110 can include an automated client interface which automatically adapts the access requests for processing.
  • the client interface can be used for receiving access requests from the client devices 120 and for sending access responses to the client devices 120 over the first network 152.
  • the authentication hub 110 can provide a data processor interface for communicating with the data processing servers 130 over a second network 153.
  • the data processor interface can be used for making authentication requests to the data processing servers 130 and receiving authentication responses from the data processing servers 130 over the second network 153.
  • the authentication hub 110 can also communicate with a plurality of resource management computers 140.
  • Each of the resource management computers may manage a different type of resource.
  • a first resource management computer 141 may manage user accounts for a website.
  • a second resource management computer 142 can manage academic resources for a school district.
  • a third resource management computer 143 can manage payment accounts and provide authorization of payment transactions.
  • the three resource management computers 140 described above are merely examples of the various resource management computers that could be in communication with the authentication hub 110.
  • the authentication hub 110 may communicate with other data processing servers to process other types of authentication information.
  • the authentication hub 110 can provide a resource manager interface for communicating with the resource management computers 140 over a third network 154.
  • the resource manager interface can be used for sending access requests to the resource management computers 140 and receiving access responses from the resource management computers 140.
  • FIG. 2 shows a functional block diagram of an authentication hub 210, in accordance with some embodiments.
  • the authentication hub 210 can perform a variety of functions in order to orchestrate communications between a variety of different client devices, data processing servers, and resource management computers while maintaining the security of authentication information.
  • these functions may be implemented as hardware components of the authentication hub. In other embodiments, these functions may be implemented as software modules on the
  • the authentication hub 210 can include a plurality of computers coupled together in a system, where one or more of the computers performs the different functions of the authentication hub 210. As further described below, the authentication hub 210 can perform access request processing 220, authentication request processing 230, and dynamic message routing 240.
  • the authentication hub 210 can perform automated access request processing 220 upon receiving access requests from the variety of different client devices.
  • automated access request processing 220 functionality includes automated client interface adaptation 221, access request profiling 222, and community prediction 223.
  • client devices and resource management computers communicate using an application programming interface (API) that may have been specifically designed for that particular resource management computer.
  • the authentication hub 210 can become a centralized point for communications between the client devices and the resource management computers using a variety of different APIs or communication protocols. Accordingly, the authentication hub 210 can be capable of receiving and processing access requests from a variety of different client devices, including access requests in different formats and including different types of authentication information.
  • the authentication hub 210 performs automated client interface adaptation 221 in order to handle access requests having new or different formats. That is, instead of being specifically programmed to handle each of the variety of different APIs used by the different resource management computers for communication with the client devices, the authentication hub 210 can automatically adapt the format of incoming access request messages to be compatible with the APIs used by the authentication hub 210. As such, the authentication hub 210 can receive and process access requests that are in different formats, including formats that have never been received or processed before.
  • the authentication hub 210 can perform automated client interface adaptation 221.
  • the authentication hub 210 can analyze the format of the incoming access request from the client device and then map it to a known API (e.g., a known API used by one of the resource management computers). For instance, the authentication hub 210 can use a linguistic parsing algorithm to build data structures that represent the API of a particular access request message. In some instances, the authentication hub 210 may first perform a lexical analysis of the access request message to create a sequence of symbols or tokens corresponding to the format and information included in that access request message.
  • the symbols/tokens can correspond to the data fields of the access request (e.g., routing information, authentication information, metadata, etc.). Then the authentication hub 210 can perform syntactic analysis on the access request (e.g., on the tokens if lexical analysis was performed) to create a data structure (e.g., a parse tree or a syntax tree) that represents the API of the access request. The authentication hub 210 can then store the API data structure for the client device’s API so that it can be used for later access requests using the same API, such that the API data structure does not have to be rebuilt each time.
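  • The lexical and syntactic steps described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that access requests arrive as JSON objects; the function names `tokenize` and `build_api_structure`, the field names, and the sample message are hypothetical and are not taken from the described embodiments.

```python
import json


def tokenize(message: str):
    """Lexical step: flatten a JSON message into (field path, value type) tokens."""
    def walk(value, path):
        if isinstance(value, dict):
            for key in sorted(value):
                yield from walk(value[key], path + (key,))
        else:
            # Leaves (including lists) are recorded by their Python type name.
            yield (".".join(path), type(value).__name__)
    return list(walk(json.loads(message), ()))


def build_api_structure(tokens):
    """Syntactic step: fold the tokens into a nested tree keyed by field name."""
    tree = {}
    for path, type_name in tokens:
        node = tree
        parts = path.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = type_name
    return tree


# Hypothetical access request; the field names are assumptions.
request = '{"route": {"resource_manager": "rm-3"}, "auth": {"pin": "1234"}, "amount": 25}'
tokens = tokenize(request)
structure = build_api_structure(tokens)
```

  • The resulting tree records only field names and value types, so it could be cached per client API and reused for later requests in the same format. The sketch does not handle field names that themselves contain a period.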
  • the authentication hub 210 can map the API data structure generated by the linguistic parsing algorithm to a stored API data structure (e.g., an API data structure previously generated based on the access request API of a particular resource management computer).
  • the authentication hub 210 can store a plurality of previously generated API data structures based on the access request messages used by different resource management computers.
  • the authentication hub 210 can generate the API data structures for the resource management computers (e.g., when the message format used by a particular resource management computer changes).
  • the authentication hub 210 can determine the best API match for a particular access request by comparison to previously used APIs using fuzzy criteria. Using this process, the authentication hub 210 can determine the format of the access request message and identify the types of authentication information included in it, even if the access request is using a previously unknown API.
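  • One possible reading of matching "using fuzzy criteria" is a set-similarity comparison between the fields of an incoming request and the fields of each stored API. The sketch below uses Jaccard similarity with a cutoff; the choice of metric, the threshold value, and the API and field names are all assumptions made for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_api_match(request_fields, known_apis, threshold=0.5):
    """Return (API name, score) for the closest stored API, or (None, score)
    when nothing reaches the threshold."""
    best_name, best_score = None, 0.0
    for name, fields in known_apis.items():
        score = jaccard(set(request_fields), set(fields))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical stored API field sets.
known_apis = {
    "website_login": {"user", "password", "ip"},
    "payment_auth": {"account", "amount", "pin", "merchant"},
}
match, score = best_api_match({"account", "amount", "pin"}, known_apis)
```

  • Here the request shares three of four fields with the hypothetical `payment_auth` API, so it is matched even though the formats are not identical. A request with no overlapping fields returns no match.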
  • the authentication hub can recommend a new communication API to the client device.
  • the authentication hub 210 can also add, or remove, data fields or information to the client’s access request so that it is compatible with the API of a particular resource management computer.
  • the authentication hub 210 can also perform access request profiling 222 to create profiles for incoming access requests.
  • Access request profiling 222 can also include maintaining a profile for a particular type of interaction data.
  • the authentication hub 210 can create and maintain a profile for a particular IP address or for a particular user account.
  • Access request profiling 222 can speed up the authentication process for a particular access request by reducing the number of potential user accounts that would match the incoming request. For example, if the access request includes biometric data, voice data, or image data as authentication information, but does not include an account number or user identifier, then the authentication information for the entire set of registered user accounts may need to be checked in order to determine if the received authentication information is valid.
  • the profile for the incoming access request can be applied to a predictive model to identify a subset of user profiles within communities similar to the profile of the incoming access request.
  • the subset of user profiles may have the same IP address as the client device or IP addresses in the same community as the IP address of the client device.
  • the communities may also be based on geo-location, time of day, or a network route or path.
  • the received authentication information may be validated based on comparisons to authentication information for a small subset of registered user accounts, instead of all of the registered user accounts.
  • Community prediction is described in further detail herein. Request profiling is further described below.
  • the authentication hub 210 also performs community prediction 223.
  • the authentication hub 210 can use community prediction 223 to determine a set of predicted communities that the access request is predicted to be within using a predictive model based on interaction data (e.g., an IP address) for a plurality of previous access requests.
  • the predictive model can be based on a topological graph of nodes and edges and a plurality of communities including sets of nodes within the topological graph.
  • the predictive model can be generated using a learning algorithm based on interaction data for a plurality of previous access requests.
  • Determining the set of predicted communities can reduce the amount of time and computing resources used in authenticating the access request.
  • the access request may include biometric data (e.g., fingerprint scan, etc.), voice data (e.g., an audio recording of a phrase or word), image data (e.g., face scan, retina scan, video recording of a person or gesture) or another type of data for authentication that is validated using similarity thresholds.
  • the authentication hub 210 may only compare the received authentication information to registered
  • the authentication hub 210 has used community prediction to reduce the number of potential registered users (e.g., their authentication information) to check from a large number (e.g., all registered users) to a small number (e.g., only the registered users that are in a community similar to the received access request).
  • the authentication hub 210 can begin the authentication process by comparing the received authentication information to registered authentication information falling within the most similar predicted communities, further reducing the time and computing resources spent on performing the authentication process.
  • the similarity metrics for a particular community of the topological graph can be determined based on a vector distance between that particular community and the received access request.
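  • A minimal sketch of ranking communities by vector distance follows. It assumes each community and each access request has already been reduced to a numeric vector of equal length and that Euclidean distance is the metric; the source does not specify either, and the vectors shown are hypothetical.

```python
import math


def rank_communities(request_vector, community_vectors, top_k=2):
    """Return the IDs of the top_k communities closest to the request vector."""
    distances = {
        community_id: math.dist(request_vector, vector)
        for community_id, vector in community_vectors.items()
    }
    return sorted(distances, key=distances.get)[:top_k]


# Hypothetical two-dimensional community vectors.
community_vectors = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (10.0, 10.0)}
predicted = rank_communities((1.0, 0.0), community_vectors)
```

  • The ordering of the returned list lets authentication start with the most similar community first.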
  • the authentication hub 210 can initiate a registration process to register the user that made the access request. If the user indicates that they are registered, the authentication hub 210 can request additional authentication information or identifiers from the client device or the user for authentication.
  • the authentication hub 210 can also use community prediction 223 to determine a risk score for a particular incoming access request.
  • the authentication hub 210 can apply the incoming access request to the predictive model to determine a set of predicted communities for the access request.
  • the authentication hub 210 can also determine an account interaction profile based on an account identifier included in the access request.
  • the account interaction profile can be based on interaction data for previous access requests that included the account identifier.
  • the authentication hub 210 can determine an expected subset of the plurality of communities by applying the account interaction profile to the predictive model. The expected communities are those expected based on the previous access requests made for the particular account. Then, the authentication hub 210 can determine a risk score based on a comparison of the expected communities (based on the account interaction profile) to the predicted communities (based on the incoming access request). The comparison can involve determining a vector distance between an expected community and a predicted community. The greater the vector distance between the expected community and the predicted community, the higher the risk that the access request is fraudulent (e.g., the risk score can be higher). Community prediction is further described below.
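  • The relationship between vector distance and risk can be sketched as follows. The aggregation (mean distance from each predicted community to its nearest expected community) and the squashing function d / (1 + d) are assumptions chosen only so that the score grows with distance and stays between 0 and 1; the source does not give a formula.

```python
import math


def risk_score(expected_vectors, predicted_vectors):
    """Map the distance between expected and predicted communities to [0, 1)."""
    nearest = [
        min(math.dist(predicted, expected) for expected in expected_vectors)
        for predicted in predicted_vectors
    ]
    mean_distance = sum(nearest) / len(nearest)
    return mean_distance / (1.0 + mean_distance)


# Hypothetical community vectors for one account.
expected = [(0.0, 0.0), (1.0, 1.0)]
low_risk = risk_score(expected, [(0.0, 0.0)])
high_risk = risk_score(expected, [(3.0, 4.0)])
```

  • A request that falls in a community the account has used before scores 0, while a request landing far from every expected community scores closer to 1.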
  • the authentication hub 210 can perform automated authentication request processing 230.
  • the authentication hub’s automated authentication request processing 230 functionality includes authentication information validation 233, automated privacy control 234, and automated request modification 235.
  • the authentication hub 210 can perform authentication information validation 233 as part of an authentication process.
  • the authentication hub 210 can store registered authentication information for a plurality of registered users/accounts and can compare authentication information received in an incoming access request to registered authentication information of the registered users/accounts. In some embodiments, the authentication hub 210 can perform authentication information validation on a subset of the authentication information received in the access request while one or more data processing servers are used to validate other authentication information received in the access request.
  • the authentication hub 210 can perform automated privacy control 234 to prevent excessive amounts of sensitive authentication information from being distributed to data processing servers or other third parties. By restricting the type and amount of sensitive information used for authentication, the authentication hub 210 can reduce the risk of such information being intercepted or leaked (e.g., due to a security breach at one of the data processing servers).
  • the authentication hub 210 can determine that more, or less, authentication information is required to authenticate a client device depending on various factors. For example, the authentication hub 210 can determine that less authentication information is required in order to authenticate a client device having a higher trust level compared to a client device having a lower trust level. In addition, the authentication hub 210 can determine that more authentication information is required to authenticate a client device that is requesting resources having a higher resource security level (e.g., a greater amount of resources or a more sensitive type of resource) compared to one requesting resources having a lower security level (e.g., fewer resources or a less sensitive type of resource). In another example, the authentication hub 210 can require more authentication information when a determined threat level within the network is higher, compared to when the determined threat level within the network is lower.
  • the authentication hub 210 can also assign weights to different types of authentication information such that more or less authentication information is needed to validate the client device depending on what type of authentication information is available. In one example, the authentication hub 210 may determine that a certain authentication level is sufficient to authenticate the client device for a particular access request. In this example, the authentication hub 210 may determine that validating biometric information of the user of the client device would meet or exceed the authentication level.
  • the authentication hub 210 may also determine that validating both a network address of the client device and a hardware identifier of the client device would meet or exceed the authentication level. Thus, even if the client device does not provide biometric information, the authentication hub 210 can authenticate the client device as long as its network address and hardware identifier are provided.
  • the authentication hub 210 can determine whether the client device should be authenticated using the biometric information, or using a network address and hardware identifier instead, based on the sensitivity levels of the different sets of authentication information. For example, the authentication hub 210 can determine that the client device should be authenticated using the network address and hardware identifier, instead of using the biometric information, based on the biometric information having a higher sensitivity level than the sensitivity level of the network address and the hardware identifier. As such, less sensitive information can be used for authentication if it is available and would meet the authentication level determined by the authentication hub.
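  • The weighting and sensitivity logic above can be sketched as a small selection routine. The numeric weights, sensitivity levels, and type names below are hypothetical, and the tie-breaking rule (lowest maximum sensitivity, then fewest items) is one assumption among several plausible ones.

```python
from itertools import combinations

# Hypothetical contribution of each type toward the authentication level.
WEIGHTS = {"biometric": 10, "network_address": 5, "hardware_id": 5, "pin": 4}
# Hypothetical sensitivity level of each type (higher is more sensitive).
SENSITIVITY = {"biometric": 9, "network_address": 2, "hardware_id": 3, "pin": 5}


def select_auth_info(available, required_level):
    """Pick the least sensitive combination whose weights meet the level."""
    candidates = []
    for size in range(1, len(available) + 1):
        for combo in combinations(sorted(available), size):
            if sum(WEIGHTS[t] for t in combo) >= required_level:
                candidates.append(combo)
    if not candidates:
        return None
    # Prefer the lowest maximum sensitivity, then the fewest items.
    return min(candidates, key=lambda c: (max(SENSITIVITY[t] for t in c), len(c)))


chosen = select_auth_info({"biometric", "network_address", "hardware_id"}, 10)
```

  • With these assumed values, the network address and hardware identifier together meet the level, so they are chosen over the more sensitive biometric information even though the biometric information alone would also suffice.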
  • the authentication hub 210 also performs automated request modification 235.
  • the authentication hub 210 can append additional information, stored at the authentication hub 210, to the authentication request.
  • the additional information may enable a particular data processing server to be capable of handling the authentication request. For example, if the authentication hub 210 has stored a hardware identifier for a particular client device from past access requests, and the data processing server would use the hardware identifier for authentication, then the authentication hub 210 can add the hardware identifier to the authentication request sent to the data processing server, even if the client device did not include the hardware identifier in the access request that is currently being processed.
  • the authentication hub 210 can generate an authentication request message including that set of authentication information.
  • the authentication hub 210 may not rely on data processing servers for performing authentication for a particular access request.
  • the authentication hub 210 may perform authentication of the access request and may not send an authentication request to a data processing server. If an authentication request message for a data processing server is generated, the authentication hub 210 can perform dynamic message routing 240.
  • the dynamic message routing 240 process can include automated data processor interface adaptation 246 and data processor evaluation 247.
  • the authentication hub 210 can perform automated data processor interface adaptation 246 for communications with the data processing servers using processes similar to those used in the automated client interface adaptation 221 process discussed above with respect to communications from client devices. That is, the authentication hub 210 can generate API structures (e.g., using a linguistic parser) for each of the data processing servers. Then, the authentication hub 210 can modify an authentication request to match the API for a given data processing server using its API structure (e.g., the API structure determined by the authentication hub 210 for that particular data processing server). Thus, the authentication hub 210 can adapt the authentication requests to match the API protocol of the data processing server that they are being sent to.
  • the authentication hub 210 can also perform data processor evaluation 247.
  • the authentication hub 210 can evaluate the capabilities, authentication information requirements, exposure level, network condition (e.g., the response time between the sending of an authentication request by the authentication hub and the receiving of an authentication response from the data processing server), stability, and accuracy of each data processing server.
  • the authentication hub 210 may evaluate the data processing servers prior to receiving an access request such that the evaluation does not slow down the processing of the access request.
  • the authentication hub 210 can use information from this evaluation in determining which data processing server to route an authentication request message to. For example, several data processing servers may be capable of validating a particular type of authentication information but each of the data processing servers may have different evaluated response times, stability levels, etc.
  • the authentication hub 210 can use an AI to select a particular data processing server, based on the evaluated criteria, to send the authentication request to.
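  • The selection step can be sketched with a fixed weighted score standing in for the AI-based selection described above. This is a deliberately simpler technique than the one named in the text; the score weights, the 1000 ms response ceiling, and the server names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ServerEvaluation:
    name: str
    capabilities: frozenset
    response_time_ms: float
    stability: float  # 0.0 to 1.0
    accuracy: float   # 0.0 to 1.0


def select_server(evaluations, info_type, max_response_ms=1000.0):
    """Pick the best-scoring server able to validate info_type, or None."""
    capable = [e for e in evaluations if info_type in e.capabilities]
    if not capable:
        return None

    def score(e):
        speed = max(0.0, 1.0 - e.response_time_ms / max_response_ms)
        return 0.4 * e.accuracy + 0.3 * e.stability + 0.3 * speed

    return max(capable, key=score).name


# Hypothetical evaluations gathered before any access request arrives.
evaluations = [
    ServerEvaluation("server_a", frozenset({"biometric"}), 200.0, 0.9, 0.95),
    ServerEvaluation("server_b", frozenset({"biometric"}), 50.0, 0.6, 0.90),
    ServerEvaluation("server_c", frozenset({"ip"}), 30.0, 0.99, 0.99),
]
selected = select_server(evaluations, "biometric")
```

  • Because the evaluations are computed ahead of time, the selection at request time is a cheap lookup and comparison.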
  • the authentication hub 210 can also evaluate the authentication responses received from the data processing servers. If the message received from the data processing server is suspicious (e.g., the formatting has changed compared to previously received responses from that same data processing server), then the authentication hub can determine to re-route authentication requests originally destined for that particular data processing server to different data processing servers that are capable of handling those authentication requests.
  • FIG. 3 shows an information flow diagram of an authentication process using community prediction, in accordance with some embodiments.
  • the authentication process can be performed by the authentication hub or the data security hub described herein.
  • the authentication hub can store interaction data 302 for a plurality of previous access requests.
  • the interaction data 302 can include a geolocation of the client device, a time of day of the access request (e.g., a specific time or a time range), a date of the access request, an identifier of a resource, a resource amount, an identifier of the resource manager of the resource, an identifier of a resource management computer of the resource manager, an IP address of the client device, and a network route or path used for sending the access request to the data security hub.
  • the authentication hub can use a learning algorithm 304 (e.g., a clustering algorithm) to create a topological graph 306 based on the interaction data 302.
  • the topological graph 306 includes nodes and edges connecting the nodes. Each node in the topological graph 306 can correspond to a certain type of data within the interaction data 302.
  • the learning algorithm 304 may determine an edge-weight for each of the edges in the topological graph 306 and a node-weight of each of the nodes in the topological graph 306.
  • the node-weight for a particular node can be normalized based on the maximum node-weight corresponding to its particular node type.
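  • Normalizing by the per-type maximum can be sketched as below. The representation of a node as a (type, raw weight) pair and the sample values are assumptions made for illustration.

```python
def normalize_node_weights(nodes):
    """nodes maps node ID -> (node type, raw weight); returns ID -> weight in [0, 1]."""
    max_by_type = {}
    for node_type, weight in nodes.values():
        max_by_type[node_type] = max(max_by_type.get(node_type, 0.0), weight)
    return {
        node_id: (weight / max_by_type[node_type] if max_by_type[node_type] else 0.0)
        for node_id, (node_type, weight) in nodes.items()
    }


# Hypothetical raw node-weights.
nodes = {
    "ip:23.0.0.108": ("ip", 40.0),
    "ip:10.0.0.1": ("ip", 10.0),
    "rm:3": ("resource_manager", 5.0),
}
normalized = normalize_node_weights(nodes)
```

  • Dividing by the maximum within each node type keeps, for example, frequently seen IP addresses from dominating node types that naturally have smaller counts.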
  • the learning algorithm 304 can then determine a plurality of communities 308 from the topological graph 306 to form a predictive model.
  • Each community of the plurality of communities 308 can include a subset of the nodes.
  • the nodes that are included in a community by the learning algorithm 304 can be based on a vector distance between the nodes.
  • Each of the communities 308 can be associated with a vector for that community based on the vectors of the nodes within that community.
  • the topological graph 306, the plurality of communities 308 and the corresponding predictive model can be created by the authentication hub at routine time intervals (e.g., every 6 months) based on a new set of interaction data 302.
  • Once the predictive model is created, it can be used for processing access requests received in real time.
  • the authentication hub can receive an incoming access request 312.
  • the authentication hub can determine a set of predicted communities 314 for the incoming access request 312 based on a vector distance between a vector based on the request interaction data and the vectors associated with the plurality of communities 308.
  • After determining the predicted subset of the plurality of communities 314, the authentication hub can initiate an authentication process for the access request 312 using the predicted subset of the plurality of communities.
  • the authentication hub can obtain client authentication information 322 from the access request 312.
  • the authentication hub can also determine a subset of registered authentication information corresponding to account identifiers included in the predicted subset of the plurality of communities.
  • the authentication process can be sped up by only comparing the client authentication information 322 to a subset of the registered authentication information 324 (instead of the entire set of registered authentication information). This process is useful when the access request includes biometric data, audio data, or image data as authentication information but does not include an identifier of a particular account or user.
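  • The speed-up described above can be sketched as a two-stage lookup: first collect only the accounts in the predicted communities, then compare the received sample against those accounts with a similarity threshold. The use of cosine similarity over fixed-length feature vectors, the 0.95 threshold, and the account data are assumptions.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def authenticate(sample, registered, community_members, predicted, threshold=0.95):
    """Return (matching account or None, number of accounts actually compared)."""
    candidates = set()
    for community_id in predicted:
        candidates |= community_members.get(community_id, set())
    best_account, best_similarity = None, threshold
    for account in sorted(candidates):
        similarity = cosine_similarity(sample, registered[account])
        if similarity >= best_similarity:
            best_account, best_similarity = account, similarity
    return best_account, len(candidates)


# Hypothetical registered templates and community membership.
registered = {"acct1": (1.0, 0.0), "acct2": (0.0, 1.0), "acct3": (0.6, 0.8)}
community_members = {"A": {"acct1", "acct3"}, "B": {"acct2"}}
outcome, compared = authenticate((0.59, 0.81), registered, community_members, ["A"])
```

  • Only two of the three registered templates are compared in this example; with a realistic number of registered accounts the reduction would be correspondingly larger.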
  • the authentication hub can determine an authentication outcome 326 (e.g., whether the access request has been authenticated) based on whether the client authentication information 322 matches registered
  • the authentication hub can obtain an account identifier from the access request 312. Then, the authentication hub can determine an account interaction profile based on previous access requests associated with the account identifier. The previous access requests for that account can be obtained from historical account information 332, which may be stored at the authentication hub. The data security hub can then determine an expected subset of the plurality of communities based on the account interaction profile 334. The authentication hub can then determine a risk score 336 for the access request based on a comparison of the expected subset of the plurality of communities to the predicted subset of the plurality of communities. The comparison of the expected subset to the predicted subset may be based on a vector distance between the two. The
  • FIG. 4 shows an exemplary topological graph 400 and communities, in accordance with some embodiments.
  • the authentication hub can store interaction data for a plurality of previous access requests.
  • the interaction data can include information about the geolocation, date, and time of day that the previous access requests were made as well as the network address and network route used to send the access request.
  • the interaction data can also include an account identifier, an identifier of the resource manager that the request is for, an identifier of a type of resource manager (“R.M. type”), an identifier of the resource, and an amount of the resource (e.g., a specific amount or a range of amounts).
  • the interaction data may be determined from the plurality of previous access request messages.
  • the authentication hub may create a topological graph, such as the topological graph 400 of FIG. 4, based on the interaction data.
  • the graph may be a topological graph comprising nodes (represented as circles in FIG. 4) and edges
  • a node for each distinct information element amongst the plurality of received interaction data may be generated and plotted on the topological graph. For example, a node for each account identifier, resource manager identifier, resource manager type (“R.M. type” in FIG. 4), IP Address, resource amount, etc. may be generated and plotted.
  • the different types may be predetermined based on the different types of interaction data stored.
  • the plotted nodes may be connected to one another via edges that represent the relationship/linkage between nodes.
  • Each edge may be associated with a weight quantifying the interaction between the two nodes of the edge. For example, a node for an account identifier of a specific user may be connected to a node for a resource manager identifier of a specific resource manager.
  • the weight of the connecting edge between the two nodes may reflect a quantity of interactions (access requests) between the specific resource manager and user. For example, the user may have conducted 5 access requests at the resource manager (based on the stored interaction data), which may result in the edge between the nodes having a weight of 5.
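  • Counting interactions to obtain edge weights can be sketched as follows. The record layout (one dictionary per previous access request) and the "field:value" node naming are assumptions; every pair of values that appears together in a record is treated as one interaction on the edge between them.

```python
from collections import Counter
from itertools import combinations


def build_edge_weights(interactions):
    """Return a Counter mapping (node, node) pairs to co-occurrence counts."""
    weights = Counter()
    for record in interactions:
        nodes = sorted(f"{field}:{value}" for field, value in record.items())
        for a, b in combinations(nodes, 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical stored interaction data: five requests by one user at one resource manager.
interactions = [{"account": "u1", "resource_manager": "3"} for _ in range(5)]
interactions.append({"account": "u2", "resource_manager": "3", "location": "94110"})
edge_weights = build_edge_weights(interactions)
```

  • The five requests by the hypothetical user `u1` at resource manager `3` produce an edge weight of 5, mirroring the example in the text.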
  • the account identifier node and resource manager identifier node may be related to other distinct information elements included in the access request.
  • the node for the resource manager identifier may be connected to a node for a resource manager type relating to the resource manager (e.g. 5651 - Family Clothing Stores).
  • the node for the account identifier may be connected to a node for a geolocation (e.g., zip code) at which the access request may have been initiated (e.g. 94110 - San Francisco).
  • the nodes may represent data that are distinct from each other, and/or the nodes may characterize underlying data at specific levels of generality (e.g., a resource manager ID may represent a specific resource manager, but a resource manager type may represent a category that the specific resource manager is part of).
  • the authentication hub may also determine and assign edge-weights to each edge of the topological graph based on the interaction data.
  • the edge-weights may be related to vector distances between nodes, as the position of two nodes relative to one another can be expressed as a vector in which edges between nodes have a specific length quantifying their relationship.
  • the relationship between two nodes can either be measured as a weight in which higher correlations are given by higher weights, or, the relationship can be measured as a distance, in which higher correlations are given by shorter distances.
  • highly connected nodes that interact frequently with each other may be densely populated in the graph (i.e. close to one another within a distinct region of the graph).
  • the length of an edge can be inversely proportional to its edge-weight.
  • a path in the topological graph 400 may be defined as one or more edges that can be traversed to move from a first node to a second node.
  • the length of a path may be determined to be the total length of the edges included in the path.
  • a path may comprise edges [a,b], [b,c], and [c,d], which may each comprise edge lengths of 2, 3, and 4, respectively.
  • the length of a path may be determined to be the number of edges along the path.
  • a path from node a to d may comprise three edges: [a,b], [b,c], and [c,d], and the length of the path may then be equal to ‘3.’
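  • The two path-length definitions above, and the inverse relationship between edge length and edge-weight, can be worked through in a short sketch. The edge lengths 2, 3, and 4 come from the example in the text; the proportionality constant `k` is an assumption.

```python
def edge_length_from_weight(weight, k=1.0):
    """Edge length inversely proportional to edge-weight (k is an assumed constant)."""
    return k / weight


def path_length_by_distance(path, edge_lengths):
    """Sum of the lengths of the edges along the path."""
    return sum(edge_lengths[(u, v)] for u, v in zip(path, path[1:]))


def path_length_by_hops(path):
    """Number of edges along the path."""
    return len(path) - 1


edge_lengths = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 4}
path = ["a", "b", "c", "d"]
total_distance = path_length_by_distance(path, edge_lengths)
hop_count = path_length_by_hops(path)
```

  • Under the first definition the path from a to d has length 2 + 3 + 4 = 9; under the second it has length 3.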
  • the authentication hub may then determine a plurality of communities from the created topological graph.
  • the communities may be groups of nodes that are highly connected (as given by greater weights and shorter distances), indicating that they have a high probability of interacting with one another based on the interaction data. These communities can then be used to classify incoming data (e.g., an incoming access request) and generate a predicted outcome (e.g. predicted interaction) for the incoming data.
  • the communities can be generated using an unsupervised learning algorithm that clusters nodes into distinct and densely populated groups (e.g., the IPCA clustering algorithm).
  • the learning algorithm can be an algorithm that generates communities that overlap (i.e. where nodes can belong to more than one community).
  • Community A includes nodes that are also included in Community B (e.g., the node for “IP Address: 23.0.0.108” and the node for “Resource Provider ID: 5”).
  • a graph learning algorithm for mapping groups of nodes in an interaction network can be used, as described in further detail below.
  • the communities determined by the authentication hub may be accumulated and recorded in a graph database to form a predictive model.
  • each community can be identified by a unique community ID, which may further be associated with unique identifiers for each node included in the community.
  • the community information may be stored in a graph database, and at a later point in time, data received in a request for a prediction can then be compared to the determined communities to classify the data and generate the requested prediction.
  • the authentication hub can use the predictive model to determine predicted communities for incoming access requests received in real time.
  • the authentication hub can determine interaction data for the incoming access request based on the data included in the access request and information gathered by the authentication hub at the time the access request is received (e.g., date, time of day, network address, network route, etc.).
  • the interaction data for the access request can then be compared to the predictive model to determine a set of predicted communities.
  • the set of predicted communities is a subset of the plurality of communities in the topological graph.
  • the interaction data for a particular incoming access request may comprise a location and a resource manager identifier.
  • the authentication hub can apply the interaction data for the access request to the predictive model to identify a set of predicted communities that were formed based on interaction data similar to the interaction data in the incoming access request. For example, an incoming access request having interaction data indicating a location of “94110,” and a resource manager identifier of “3” may be determined to fall within “Community A” of FIG. 4 since “Community A” includes nodes for the location of “94110,” and the resource manager identifier of “3.” Thus, the authentication hub can determine that Community A is a predicted community for the incoming access request.
  • a plurality of predicted communities can also be determined based on a vector distance between the access request and the communities of the predictive model.
  • the authentication hub can vectorize the interaction data for the access request and the community structure (e.g., the nodes and edges) and determine a vector distance between the interaction data for the access request and the community structure.
  • a particular community may be determined to be a predicted community based on the vector distance being below a similarity threshold, where a lower similarity threshold would result in fewer predicted communities and a higher similarity threshold would result in more predicted communities.
  • the “distance” between two nodes can be related to an edge weight of the edge connecting the two nodes.
  • Nodes that frequently interact and/or have a strong level of correlation to one another may be connected by highly weighted/strong edges.
  • a node for a resource manager that is busiest during morning hours may be connected to a node for time “09:00-11:00” by a strong edge of weight 20, but may be connected to a node for time “18:00-22:00” by a weak edge of weight 1.
  • a node for a resource manager that manages expensive resources may be connected to a node for a resource amount of “100-500” by a strong edge with high weight, and may be connected to a node for a resource amount of “30-50” by a weak edge of low weight.
  • the topological graph can include communities of nodes that are closely connected (e.g., the distance between the nodes is smaller compared to nodes in other communities).
  • nodes of the graph, G, can be grouped into communities, K.
  • Each distinct community, K, may comprise densely populated nodes that interact more frequently with one another than with nodes of a different community.
  • a community may have a diameter, which may describe the scope of the community.
  • the diameter of a community may be denoted as SP(K), and may be defined to be the largest length of any shortest path between any two nodes in K.
  • a community, K, may comprise nodes A, B, C, and D, and the shortest possible path between each pair of nodes may be ‘AB: 1’, ‘AC: 3’, ‘AD: 4’, ‘BC: 2’, ‘BD: 3,’ and ‘CD: 1.’
  • the diameter, SP(K), would be equal to 4, as the longest shortest path is ‘AD: 4.’
  • the diameter of a community, K, can be denoted by ASP(K), and may be defined to be the average length of all the shortest paths between each pair of nodes in K.
  • a community comprising nodes ‘A,’ ‘B,’ ‘C’, and ‘D’ may have shortest paths ‘AB: 1’, ‘AC: 3’, ‘AD: 4’, ‘BC: 2’, ‘BD: 3,’ and ‘CD: 1.’
  • m_vK is the number of edges shared between the node v and the nodes included in K
  • n_K is the number of nodes included in K
  • each edge in the graph may be assigned a weight.
  • a methodology similar to the IPCA clustering algorithm may be used to form communities.
  • the weight assigned to an edge between nodes u and v, [u, v] may be defined as the number of neighbors (adjacent nodes) shared by the nodes u and v. For example, node u may be connected to nodes a, b, c, x, y, and z. Meanwhile, node v may be connected to nodes x, y, and z.
  • the weight assigned to the edge [u,v] may be 3, as nodes u and v share three neighbors (nodes x, y, and z).
  • the weight of each edge may be computed based on a quantity of interactions comprising the two nodes connected by the edge. For example, the weight of an edge between nodes for an IP address and a resource manager may be 5, which may represent 5 access requests made by a client device using the IP address at the resource manager.
  • the weight of an edge between a node for a resource manager and a node for a resource manager type, which may be included in every access request conducted at the resource manager, may be 100.
  • an edge connecting the resource manager node to a node for an hour of operation at which 10% of the resource manager’s access requests occur may have a weight of 10.
  • a node-weight for each node in the topological graph may also be computed.
  • the nodes may be sorted in decreasing order by weight, and stored in a queue, S_q.
  • the authentication hub can normalize the node-weight for each node in the topological graph based on a maximum node-weight for its particular type of node. For example, if the topological graph includes 4 nodes having a node type of “IP address” and these 4 nodes have respective node-weights of 25, 21, 16, and 22, then the authentication hub can determine that the maximum node-weight for the nodes of type “IP address” is “25.” The authentication hub can then normalize the node-weights for each node of type “IP address” using the maximum node-weight of “25.” By normalizing the node-weights for each node type, the topological graph can include a plurality of different node types while maintaining consistency of vector distance between the nodes.
  • Each community that is to be created may originate from a seed node.
  • the seed node may serve as a first node in a community that is being generated, and the community may be further built by extending the community from the first node based on whether or not nearby nodes meet predefined criteria. The predefined criteria for adding nearby nodes is further described below.
  • the highest weighted nodes in the queue S_q may be selected as the seed nodes of each community.
  • the first node (i.e. highest weight node) in the queue S_q may be selected as a seed node to grow a new community.
  • a new community may be built from a seed node by extending the community K to include nearby nodes (neighbors) that are connected to one or more nodes included in the community.
  • the new community K may be extended by adding nodes recursively from its neighbors according to priority.
  • the priority of a neighbor v of K may be determined by the value IN_vK, the interaction probability between v and the nodes of the new community K.
  • the node with the highest interaction probability against K may be selected as the neighboring node with the highest priority.
  • whether a high priority neighboring node v is added to the new community is determined by an Extend-judgment test that tests if v is a (K, T_in, d)-vertex.
  • a candidate node v may be added to the new community if the candidate node v is a (K, T_in, d)-vertex.
  • the community may be updated, i.e., the neighbors of the new community may be re-constructed from the graph, G, and the priorities of the neighbors of the new community may be re-calculated.
  • whether or not a candidate node (neighboring node) v is added to a community K may be determined by two conditions. First, the interaction probability, IN_vK, of the candidate node against the community may be calculated. In an embodiment, the candidate node will not be added to the community if the value IN_vK is less than a predetermined threshold, T_in.
  • the predetermined threshold, T_in, may be a predetermined number between 0 and 1.
  • the predetermined threshold, T_in, may be chosen to control the number of nodes included in each community as well as the total number of communities generated. For example, a greater T_in value may result in a greater number of communities as well as fewer nodes in each community.
  • a lower T_in value may result in fewer communities, with each community comprising a greater number of nodes. This may further affect the outputted predictions of the model, as a model with more communities may have greater resolution and may result in more precise predictions (e.g. fewer false positives); however, a model comprising communities that include a large number of nodes may be capable of predicting interactions that would have otherwise been missed had the communities been any smaller (e.g. interactions with lower probability that can nevertheless occur). Accordingly, a T_in value may be selected based on the balance between these outcomes, and may be adjusted for desired results.
  • the diameter of the extended community K + v may be calculated.
  • the diameter of a community can be calculated as the largest length (i.e. maximum possible length) of any shortest path between any two nodes in the community, SP(K), or can be calculated as the average length of all shortest possible paths between each pair of nodes in the community, ASP(K).
  • the diameter of the graph K + v may be calculated and compared to a parameter d, which may be a pre-established boundary for communities that are being built. If the computed value of the diameter of K + v is bounded by d, then the vertex v may be added to the community (i.e.
  • the parameter d may be set based on the nature of the interaction data that is being used. For example, it may be determined that for an interaction network of users and resource managers, 95% of interactions occur between users and resource managers that are only 5 or fewer connections away from each other, and the parameter d may be set as ‘5.’ If the node v fails to meet either of the predefined criteria, then the next highest priority neighbor of the community is tested, and so on. Once all remaining neighbors of the community fail to meet the predefined criteria, then the community cannot be further extended, and the nodes of community K may be completely determined. Then, the nodes included in community K, as completely built, may be removed from the queue S_q before selecting the next node in the queue.
  • this approach may generate overlapping communities, as the nodes of the generated communities are only removed from the queue S_q, but not from the original graph G from which candidate nodes are selected during the community-extending process. Furthermore, the process may guarantee that no two generated communities would be the same, as the seed node for a new community may be selected such that the seed node does not belong to any of the previously constructed communities.
  • the technical advantages of the above mentioned features include the expression of multiple traits of any given node when making a prediction. This allows for more accurate predictions that can be tailored to specific locations, times of day, etc., and thus can account for a large range of qualities of any given node.
  • prior methods for predicting interactions could only classify nodes into a single community, whereas the currently presented method accounts for nodes belonging to multiple communities. This may be beneficial, for example, when predicting interactions between users and resource managers that belong to more than one community, and whose interactions vary as conditions change.
  • the method allows for the mapping of interactions at multiple levels. This is of particular use for predicting interactions between users and resource managers, as correlations between users and resource managers are expressed in each community (i.e. by account number and name) as well as non-intuitive correlations between concepts relating to the users and resource managers, such as location and MCC code. Even further, correlations between concepts relating to interactions themselves may be expressed, such as the time and nature of the interaction that occurred (e.g. as expressed by resource amount and by the means by which an access request was made).
  • FIG. 5 shows a flow chart of an exemplary method for processing access request messages through a data security hub, in accordance with some embodiments.
  • the data security hub or authentication hub described above can perform the method for processing access request messages.
  • the data security hub can create a topological graph based on the interaction data for a plurality of previous access requests.
  • the interaction data can be stored at the data security hub.
  • the topological graph includes nodes and edges connecting the nodes. Each node in the topological graph can correspond to a certain type of interaction data.
  • the types of interaction data can include a geolocation of the client device, a time of day of the access request (e.g., a specific time or a time range), a date of the access request, an identifier of a resource, a resource amount, an identifier of the resource manager of the resource, an identifier of a resource management computer of the resource manager, an IP address of the client device, and a network route or path used for sending the access request to the data security hub.
  • the data security hub may also determine an edge-weight for each of the edges in the topological graph.
  • the data security hub may also determine a node-weight of each of the nodes in the topological graph based on the edge-weights.
  • the data security hub may also determine a maximum node-weight corresponding to each of a plurality of node types based on the node-weight of each of the nodes.
  • Each of the nodes may be associated with a particular node type.
  • the node type can correspond to the type of interaction data that the node represents in the topological graph.
  • the data security hub may also normalize the node-weight of each of the nodes in the topological graph based on the maximum node-weight corresponding to its particular node type. In some embodiments, when the data security hub determines the plurality of communities from the topological graph to form the predictive model, the determination is based on the normalized node-weights.
  • the data security hub can determine a plurality of communities from the topological graph to form a predictive model.
  • the plurality of communities can be determined using a clustering algorithm as discussed above.
  • Each community of the plurality of communities can include a subset of the nodes.
  • the nodes included in a community can be based on a distance between the nodes.
  • In determining the plurality of communities from the topological graph, the data security hub may generate a queue including the nodes in decreasing order by node-weight. The data security hub may also select a first seed node from the queue, where the first seed node has the highest node-weight of the nodes in the queue. The data security hub may also create a first community that includes the first seed node. The data security hub may also calculate an interaction probability for each of a plurality of candidate nodes that are not included in the first community. The interaction probability can be the probability of a candidate node interacting with a community node that is included in the first community.
  • the data security hub may also determine a highest priority candidate node based on the interaction probability for each of the plurality of candidate nodes that are not included in the first community. The data security hub may also determine whether the highest priority candidate node meets predefined criteria. The data security hub may also add the highest priority candidate node to the first community based on the determination of whether the highest priority candidate node meets the predefined criteria. The data security hub may also determine a next highest priority candidate node based on the interaction probability for each of the plurality of candidate nodes not included in the first community. The data security hub may also determine that the next highest priority candidate node does not meet the predefined criteria. The data security hub may then output the first community.
  • the data security hub may assign to the first community a unique community-identifier and node-identifiers for the set of nodes that are included in the first community. Then, the data security hub may remove, from the queue, the set of nodes included in the first community. After creating the first community, the data security hub can create additional communities using a similar process. For example, the data security hub can select a second seed node from the queue, the queue not including the set of nodes included in the first community, and then create a second community including the second seed node.
  • the plurality of communities of the topological graph include the first community and the second community.
  • the data security hub can receive, from a client device, an access request.
  • the access request can include authentication information and other data.
  • the data security hub can determine request interaction data for the access request.
  • the request interaction data can include a geolocation of the client device, a time of day of the access request (e.g., a specific time or a time range), a date of the access request, an identifier of a resource, a resource amount, an identifier of the resource manager of the resource, an identifier of a resource management computer of the resource manager, an IP address of the client device, and a network route or path used for sending the access request to the data security hub.
  • the data security hub may gather a portion of the request interaction data. For example, the data security hub may perform a trace route procedure to determine the network route used to send the access request.
  • the data security hub can determine a predicted subset of the plurality of communities by applying the request interaction data to the predictive model. Applying the request interaction data to the predictive model can include creating a topological graph structure based on the request interaction data, vectorizing the request interaction data, vectorizing the plurality of communities in the topological graph, and determining a vector distance between a request vector and a vector for each community of the plurality of communities. Communities having the shortest vector distance (e.g., below a similarity threshold) may be selected to be included in the predicted subset of the plurality of communities.
  • the data security hub can determine a network address of the client device and the predictive model can be based on the network addresses used for sending the plurality of previous access requests.
  • the data security hub can initiate an authentication process for the access request using the predicted subset of the plurality of communities.
  • the data security hub can obtain client authentication information from the access request.
  • the data security hub may also determine a subset of registered authentication information corresponding to account identifiers included in the predicted subset of the plurality of communities.
  • the client authentication information includes biometric data of a user of the client device and the subset of registered authentication information includes biometric data of registered users.
  • the authentication process can further include comparing the client authentication information to the subset of registered authentication information.
  • the data security hub may then determine an authentication outcome based on the comparing of the client authentication information to the subset of registered authentication information. An authentication response sent to the client device can be based on the authentication outcome.
  • the data security hub can determine that the client authentication information is not registered based on the comparison of the client authentication information to the subset of registered authentication information. Then the data security hub can initiate a registration process based on the determination that the client authentication information is not registered.
  • the data security hub may obtain an account identifier from the access request. Then, the data security hub can determine an account interaction profile based on previous access requests associated with the account identifier. The data security hub can determine an expected subset of the plurality of communities based on the account interaction profile. The data security hub can then determine a risk score for the access request based on the comparison of the expected subset of the plurality of communities to the predicted subset of the plurality of communities. An authentication response sent to the client device can be based on the risk score. The data security hub may also determine a vector for each of the plurality of communities based on nodes and edges of the topological graph corresponding to each respective community. The data security hub may also determine an expected vector based on the account interaction profile.
  • the data security hub may then determine a vector distance between the expected vector and the vector representing each of the plurality of communities.
  • the determination of the expected subset of the plurality of communities by the data security hub can be based on the vector distances between the expected vector and the vector representing each of the plurality of communities.
  • the data security hub can restrict the authentication information of the access request based on the risk score to obtain a restricted set of authentication information.
  • the data security hub may also send the restricted set of authentication information to a data processing server.
  • the data security hub may receive a second authentication response from the data processing server.
  • the authentication response provided by the data security hub can be based on the second authentication response from the data processing server.
  • the data security hub can provide, to the client device, an authentication response based on the authentication process for the access request.
  • the authentication response can indicate whether the access request was authenticated or not.
  • Such subsystems or components are interconnected via a system bus.
  • Subsystems may include a printer, keyboard, fixed disk (or other memory comprising computer readable media), monitor, which is coupled to a display adapter, and others.
  • Peripherals and input/output (I/O) devices which couple to an I/O controller (which can be a processor or other suitable controller), can be connected to the computer system by any number of means known in the art, such as a serial port.
  • a serial port or an external interface can be used to connect the computer apparatus to a wide area network such as the Internet, a mouse input device, or a scanner.
  • the interconnection via the system bus allows the central processor to communicate with each subsystem and to control the execution of instructions from system memory or the fixed disk, as well as the exchange of information between subsystems.
  • the system memory and/or the fixed disk may embody a computer readable medium.
  • the embodiments may involve implementing one or more functions, processes, operations or method steps.
  • the functions, processes, operations or method steps may be implemented as a result of the execution of a set of instructions or software code by a suitably-programmed computing device, microprocessor, data processor, or the like.
  • the set of instructions or software code may be stored in a memory or other form of data storage element which is accessed by the computing device, microprocessor, etc.
  • the functions, processes, operations or method steps may be implemented by firmware or a dedicated processor, integrated circuit, etc.
  • any of the embodiments can be implemented in the form of control logic using hardware (e.g. an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner.
  • a processor includes a single-core processor, a multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.
  • Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or scripting language such as Perl or Python using, for example, conventional or object-oriented techniques.
  • the software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission.
  • a suitable non-transitory computer readable medium can include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like.
  • the computer readable medium may be any combination of such storage or transmission devices.
  • Storage media and computer-readable media for containing code, or portions of code can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer-readable instructions, data structures, program modules, or other data, including RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, data signals, data transmissions, or any other medium which can be used to store or transmit the desired information and which can be accessed by the computer.
  • Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet.
  • a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs.
  • Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network.
  • a computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
  • any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps.
  • embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps.
  • steps of methods herein can be performed at a same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, units, circuits, or other means for performing these steps.
  • a recitation of “a,” “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary.
  • the use of “or” is intended to mean an “inclusive or,” and not an “exclusive or” unless specifically indicated to the contrary.
  • the use of the terms “first,” “second,” “third,” “fourth,” “fifth,” “sixth,” “seventh,” “eighth,” “ninth,” “tenth,” and so forth, does not necessarily indicate an ordering or a numbering of different elements and may simply be used for naming purposes to clarify distinct elements.
  • the use of “client” computer and “server” computer does not necessarily indicate the intended use of the computers, but may simply be used for naming purposes.
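
By way of non-limiting illustration, the community-building steps listed above (seed queue, interaction probability, diameter bound, and removal from the queue only) may be sketched as follows. This sketch rests on several assumptions that are not stated in the listing: the interaction probability is taken as the ratio m_vK / n_K, the node-weight is taken as the sum of a node's incident edge-weights, the edge-weight is the shared-neighbor count, the diameter is SP(K) measured in hops within the community, and ties are broken by node name. The function names and the values t_in = 0.5 and d = 2 are hypothetical.

```python
from collections import deque


def diameter(graph, nodes):
    """SP(K): the longest shortest path (in hops) between any two nodes of K."""
    nodes = set(nodes)
    longest = 0
    for source in nodes:
        hops = {source: 0}
        frontier = deque([source])
        while frontier:  # breadth-first search restricted to the community
            current = frontier.popleft()
            for neighbour in graph[current]:
                if neighbour in nodes and neighbour not in hops:
                    hops[neighbour] = hops[current] + 1
                    frontier.append(neighbour)
        if len(hops) < len(nodes):
            return float("inf")  # the candidate community is not connected
        longest = max(longest, max(hops.values()))
    return longest


def interaction_probability(graph, v, community):
    """Assumed form IN_vK = m_vK / n_K."""
    return len(graph[v] & community) / len(community)


def build_communities(graph, t_in=0.5, d=2):
    # Assumption: edge-weight = shared neighbours, node-weight = sum of edge-weights.
    node_weight = {
        u: sum(len(graph[u] & graph[v]) for v in graph[u]) for u in graph
    }
    queue = sorted(graph, key=lambda n: (-node_weight[n], n))
    communities = []
    while queue:
        community = {queue[0]}  # highest-weight remaining node is the seed
        while True:
            neighbours = {n for m in community for n in graph[m]} - community
            ranked = sorted(
                neighbours,
                key=lambda n: (-interaction_probability(graph, n, community), n),
            )
            added = False
            for candidate in ranked:
                if interaction_probability(graph, candidate, community) < t_in:
                    break  # every lower-priority neighbour fails as well
                if diameter(graph, community | {candidate}) <= d:
                    community.add(candidate)
                    added = True
                    break  # priorities are re-calculated after each addition
            if not added:
                break
        communities.append(community)
        # Nodes leave the queue but stay in the graph, so communities may overlap.
        queue = [n for n in queue if n not in community]
    return communities


# Hypothetical graph: two triangles that share node "c".
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
communities = build_communities(graph)
```

In this hypothetical graph, node "c" ends up in both communities because it is removed from the queue after the first community is built but remains available in the graph as a candidate for the second.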

Abstract

To speed up the authentication process, a data security hub can create a predictive model using a clustering algorithm based on interaction data for previous access requests. The predictive model can be used to determine predicted communities for an incoming access request. The predicted communities can be used to determine a reduced set of registered authentication information to compare against authentication information in the incoming access request. The predicted communities can also be used to determine a risk score for the access request based on how similar the predicted communities for the incoming access request are to expected communities determined based on past access requests.
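
As a minimal sketch of how predicted communities might narrow the set of registered authentication information, the example below treats each community and each request as a set of node labels. The Jaccard distance used here is a stand-in chosen for illustration, since the disclosure does not fix a particular vector distance; the labels, the threshold value, and the function names are likewise hypothetical.

```python
def predict_communities(communities, request_nodes, similarity_threshold=0.8):
    """Return IDs of communities whose distance to the request is below the threshold."""
    predicted = []
    for community_id, members in communities.items():
        overlap = len(members & request_nodes)
        union = len(members | request_nodes)
        distance = 1.0 - overlap / union  # Jaccard distance (assumed metric)
        if distance < similarity_threshold:
            predicted.append(community_id)
    return predicted


def reduced_registered_set(communities, predicted_ids, registered):
    """Keep only registered entries whose account node is in a predicted community."""
    accounts = set()
    for community_id in predicted_ids:
        accounts |= {n for n in communities[community_id] if n in registered}
    return {account: registered[account] for account in sorted(accounts)}


# Hypothetical communities and registered authentication information.
communities = {
    "A": {"zip:94110", "rm:3", "ip:23.0.0.108", "acct:alice"},
    "B": {"zip:10001", "rm:7", "acct:bob"},
}
registered = {"acct:alice": "template-a", "acct:bob": "template-b"}

request_nodes = {"zip:94110", "rm:3"}
predicted = predict_communities(communities, request_nodes)
candidates = reduced_registered_set(communities, predicted, registered)
```

Only the entries in `candidates` would then be compared against the authentication information in the access request, rather than the full `registered` set.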

Description

DATA SECURITY USING GRAPH COMMUNITIES
BACKGROUND
[0001] Client devices (e.g., personal computers, smartphones, etc.) can be used to access a variety of secure resources such as user accounts, information databases, file storage, website logins, and digital wallets. In order to prevent fraud, resource managers may require authentication information of the user or the client device to be validated before granting access to the resource. For example, a resource manager of an online storage system may require a device’s network address to be validated, in addition to verifying the user’s account name and password.
[0002] While resource security can be improved by validating authentication information, performing the authentication process can be complex and time consuming. There is a need for improved systems and methods for performing authentication.
BRIEF SUMMARY
[0003] Client devices can send access request messages including authentication information to request access to a resource. The resource may be managed by a resource management computer and the authentication information may be validated by an authentication data processing server. A data security hub (e.g., an authentication hub) can provide centralized routing between numerous different client devices, resource management computers, and authentication data processing servers. The data security hub can determine a risk score for a particular access request and use that risk score in an authentication process. The data security hub can determine the risk score based on whether interaction data for the current access request matches expected interaction data using a community predictive model. For example, the data security hub can determine whether the IP address used to send the authentication request falls within an expected community of IP addresses. The data security hub can also use the community predictive model to determine a subset of registered accounts that a current authentication request might match. For instance, the data security hub can receive an access request including biometric data and can apply interaction data for the access request to the community predictive model to determine a subset of registered accounts that the received biometric data might match. Accordingly, the received biometric data does not need to be compared against the entire set of biometric data for registered accounts (which could potentially lead to a false positive due to the large set for comparison), but only to the reduced set of accounts.
[0004] Some embodiments of the disclosure provide methods for processing access request messages through a data security hub. The method includes storing interaction data for a plurality of previous access requests. The previous access requests can also be stored. The interaction data can include information about the geolocation, date, and time of day that the previous access requests were made as well as the network address and network route used to send the access request. The interaction data can also include data or parameters included in the access request, such as an identifier of the resource manager that the request is for, an identifier of the resource, and an amount of the resource.
[0005] The method further includes creating a topological graph based on the interaction data. The topological graph includes nodes and edges connecting the nodes. The topological graph can be created using an unsupervised clustering algorithm. For example, a first node associated with a particular Internet Protocol (IP) address can be connected by an edge to a second node associated with a geolocation (e.g., a zip code or Global Positioning System coordinates) based on previous access requests made in that geolocation being sent using the particular IP address.
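By way of non-limiting illustration, the following Python sketch shows one way nodes and weighted edges could be derived from stored interaction data. The record fields (`ip`, `geo`), the function name `build_graph`, and the use of a co-occurrence count as the edge weight are assumptions made for this example only; the disclosure does not prescribe a particular construction.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(requests):
    """Return {frozenset({node_a, node_b}): weight} from interaction records.

    Each node is a (field, value) pair, e.g. ("ip", "203.0.113.7").
    Two nodes share an edge if they appeared in the same previous request;
    the weight counts how many requests they shared (an assumed weighting).
    """
    edges = defaultdict(int)
    for record in requests:
        nodes = sorted(record.items())
        for a, b in combinations(nodes, 2):
            edges[frozenset((a, b))] += 1
    return dict(edges)


# Hypothetical interaction data for three previous access requests.
previous_requests = [
    {"ip": "203.0.113.7", "geo": "94105"},
    {"ip": "203.0.113.7", "geo": "94105"},
    {"ip": "198.51.100.2", "geo": "10001"},
]
graph = build_graph(previous_requests)
edge = frozenset((("ip", "203.0.113.7"), ("geo", "94105")))
print(graph[edge])
```

In this sketch, the IP address node and the geolocation node that appeared together in two previous requests are joined by an edge of weight 2, mirroring the IP-to-geolocation example above.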
[0006] The method further includes determining a plurality of communities from the topological graph to form a predictive model. Each community of the plurality of communities includes a subset of the nodes in the topological graph. A portion of one community can overlap a portion of another community. That is, the same node can be included in multiple communities. Interaction data for a new access request received by the data security hub in real-time can be applied to the predictive model in order to determine a set of communities that the new access request fits within. This can be done by vectorizing each of the communities and interaction data for the new access request and then determining whether a vector distance between the vector for the new access request and a vector for a community is within a predetermined range.
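As a non-limiting sketch of the vector comparison described above, the following Python example encodes each community and the new access request as a 0/1 vector over a fixed feature list and keeps the communities whose Euclidean distance to the request falls within a threshold. The feature labels, the one-hot encoding, the choice of Euclidean distance, and the names `vectorize` and `predicted_communities` are assumptions for illustration; other encodings and distance measures could be used.

```python
import math

# Hypothetical feature vocabulary: one dimension per known node label.
FEATURES = ["ip:203.0.113.7", "ip:198.51.100.2", "geo:94105", "geo:10001"]


def vectorize(node_labels):
    """Encode a set of node labels as a 0/1 vector over FEATURES."""
    return [1.0 if f in node_labels else 0.0 for f in FEATURES]


def predicted_communities(request_nodes, communities, max_distance):
    """Return names of communities whose vector lies within max_distance."""
    request_vec = vectorize(request_nodes)
    matches = []
    for name, members in communities.items():
        distance = math.dist(request_vec, vectorize(members))
        if distance <= max_distance:
            matches.append(name)
    return matches


communities = {
    "A": {"ip:203.0.113.7", "geo:94105"},
    "B": {"ip:198.51.100.2", "geo:10001"},
}
request = {"ip:203.0.113.7", "geo:94105"}
print(predicted_communities(request, communities, max_distance=1.0))
```

Here the request coincides with community A (distance 0) and differs from community B in all four dimensions (distance 2), so only A falls within the assumed range of 1.0.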
[0007] The method further includes receiving, from a client device, an access request. The access request can be received in real-time by the data security hub. The method includes determining request interaction data for the access request. The request interaction data can indicate one or more of the geolocation of the client device, the time of day of the access request (e.g., a specific time or a time range), a date of the access request, an identifier of a resource, a resource amount, an identifier of the resource manager of the resource or a resource management computer, an IP address of the client device, and a network route or path used for sending the access request to the data security hub.
[0008] The method further includes determining a predicted subset of the plurality of communities by applying the request interaction data to the predictive model. The predicted communities in the subset can be determined by comparing the request interaction data to the communities in the topological graph. For example, an IP address of the client device can be used to predict communities that include nodes associated with that IP address.
[0009] The method further includes initiating an authentication process for the access request using the predicted subset of the plurality of communities. The method further includes providing an authentication response to the client device where the authentication response is based on the authentication process for the access request.
[0010] Other embodiments are directed towards systems for implementing the above method. A better understanding of the nature and advantages of the embodiments may be gained with reference to the following detailed description and the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1 shows a system diagram of an authentication hub in communication with client devices, data processing servers, and resource management computers, in accordance with some embodiments.
[0012] FIG. 2 shows a functional block diagram of an authentication hub, in accordance with some embodiments.
[0013] FIG. 3 shows an information flow diagram of an authentication process using community prediction, in accordance with some embodiments.
[0014] FIG. 4 shows an exemplary topological graph and communities, in accordance with some embodiments.
[0015] FIG. 5 shows a flow chart of an exemplary method for processing access request messages through a data security hub, in accordance with some embodiments.
DETAILED DESCRIPTION
[0016] Client devices such as personal computers, smartphones, tablets, and wearable devices can be used to request access to resources. For example, a client device can request access to log in to a local or remote user account, gain permissions to files or settings, retrieve or store information in a database, make payment transactions, or gain access to a physical structure. To request access to a particular resource, a client device can send an access request message to the resource management computer associated with the requested resource.
[0017] However, simply verifying that the correct password has been provided may not be sufficient to prevent unauthorized access to resources. For instance, someone besides the original user may obtain physical access to the user’s client device and use stored user names and passwords to fraudulently request access to resources. In other cases, account names and passwords may be leaked by a security breach at the resource manager or another third-party, enabling fraudsters to gain access to the same or other accounts using the leaked passwords. Passwords could also be guessed using a brute-force attack or other cracking technique.
[0018] To prevent fraudulent access, resource management computers may require client devices to provide authentication information of the user or the client device (e.g., personal or sensitive information). Accordingly, client devices can send access request messages including authentication information to request access to a resource. A data security hub (e.g., an authentication hub) can provide centralized routing between numerous different client devices, resource management computers, and authentication data processing servers.
[0019] To speed up the authentication process, the authentication hub can create a predictive model based on interaction data for previous access requests. The predictive model can be used to determine a set of predicted communities for an incoming access request. The authentication hub can determine a risk score for the incoming access request based on whether predicted communities based on interaction data for that access request match expected communities based on historical data. For example, the data security hub can determine whether the IP address used to send the authentication request falls within an expected community of IP addresses normally used to request access to a particular resource. Accordingly, the authentication hub may be able to determine that the access request should not be authenticated, even without sending the authentication information to a data processor to be verified.
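For illustration only, the following Python sketch computes a risk score from the overlap between predicted and expected communities. The use of Jaccard similarity, the 0-to-1 scale, and the function name `risk_score` are assumptions; the disclosure does not fix a particular scoring formula.

```python
def risk_score(predicted, expected):
    """Return 1 - Jaccard similarity of two community sets.

    0.0 means the predicted communities equal the expected ones;
    1.0 means they share no community. Returning 0.0 when both sets
    are empty is an assumed convention for this sketch.
    """
    predicted, expected = set(predicted), set(expected)
    union = predicted | expected
    if not union:
        return 0.0
    return 1.0 - len(predicted & expected) / len(union)


# Hypothetical community labels.
print(risk_score({"A", "B"}, {"A", "B"}))  # identical sets
print(risk_score({"A", "B"}, {"A"}))       # partial overlap
print(risk_score({"C"}, {"A", "B"}))       # no overlap
```

Under these assumptions, a hub could compare the score against a configured threshold and decline the request before forwarding any authentication information to a data processing server.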
[0020] The data security hub can also use the community predictive model to determine a subset of registered accounts that a current authentication request might match. For instance, the data security hub can receive an access request including biometric data and can apply interaction data for the access request to the community predictive model to determine a subset of registered accounts that the received biometric data might match. Accordingly, the received biometric data does not need to be compared against the entire set of biometric data for registered accounts (which could potentially lead to a false positive), but only to the reduced set of accounts.
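The reduced comparison set can be sketched as follows, as a non-limiting Python example. The mapping `accounts_by_community`, the toy `similarity` function (fraction of matching positions), the threshold value, and all account names are hypothetical; an actual biometric matcher would be considerably more involved.

```python
def reduced_candidates(predicted_communities, accounts_by_community):
    """Union of registered accounts associated with the predicted communities."""
    candidates = set()
    for name in predicted_communities:
        candidates |= accounts_by_community.get(name, set())
    return candidates


def similarity(sample, template):
    """Toy matcher: fraction of positions at which two templates agree."""
    return sum(a == b for a, b in zip(sample, template)) / len(template)


def best_match(sample, candidates, templates, threshold):
    """Compare the sample only against candidate accounts; None if no match."""
    best_id, best_score = None, threshold
    for account_id in sorted(candidates):
        score = similarity(sample, templates[account_id])
        if score >= best_score:
            best_id, best_score = account_id, score
    return best_id


templates = {"alice": (1, 0, 1, 1), "bob": (0, 1, 0, 0), "carol": (1, 0, 1, 0)}
accounts_by_community = {"A": {"alice", "bob"}, "B": {"carol"}}

candidates = reduced_candidates({"A"}, accounts_by_community)
print(sorted(candidates))
print(best_match((1, 0, 1, 1), candidates, templates, threshold=0.9))
```

In this sketch the account "carol" is never compared against the sample because it belongs only to a community that was not predicted for the request, which is the source of both the speed-up and the reduced chance of a false positive.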
[0021] Thus, the authentication hub can maintain resource security while speeding up the authentication process and using fewer computing resources. These features of the authentication hub and others are described in further detail below.
I. TERMS
[0022] Explanation and description of some of the terms and phrases used in the Detailed Description are provided below.
[0023] A “client device” or “user device” may include any device that can be operated by a user. A client device or user device can provide electronic communication with one or more computers. A communication device can be referred to as a mobile device if the mobile device has the ability to communicate data portably.
[0024] A “mobile device” may comprise any suitable electronic device that may be transported and operated by a user, which may also provide remote communication capabilities over a network. Examples of remote communication capabilities include using a mobile phone (wireless) network, wireless data network (e.g. 3G, 4G or similar networks), Wi-Fi, Wi-Max, or any other communication medium that may provide access to a network such as the Internet or a private network. Examples of mobile devices include mobile phones (e.g. cellular phones), PDAs, tablet computers, net books, laptop computers, personal music players, hand-held specialized readers, etc. Further examples of mobile devices include wearable devices, such as smart watches, fitness bands, ankle bracelets, etc., as well as automobiles with remote communication capabilities. A mobile device may comprise any suitable hardware and software for performing such functions, and may also include multiple devices or components (e.g. when a device has remote access to a network by tethering to another device - i.e. using the other device as a modem - both devices taken together may be considered a single mobile device). A mobile device may further comprise means for determining/generating location data. For example, a mobile device may comprise means for communicating with a global positioning system (e.g. GPS).
[0025] An “application” may be computer code or other data stored on a computer readable medium (e.g. memory element or secure element) that may be executable by a processor to complete a task.
[0026] An “access request message” refers to a message sent by a client device to request access to a resource. The resource may be managed by a particular resource management computer. The client device may send the access request message to an authentication hub, which may authenticate the access request message prior to sending it to the corresponding resource management computer. The access request message can include authentication information that may be validated by either the authentication hub or a data processing server. Different types of client devices may generate access requests according to different APIs or protocols.
[0027] “Authentication information” may be information that can be used to authenticate a user or a client device. That is, the authentication information may be used to verify the identity of the user or the client device. In some embodiments, the user may input the authentication information into a device during an authentication process. Examples of authentication information that can be input by a user of the client device include biometric data (e.g., fingerprint data, facial recognition data, 3-D body structure data, deoxyribonucleic acid (DNA) data, palm print data, hand geometry data, retinal recognition data, iris recognition data, voice recognition data, etc.), passwords, passcodes, personal identifiers (e.g., government issued licenses or identifying documents), personal information (e.g., address, birthdate, mother’s maiden name, or phone number), and other secret information (e.g., answers to security questions). Authentication information can also include data provided by the device itself, such as hardware identifiers (e.g., an International Mobile Equipment Identity (IMEI) number or a serial number), a network address (e.g., internet protocol (IP) address), interaction information, and Global Positioning System (GPS) location information.
[0028] A “limited set of authentication information” or a “restricted set of authentication information” refers to a set of authentication information that has been limited or restricted from the set of authentication information received in an access request. The authentication information may be limited in that certain types of authentication information are removed, portions of a certain piece of authentication information are removed or obscured, or some or all of a certain piece of authentication information is obfuscated, while still being able to be validated. For example, more sensitive authentication information (e.g., authentication information determined to have a higher sensitivity level) that is included in an access request may not be included in the limited set of authentication information. In another example, more sensitive authentication information may be obfuscated (e.g., using secure multi-party computation techniques).
[0029] A “linguistic parser” refers to an artificial intelligence algorithm for processing natural language to determine the elements, relationships, and grammatical structure of sentences (e.g., which characters are words, whether a word is a noun or verb, or which word is the subject or object of a particular verb). An authentication hub may use a linguistic parsing algorithm to build data structures that represent the API of a particular access request message, similar to how linguistic parsing algorithms may be used to represent the structure of a sentence in a natural language. In parsing an access request message, the authentication hub may first create a sequence of symbols or tokens corresponding to the API, protocol, or format of the access request message. The symbols/tokens can correspond to the data fields of the access request (e.g., routing information, authentication information, metadata, etc.). Then the authentication hub can create a data structure (e.g., a parse tree or a syntax tree) that represents the API of the access request.
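As a simplified, non-limiting sketch of the two parsing stages described above (tokenization followed by construction of a tree), the following Python example parses a hypothetical `key=value;` message format. The message format, the field names (`dest`, `user`, `pin`, `ts`), and the category table are all assumptions for illustration and do not reflect any particular API; a rule-based tokenizer stands in here for the linguistic parsing algorithm.

```python
import re

# One token per "key=value" pair, pairs separated by semicolons (assumed format).
TOKEN = re.compile(r"(?P<key>[A-Za-z_]+)=(?P<value>[^;]*);?")

# Hypothetical mapping from field name to the kind of data field it carries.
CATEGORY = {
    "dest": "routing",
    "user": "authentication",
    "pin": "authentication",
    "ts": "metadata",
}


def tokenize(message):
    """Split 'key=value;key=value' text into (key, value) tokens."""
    return [(m.group("key"), m.group("value")) for m in TOKEN.finditer(message)]


def parse(message):
    """Group tokens under category nodes to form a two-level parse tree."""
    tree = {"routing": {}, "authentication": {}, "metadata": {}}
    for key, value in tokenize(message):
        category = CATEGORY.get(key, "metadata")  # unknown fields kept as metadata
        tree[category][key] = value
    return tree


tree = parse("dest=rm-141;user=alice;pin=1234;ts=2018-01-19T10:00")
print(tree["routing"])
print(tree["authentication"])
```

The resulting tree separates routing information, authentication information, and metadata, so later processing stages can work on each group without knowing the original message layout.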
[0030] A “resource manager” can be any entity that provides resources. Examples of resource managers include a website operator, a data storage provider, an internet service provider, a merchant, a bank, a building owner, a governmental entity, etc. Any entity that maintains accounts for users or that can provide information, data, or physical objects to users may be considered a “resource manager.” The resource manager may operate a resource management computer to perform functions for maintaining accounts and controlling access to resources.
[0031] An “access device” may be any suitable device that provides access to a remote system. An access device may also be used for communicating with a resource management computer, a merchant computer, a transaction processing computer, an authentication computer, or any other suitable system. An access device may generally be located in any suitable location, such as at the location of a merchant. An access device may be in any suitable form. Some examples of access devices include POS or point of sale devices (e.g., POS terminals), cellular phones, PDAs, personal computers (PCs), tablet PCs, hand-held specialized readers, set-top boxes, electronic cash registers (ECRs), automated teller machines (ATMs), virtual cash registers (VCRs), kiosks, security systems, access systems, and the like. An access device may use any suitable contact or contactless mode of operation to send or receive data from, or associated with, a user mobile device. In some embodiments, where an access device may comprise a POS terminal, any suitable POS terminal may be used and may include a reader, a processor, and a computer-readable medium. A reader may include any suitable contact or contactless mode of operation. For example, exemplary card readers can include radio frequency (RF) antennas, optical scanners, bar code readers, or magnetic stripe readers to interact with a payment device and/or mobile device. In some embodiments, a cellular phone, tablet, or other dedicated wireless device used as a POS terminal may be referred to as a mobile point of sale or an “mPOS” terminal.
[0032] An “application programming interface” (API) refers to a set of routines and protocols defining how software components should communicate and interact. For example, an API for requesting access to resources can define the format and protocol of access request messages, including the required data fields. An API for requesting access to resources can also define what types of authentication information should, or should not, be included in the access request message.
[0033] “Interaction information” or “interaction data” may include data on the type of interaction being conducted by the user of a client device (e.g., tracking the user’s use of the client device). Interaction information can include, for example, the type and amount of resources requested in an access request (specific amount or a range of amounts), the date of the access request, the time of day (specific or within a range) that the access request was made, the geo-location of the client device making the access request, the network address of the client device, a network route or path used to send the access request, and an identifier of the resource management computer that manages the requested resource. The interaction information may be used as part of a risk evaluation of the client device, where different interactions may have different levels of risk associated with them. The interaction information may be tracked in a log file that is provided in an access request.
[0034] A “topological graph” may refer to a representation of a graph in a plane of distinct vertices connected by edges. The distinct vertices in a topological graph may be referred to as “nodes.” Each node may represent specific information for an event or may represent specific information for a profile of an entity or object. The nodes may be related to one another by a set of edges, E. An “edge” may be described as an unordered pair composed of two nodes as a subset of the graph G = (V, E), where G is a graph comprising a set V of vertices (nodes) connected by a set of edges E. An edge may be associated with a numerical value, referred to as a “weight”, that may be assigned to the pairwise connection between the two nodes. The edge weight may be identified as a strength of connectivity between two nodes and/or may be related to a cost or distance, as it often represents a quantity that is required to move from one node to the next.
[0035] A “community” may refer to a group/collection of nodes in a graph that are densely connected within the group. A community may be a subgraph or a portion/derivative thereof and a subgraph may or may not be a community and/or comprise one or more communities. A community may be identified from a graph using a graph learning algorithm, such as a graph learning algorithm for mapping protein complexes. Communities identified using historical data can be used to classify new data for making predictions. For example, identifying communities can be used as part of a machine learning process, in which predictions about information elements can be made based on their relation to one another.
[0036] A “data set” may refer to a collection of related sets of information composed of separate elements that can be manipulated as a unit by a computer. A data set may comprise known data, which may be seen as past data or “historical data.” Data that is yet to be collected may be referred to as future data or “unknown data.” When future data is received at a later point in time and recorded, it can be referred to as “new known data” or “recently known” data, and can be combined with initial known data to form a larger history.
[0037] “Unsupervised learning” may refer to a type of learning algorithm used to classify information in a dataset by labeling inputs and/or groups of inputs. One method of unsupervised learning can be cluster analysis, which can be used to find hidden patterns or grouping in data. The clusters may be modeled using a measure of similarity, which can be defined using one or more metrics, such as Euclidean distance.
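As one non-limiting example of cluster analysis using Euclidean distance, the following Python sketch implements a small k-means loop. The choice of k-means, the fixed starting centroids, the iteration count, and the two-dimensional sample points are assumptions made for illustration; the disclosure does not name a specific clustering algorithm.

```python
import math


def assign(points, centroids):
    """Group points by the index of their nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def k_means(points, centroids, iterations=10):
    """Alternate assignment and centroid update for a fixed number of rounds."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, assign(points, centroids)


# Hypothetical two-dimensional data with two visible groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

No labels are supplied in this sketch; the grouping emerges only from the distances between points, which is what distinguishes this kind of learning from the supervised case.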
[0038] “Machine learning” may refer to an artificial intelligence process in which software applications may be trained to make accurate predictions through learning. The predictions can be generated by applying input data to a predictive model formed from performing statistical analysis on aggregated data. A clustering algorithm is an example of a machine learning algorithm. A predictive model can be trained using training data, such that the model may be used to make accurate predictions. The prediction can be, for example, a classification of an image (e.g. identifying images of cats on the Internet) or as another example, a recommendation (e.g. a movie that a user may like or a restaurant that a consumer might enjoy). Training data may be collected as existing records. Existing records can be any data from which patterns can be determined. These patterns may then be applied to new data at a later point in time to make a prediction. Existing records may be, for example, user data collected over a network, such as user browser history or user spending history. Using existing records as training data, training may be performed through a learning module. The learning module may comprise a learning algorithm, which may be used to build a model. The model may be a statistical model, which can be used to predict unknown information from known information. For example, the learning module may be a set of instructions for generating a regression line from training data (supervised learning) or a set of instructions for grouping data into clusters of different classifications of data based on similarity, connectivity, and/or distance between data points (unsupervised learning). The regression line or data clusters can then be used as a model for predicting unknown information from known information. Once the model has been built from the learning module, the model may be used to generate a predicted output from a new request. A new request may be a request for a prediction associated with presented data. A new request may comprise the data that a prediction is requested for. For example, a new request may comprise pixel data for an image that is to be classified or may comprise user information (e.g. name, location, user history, etc.) that can be used to determine an appropriate recommendation. In order to generate a predicted output from the new request, the data included in the new request can be compared against the model. For example, the position of data received in the new request on a graph can be compared against a regression line to predict its next state (i.e. according to a trend). In another example, the position of the data as plotted on a topological graph can be used to determine its classification (e.g. predicting tastes and preferences of a user based on his or her online interactions).
[0039] A “computing device” may be any suitable electronic device that can process and communicate information to other electronic devices. The computing device may include a processor and a computer readable medium coupled to the processor, the computer readable medium comprising code, executable by the processor. The computing device may also include input devices and output devices that are operatively coupled to the processor, as well as an external communication interface for communicating with other computing devices or other entities. For example, the computing device can provide remote communication capabilities to a network. Examples of these remote communication capabilities include using a mobile phone (wireless) network, wireless data network (e.g. 3G, 4G or similar networks), Wi-Fi, Bluetooth, Bluetooth Low Energy (BLE), Wi-Max, or any other communication medium that may provide access to a network such as the Internet or a private network. Some exemplary types of computing device may include a mobile device, a cardholder device, a user device, a consumer device, a server computing device, an appliance, and any computer. Some non-limiting examples of mobile devices include mobile phones (e.g., cellular phones), keychain devices, personal digital assistants (PDAs), pagers, notebooks, laptops, notepads, net books, tablet computers, wearable devices (e.g., smart watches, fitness bands, jewelry, etc.), automobiles or motorcycles with remote communication capabilities, personal music player devices, personal computers, hand-held specialized readers, and the like.
[0040] A “server computer” may include any suitable computer that can provide communications to other computers and receive communications from other computers. A server computer may include a computer or cluster of computers. For instance, a server computer can be a mainframe, a minicomputer cluster, or a group of servers functioning as a unit. In one example, a server computer may be a database server coupled to a Web server. A server computer may be coupled to a database and may include any hardware, software, other logic, or combination of the preceding for servicing the requests from one or more client computers. A server computer may comprise one or more computational apparatuses and may use any of a variety of computing structures, arrangements, and compilations for servicing the requests from one or more client computers. Data transfer and other communications between components such as computers may occur via any suitable wired or wireless network, such as the Internet or private networks.
[0041] Messages communicated between any of the computers, networks, and devices described herein may be transmitted using secure communications protocols such as, but not limited to, File Transfer Protocol (FTP), HyperText Transfer Protocol (HTTP), Secure Hypertext Transfer Protocol (HTTPS), Secure Socket Layer (SSL), ISO (e.g., ISO 8583), and/or the like.
II. AUTHENTICATION OF RESOURCE ACCESS REQUESTS
[0042] FIG. 1 shows a system diagram of an authentication hub 110 in communication with client devices 120, data processing servers 130, and resource management computers 140, in accordance with some embodiments. The client devices 120 can include any device that requests access to a resource being managed by one of the resource management computers 140. For example, a client device could be a point of sale terminal 121, a personal computer 122, a mobile device 123, a wearable device 124, a smart card 125 (e.g., a biometric card or payment card), or a vehicle 126. Each of the client devices 120 can communicate with the authentication hub over a first network 152. The client devices 120 may communicate with the network 152 using a wired network connection (e.g., Ethernet) or a wireless network connection (e.g., Wi-Fi, cellular, or near field communications).
[0043] The client devices 120 can send access requests that include different types of authentication information and that are formatted differently. To communicate with the variety of different client devices 120 and handle the variety of different access request formats, the authentication hub 110 can include an automated client interface that automatically adapts the access requests for processing. The client interface can be used for receiving access requests from the client devices 120 and for sending access responses to the client devices 120 over the first network 152.
[0044] The authentication hub 110 can also communicate with a plurality of data processing servers 130. Each of the data processing servers 130 may be capable of processing different types of authentication information. For example, a first data processing server 131 can evaluate one or more hardware identifiers of a client device in order to determine whether a particular client device is a security risk. A second data processing server 132 can use the network identifier (e.g., IP address) of the client device to determine whether a particular client device is a security risk. A third data processing server 133 can analyze biometric data (e.g., a fingerprint scan or a retina scan) of a user of a client device to determine whether it is associated with a registered user. A fourth data processing server 134 can analyze personal information of the user to determine whether it matches stored account information. The four data processing servers 130 described above are merely examples of the various data processing servers that could be in communication with the authentication hub 110. The authentication hub 110 may communicate with other data processing servers to process other types of authentication information.
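The division of labor among the data processing servers can be pictured, purely as a non-limiting sketch, as a lookup from the type of authentication information to the server that handles it. The type labels, the server identifiers, and the function name `route_authentication` in the following Python example are hypothetical; only the pairing of information types with servers 131 through 134 follows the description above.

```python
# Hypothetical routing table: authentication information type -> server label.
SERVER_FOR_TYPE = {
    "hardware_id": "data_processing_server_131",
    "network_address": "data_processing_server_132",
    "biometric": "data_processing_server_133",
    "personal_info": "data_processing_server_134",
}


def route_authentication(auth_info):
    """Split authentication information by the server able to process it.

    Returns (routed, unsupported), where routed maps a server label to the
    fields it should receive and unsupported lists types with no known server.
    """
    routed, unsupported = {}, []
    for info_type, value in auth_info.items():
        server = SERVER_FOR_TYPE.get(info_type)
        if server is None:
            unsupported.append(info_type)
        else:
            routed.setdefault(server, {})[info_type] = value
    return routed, unsupported


routed, unsupported = route_authentication(
    {"network_address": "203.0.113.7", "biometric": "<template>", "voice_pin": "0000"}
)
print(routed)
print(unsupported)
```

Each server in this sketch receives only the fields it can evaluate, so no single data processing server needs to see the full set of authentication information in the access request.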
[0045] To communicate with the variety of different data processing servers 130, the authentication hub 110 can include an automated client interface which automatically adapts the access requests for processing. The client interface can be used for receiving access requests from the client devices 120 and for sending access responses to the client devices 120 over the first network 152.
[0046] The authentication hub 110 can provide a data processor interface for communicating with the data processing servers 130 over a second network 153. The data processor interface can be used for making authentication requests to the data processing servers 130 and receiving authentication responses from the data processing servers 130 over the second network 153.
[0047] The authentication hub 110 can also communicate with a plurality of resource management computers 140. Each of the resource management computers may manage a different type of resource. For instance, a first resource management computer 141 may manage user accounts for a website, a second resource management computer 142 can manage academic resources for a school district, and a third resource management computer 143 can manage payment accounts and provide authorization of payment transactions. The three resource management computers 140 described above are merely examples of the various resource management computers that could be in communication with the authentication hub 110. The authentication hub 110 may communicate with other data processing servers to process other types of authentication information.
[0048] The authentication hub 110 can provide a resource manager interface for communicating with the resource management computers 140 over a third network 154. The resource manager interface can be used for sending access requests to the resource management computers 140 and receiving access responses from the resource management computers 140 over the third network 154.
[0049] The functionality of the authentication hub is described in further detail below with respect to FIG. 2.
[0050] FIG. 2 shows a functional block diagram of an authentication hub 210, in accordance with some embodiments. As shown in FIG. 2, the authentication hub 210 can perform a variety of functions in order to orchestrate communications between a variety of different client devices, data processing servers, and resource management computers while maintaining the security of authentication information. In some embodiments, these functions may be implemented as hardware components of the authentication hub. In other embodiments, these functions may be implemented as software modules on the authentication hub (e.g., instructions stored on a non-transitory computer readable medium that can be executed by a processor to perform the functions). In some embodiments, the authentication hub 210 can include a plurality of computers coupled together in a system, where one or more of the computers performs the different functions of the authentication hub 210. As further described below, the authentication hub 210 can perform access request processing 220, authentication request processing 230, and dynamic message routing 240.
A. Access Request Processing
[0051] The authentication hub 210 can perform automated access request processing 220 upon receiving access requests from the variety of different client devices. The authentication hub’s automated access request processing 220 functionality includes automated client interface adaptation 221, access request profiling 222, and community prediction 223.
[0052] In certain resource systems, client devices and resource management computers communicate using an application programming interface (API) that may have been specifically designed for that particular resource management computer. However, in the embodiments described herein, the authentication hub 210 can become a centralized point for communications between the client devices and the resource management computers using a variety of different APIs or communication protocols. Accordingly, the authentication hub 210 can be capable of receiving and processing access requests from a variety of different client devices, including access requests in different formats and including different types of authentication information.
[0053] The authentication hub 210 performs automated client interface adaptation 221 in order to handle access requests having new or different formats. That is, instead of being specifically programmed to handle each of the variety of different APIs used by the different resource management computers for communication with the client devices, the authentication hub 210 can automatically adapt the format of incoming access request messages to be compatible with the APIs used by the authentication hub 210. As such, the authentication hub 210 can receive and process access requests that are in different formats, including formats that have never been received or processed before.
[0054] If the format of an incoming access request from a client device does not match known access request formats, then the authentication hub 210 can perform automated client interface adaptation 221. To perform automated client interface adaptation 221, the authentication hub 210 can analyze the format of the incoming access request from the client device and then map it to a known API (e.g., a known API used by one of the resource management computers). For instance, the authentication hub 210 can use a linguistic parsing algorithm to build data structures that represent the API of a particular access request message. In some instances, the authentication hub 210 may first perform a lexical analysis of the access request message to create a sequence of symbols or tokens corresponding to the format and information included in that access request message. The symbols/tokens can correspond to the data fields of the access request (e.g., routing information, authentication information, metadata, etc.). Then the authentication hub 210 can perform syntactic analysis on the access request (e.g., on the tokens if lexical analysis was performed) to create a data structure (e.g., a parse tree or a syntax tree) that represents the API of the access request. The authentication hub 210 can then store the API data structure for the client device’s API so that it can be used for later access requests using the same API, such that the API data structure does not have to be rebuilt each time.
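The lexical and syntactic steps described in paragraph [0054] can be illustrated with a minimal sketch. It assumes the access request arrives as a JSON object, which the description does not specify; the function names `tokenize` and `build_api_structure` and the example field names are hypothetical.

```python
import json

def tokenize(message):
    """Lexical step: flatten a JSON object into (field path, type name) tokens.

    Assumes the top level of the message is a JSON object.
    """
    tokens = []

    def walk(value, path):
        if isinstance(value, dict):
            for key in sorted(value):
                walk(value[key], path + (key,))
        elif isinstance(value, list):
            # A list is treated as a single field in this sketch.
            tokens.append((path, "list"))
        else:
            tokens.append((path, type(value).__name__))

    walk(json.loads(message), ())
    return tokens

def build_api_structure(tokens):
    """Syntactic step: fold the tokens into a nested tree of field names and types."""
    tree = {}
    for path, type_name in tokens:
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = type_name
    return tree

# Hypothetical access request with routing and authentication fields.
request = '{"route": {"resource_manager": "rm-3"}, "auth": {"ip": "23.0.0.108", "pin": 1234}}'
tokens = tokenize(request)
structure = build_api_structure(tokens)
```

The resulting tree records only field names and value types, so it can be stored and reused for later requests in the same format without keeping any authentication values.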
[0055] After the authentication hub 210 has parsed the access request message as discussed above, the authentication hub 210 can map the API data structure generated by the linguistic parsing algorithm to a stored API data structure (e.g., an API data structure previously generated based on the access request API of a particular resource management computer). The authentication hub 210 can store a plurality of previously generated API data structures based on the access request messages used by different resource management computers. In some embodiments, the authentication hub 210 can generate the API data structures for the resource management computers (e.g., when the message format used by a particular resource management computer changes). The authentication hub 210 can determine the best API match for a particular access request by comparison to previously used APIs using fuzzy criteria. Using this process, the authentication hub 210 can determine the format of the access request message and identify the types of authentication information included in it, even if the access request is using a previously unknown API.
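The description does not define the fuzzy criteria used for matching. As one possible stand-in, the sketch below scores each stored API data structure by the Jaccard similarity of its field paths against those of the incoming structure. The function names, the `min_score` cutoff, and the example structures are all assumptions made for illustration.

```python
def field_paths(structure, prefix=()):
    """Collect the set of field paths in a nested API structure."""
    paths = set()
    for key, value in structure.items():
        if isinstance(value, dict):
            paths |= field_paths(value, prefix + (key,))
        else:
            paths.add(prefix + (key,))
    return paths

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def best_api_match(incoming, stored, min_score=0.5):
    """Return (name, score) of the closest stored structure, or (None, score) if too dissimilar."""
    incoming_paths = field_paths(incoming)
    best_name, best_score = None, 0.0
    for name, structure in stored.items():
        score = jaccard(incoming_paths, field_paths(structure))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score

# Hypothetical stored structures for two resource management computers.
stored_apis = {
    "website_accounts": {"auth": {"ip": "str", "password": "str"}, "user": "str"},
    "payments": {"auth": {"ip": "str", "pin": "int"}, "amount": "float", "account": "str"},
}
incoming_api = {"auth": {"ip": "str", "pin": "int"}, "account": "str"}
match, score = best_api_match(incoming_api, stored_apis)
```

Here the incoming structure shares three of four field paths with the hypothetical payments API, so that API is selected even though the formats are not identical.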
[0056] In addition, the authentication hub can recommend a new communication API to the client device. The authentication hub 210 can also add, or remove, data fields or information to the client’s access request so that it is compatible with the API of a particular resource management computer.
[0057] The authentication hub 210 can also perform access request profiling 222 to create profiles for incoming access requests. Access request profiling 222 can also include maintaining a profile for a particular type of interaction data. For example, the authentication hub 210 can create and maintain a profile for a particular IP address or for a particular user account. Access request profiling 222 can speed up the authentication process for a particular access request by reducing the number of potential user accounts that would match the incoming request. For example, if the access request includes biometric data, voice data, or image data as authentication information, but does not include an account number or user identifier, then the authentication information for the entire set of registered user accounts may need to be checked in order to determine if the received authentication information is valid.
[0058] However, using access request profiling 222, the profile for the incoming access request can be applied to a predictive model to identify a subset of user profiles within communities similar to the profile of the incoming access request. For example, the subset of user profiles may have the same IP address as the client device or IP addresses in the same community as the IP address of the client device. The communities may also be based on geo-location, time of day, or a network route or path. Thus, the received authentication information may be validated based on comparisons to authentication information for a small subset of registered user accounts, instead of all of the registered user accounts. Community prediction is described in further detail herein. Request profiling is further described below.
[0059] The authentication hub 210 also performs community prediction 223. The authentication hub 210 can use community prediction 223 to determine a set of predicted communities that the access request is predicted to be within using a predictive model based on interaction data (e.g., an IP address) for a plurality of previous access requests. The predictive model can be based on a topological graph of nodes and edges and a plurality of communities including sets of nodes within the topological graph. The predictive model can be generated using a learning algorithm based on interaction data for a plurality of previous access requests.
[0060] Determining the set of predicted communities can reduce the amount of time and computing resources used in authenticating the access request. For example, the access request may include biometric data (e.g., fingerprint scan, etc.), voice data (e.g., an audio recording of a phrase or word), image data (e.g., face scan, retina scan, video recording of a person or gesture) or another type of data for authentication that is validated using similarity thresholds. By determining the set of predicted communities, the authentication hub 210 does not need to compare the received authentication information (e.g., biometric data, image data, etc.) to every registered authentication information. Instead, the authentication hub 210 may only compare the received authentication information to registered authentication information that also falls within a community of the predicted set of communities. Thus, the authentication hub 210 has used community prediction to reduce the number of potential registered users (e.g., their authentication information) to check from a large number (e.g., all registered users) to a small number (e.g., only the registered users that are in a community similar to the received access request).
[0061] The authentication hub 210 can begin the authentication process by comparing the received authentication information to registered authentication information falling within the most similar predicted communities, further reducing the time and computing resources spent on performing the authentication process. The similarity metrics for a particular community of the topological graph can be determined based on a vector distance between that particular community and the received access request.
[0062] If the received authentication information cannot be validated against the reduced set of authentication information (from the predicted set of communities), then the authentication hub 210 can initiate a registration process to register the user that made the access request. If the user indicates that they are registered, the authentication hub 210 can request additional authentication information or identifiers from the client device or the user for authentication.
[0063] The authentication hub 210 can also use community prediction 223 to determine a risk score for a particular incoming access request. The authentication hub 210 can apply the incoming access request to the predictive model to determine a set of predicted communities for the access request. The authentication hub 210 can also determine an account interaction profile based on an account identifier included in the access request. The account interaction profile can be based on interaction data for previous access requests that included the account identifier. The authentication hub 210 can determine an expected subset of the plurality of communities by applying the account interaction profile to the predictive model. The expected communities are those expected based on the previous access requests made for the particular account. Then, the authentication hub 210 can determine a risk score based on a comparison of the expected communities (based on the account interaction profile) to the predicted communities (based on the incoming access request). The comparison can involve determining a vector distance between an expected community and a predicted community. The greater the vector distance between the expected community and the predicted community, the higher the risk that the access request is fraudulent (e.g., the risk score can be higher). Community prediction is further described below.
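The comparison of expected and predicted communities in paragraph [0063] can be sketched as follows. The sketch assumes each community is already represented by a numeric vector and uses Euclidean distance; the aggregation (mean distance from each predicted community to its nearest expected community) and the name `risk_score` are assumptions, since the description only states that a greater vector distance indicates higher risk.

```python
import math

def risk_score(expected_vectors, predicted_vectors):
    """Mean Euclidean distance from each predicted community to its nearest expected community."""
    if not expected_vectors or not predicted_vectors:
        raise ValueError("both community sets must be non-empty")
    nearest = [
        min(math.dist(p, e) for e in expected_vectors)
        for p in predicted_vectors
    ]
    return sum(nearest) / len(nearest)

# Hypothetical two-dimensional community vectors.
expected = [(0.0, 0.0), (1.0, 0.0)]   # from the account interaction profile
familiar = [(0.0, 0.0)]               # request lands in an expected community
unfamiliar = [(4.0, 4.0)]             # request lands far from any expected community

low_risk = risk_score(expected, familiar)
high_risk = risk_score(expected, unfamiliar)
```

A request predicted to fall in a community the account has used before scores zero, while a request far from every expected community scores higher.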
B. Authentication Request Processing
[0064] After receiving and processing an access request, the authentication hub 210 can perform automated authentication request processing 230. The authentication hub’s automated authentication request processing 230 functionality includes authentication information validation 233, automated privacy control 234, and automated request modification 235.
[0065] The authentication hub 210 can perform authentication information validation 233 as part of an authentication process. The authentication hub 210 can store registered authentication information for a plurality of registered users/accounts and can compare authentication information received in an incoming access request to registered authentication information of the registered users/accounts. In some embodiments, the authentication hub 210 can perform authentication information validation on a subset of the authentication information received in the access request while one or more data processing servers are used to validate other authentication information received in the access request.
[0066] The authentication hub 210 can perform automated privacy control 234 to prevent excessive amounts of sensitive authentication information from being distributed to data processing servers or other third parties. By restricting the type and amount of sensitive information used for authentication, the authentication hub 210 can reduce the risk of such information being intercepted or leaked (e.g., due to a security breach at one of the data processing servers).
[0067] As part of automated privacy control 234, the authentication hub 210 can determine that more, or less, authentication information is required to authenticate a client device depending on various factors. For example, the authentication hub 210 can determine that less authentication information is required in order to authenticate a client device having a higher trust level compared to a client device having a lower trust level. In addition, the authentication hub 210 can determine that more authentication information is required to authenticate a client device that is requesting resources having a higher resource security level (e.g., a greater amount of resources or a more sensitive type of resource) compared to one requesting resources having a lower security level (e.g., fewer resources or a less sensitive type of resource). In another example, the authentication hub 210 can require more authentication information when a determined threat level within the network is higher, compared to when the determined threat level within the network is lower.
[0068] The authentication hub 210 can also assign weights to different types of authentication information such that more or less authentication information is needed to validate the client device depending on what type of authentication information is available. In one example, the authentication hub 210 may determine that a certain authentication level is sufficient to authenticate the client device for a particular access request. In this example, the authentication hub 210 may determine that validating biometric information of the user of the client device would meet or exceed the authentication level. The authentication hub 210 may also determine that validating both a network address of the client device and a hardware identifier of the client device would meet or exceed the authentication level. Thus, even if the client device does not provide biometric information, the authentication hub 210 can authenticate the client device as long as its network address and hardware identifier are provided.
[0069] Furthermore, even if the client device did provide biometric information, the authentication hub 210 can determine whether the client device should be authenticated using the biometric information, or using a network address and hardware identifier instead, based on the sensitivity levels of the different sets of authentication information. For example, the authentication hub 210 can determine that the client device should be authenticated using the network address and hardware identifier, instead of using the biometric information, based on the biometric information having a higher sensitivity level than the sensitivity level of the network address and the hardware identifier. As such, less sensitive information can be used for authentication if it is available and would meet the authentication level determined by the authentication hub.
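The weighting and sensitivity trade-off in paragraphs [0068] and [0069] can be sketched as a small selection routine. The numeric weights, sensitivity levels, and the rule of summing them across a combination are assumptions for illustration; the description states only that weights and sensitivity levels are compared.

```python
from itertools import combinations

def choose_auth_set(available, weights, sensitivity, required_level):
    """Pick the least sensitive combination of available information types
    whose summed weight meets the required authentication level."""
    candidates = []
    for size in range(1, len(available) + 1):
        for combo in combinations(sorted(available), size):
            if sum(weights[t] for t in combo) >= required_level:
                total_sensitivity = sum(sensitivity[t] for t in combo)
                # Ties on sensitivity are broken by using fewer items.
                candidates.append((total_sensitivity, size, combo))
    if not candidates:
        return None
    return min(candidates)[2]

# Hypothetical weights and sensitivity levels.
WEIGHTS = {"biometric": 1.0, "network_address": 0.5, "hardware_id": 0.5}
SENSITIVITY = {"biometric": 5, "network_address": 1, "hardware_id": 1}

with_biometric = choose_auth_set(
    {"biometric", "network_address", "hardware_id"}, WEIGHTS, SENSITIVITY, 1.0
)
without_biometric = choose_auth_set(
    {"network_address", "hardware_id"}, WEIGHTS, SENSITIVITY, 1.0
)
insufficient = choose_auth_set({"network_address"}, WEIGHTS, SENSITIVITY, 1.0)
```

Under these assumed values, the network address and hardware identifier are chosen even when biometric information is available, because together they meet the level at lower sensitivity.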
[0070] The authentication hub 210 also performs automated request modification 235. For example, the authentication hub 210 can append additional information, stored at the authentication hub 210, to the authentication request. The additional information may enable a particular data processing server to be capable of handling the authentication request. For example, if the authentication hub 210 has stored a hardware identifier for a particular client device from past access requests, and the data processing server would use the hardware identifier for authentication, then the authentication hub 210 can add the hardware identifier to the authentication request sent to the data processing server, even if the client device did not include the hardware identifier in the access request that is currently being processed.
C. Dynamic Message Routing
[0071] After determining the set of authentication information to be included in the authentication request to a data processing server, the authentication hub 210 can generate an authentication request message including that set of authentication information. In some embodiments, the authentication hub 210 may not rely on data processing servers for performing authentication for a particular access request. In some embodiments, the authentication hub 210 may perform authentication of the access request and may not send an authentication request to a data processing server. If an authentication request message for a data processing server is generated, the authentication hub 210 can perform dynamic message routing 240. The dynamic message routing 240 process can include automated data processor interface adaptation 246 and data processor evaluation 247.
[0072] The authentication hub 210 can perform automated data processor interface adaptation 246 for communications with the data processing servers using processes similar to those used in the automated client interface adaptation 221 process discussed above with respect to communications from client devices. That is, the authentication hub 210 can generate API structures (e.g., using a linguistic parser) for each of the data processing servers. Then, the authentication hub 210 can modify an authentication request to match the API for a given data processing server using its API structure (e.g., the API structure determined by the authentication hub 210 for that particular data processing server). Thus, the authentication hub 210 can adapt the authentication requests to match the API protocol of the data processing server that they are being sent to.
[0073] The authentication hub 210 can also perform data processor evaluation 247. For example, the authentication hub 210 can evaluate the capabilities, authentication information requirements, exposure level, network condition (e.g., the response time between the sending of an authentication request by the authentication hub and the receiving of an authentication response from the data processing server), stability, and accuracy of each data processing server. The authentication hub 210 may evaluate the data processing servers prior to receiving an access request such that the evaluation does not slow down the processing of the access request.
[0074] The authentication hub 210 can use information from this evaluation in determining which data processing server to route an authentication request message to. For example, several data processing servers may be capable of validating a particular type of authentication information but each of the data processing servers may have different evaluated response times, stability levels, etc. The authentication hub 210 can use an AI to select a particular data processing server, based on the evaluated criteria, to send the authentication request to.
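The description does not specify how the selection is made beyond the evaluated criteria. The sketch below substitutes a fixed weighted score for the selection model; the `ServerEvaluation` fields, the score weights, and the server names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ServerEvaluation:
    name: str
    supported_types: frozenset
    response_time_ms: float
    stability: float   # 0.0 to 1.0, higher is better
    accuracy: float    # 0.0 to 1.0, higher is better

def select_server(evaluations, info_type, max_response_ms=1000.0):
    """Return the name of the best-scoring server able to validate info_type, or None."""
    capable = [e for e in evaluations if info_type in e.supported_types]
    if not capable:
        return None

    def score(e):
        # Faster responses map to a speed value closer to 1.0.
        speed = max(0.0, 1.0 - e.response_time_ms / max_response_ms)
        return 0.4 * e.accuracy + 0.3 * e.stability + 0.3 * speed

    return max(capable, key=score).name

# Evaluations gathered before any access request arrives.
evaluations = [
    ServerEvaluation("server_a", frozenset({"biometric"}), 800.0, 0.99, 0.95),
    ServerEvaluation("server_b", frozenset({"biometric", "ip"}), 200.0, 0.97, 0.94),
    ServerEvaluation("server_c", frozenset({"ip"}), 50.0, 0.99, 0.99),
]
chosen = select_server(evaluations, "biometric")
```

Because the evaluations are computed ahead of time, routing a request reduces to a filter and a comparison over stored values.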
[0075] In addition, the authentication hub 210 can also evaluate the authentication responses received from the data processing servers. If the message received from the data processing server is suspicious (e.g., the formatting has changed compared to previously received responses from that same data processing server), then the authentication hub can determine to re-route authentication requests originally destined for that particular data processing server to different data processing servers that are capable of handling those authentication requests.
III. AUTHENTICATION PROCESS USING COMMUNITY PREDICTION
[0076] FIG. 3 shows an information flow diagram of an authentication process using community prediction, in accordance with some embodiments. The authentication process can be performed by the authentication hub or the data security hub described herein.
[0077] The authentication hub can store interaction data 302 for a plurality of previous access requests. For each of the previous access requests, the interaction data 302 can include a geolocation of the client device, a time of day of the access request (e.g., a specific time or a time range), a date of the access request, an identifier of a resource, a resource amount, an identifier of the resource manager of the resource, an identifier of a resource management computer of the resource manager, an IP address of the client device, and a network route or path used for sending the access request to the data security hub.
[0078] The authentication hub can use a learning algorithm 304 (e.g., a clustering algorithm) to create a topological graph 306 based on the interaction data 302. The topological graph 306 includes nodes and edges connecting the nodes. Each node in the topological graph 306 can correspond to a certain type of data within the interaction data 302. To create the topological graph 306, the learning algorithm 304 may determine an edge-weight for each of the edges in the topological graph 306 and a node-weight of each of the nodes in the topological graph 306. The node-weight for a particular node can be normalized based on the maximum node-weight corresponding to its particular node type.
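The per-type normalization of node-weights can be sketched as follows. The sketch assumes raw node-weights are non-negative numbers (e.g., interaction counts, which the description does not state); the function name and the second IP address are hypothetical.

```python
from collections import defaultdict

def normalize_node_weights(nodes):
    """nodes maps node id -> (node_type, raw_weight); returns node id -> weight in [0, 1].

    Each raw weight is divided by the maximum raw weight among nodes of the same type.
    """
    max_by_type = defaultdict(float)
    for node_type, weight in nodes.values():
        max_by_type[node_type] = max(max_by_type[node_type], weight)
    return {
        node_id: (weight / max_by_type[node_type] if max_by_type[node_type] else 0.0)
        for node_id, (node_type, weight) in nodes.items()
    }

raw = {
    "ip:23.0.0.108": ("ip_address", 40.0),
    "ip:23.0.0.109": ("ip_address", 10.0),
    "rm:3": ("resource_manager", 5.0),
}
normalized = normalize_node_weights(raw)
```

Normalizing within a node type keeps a frequently seen IP address from dwarfing a resource manager node simply because the two types occur at different scales.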
[0079] The learning algorithm 304 can then determine a plurality of communities 308 from the topological graph 306 to form a predictive model. Each community of the plurality of communities 308 can include a subset of the nodes. The nodes that are included in a community by the learning algorithm 304 can be based on a vector distance between the nodes. Each of the communities 308 can be associated with a vector for that community based on the vectors of the nodes within that community. The creation of the topological graph and the communities to form the predictive model is further described below.
[0080] The topological graph 306, the plurality of communities 308 and the corresponding predictive model can be created by the authentication hub at routine time intervals (e.g., every 6 months) based on a new set of interaction data 302. Once the predictive model is created, it can be used for processing access requests received in real-time.
[0081] For example, the authentication hub can receive an incoming access request 312. The authentication hub can determine a set of predicted communities 314 for the incoming access request 312 based on a vector distance between a vector based on the request interaction data and the vectors associated with the plurality of communities 308. Communities 308 having shorter vector distances (e.g., below a similarity threshold) may be selected to be included in the predicted subset of the plurality of communities 314.
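The threshold-based selection in paragraph [0081] can be sketched as follows, assuming the request and each community have already been mapped to numeric vectors (the description does not give the embedding) and that Euclidean distance is used. The function name and example vectors are hypothetical.

```python
import math

def predict_communities(request_vector, community_vectors, threshold):
    """Return (community_id, distance) pairs whose distance is below the threshold, nearest first."""
    distances = [
        (community_id, math.dist(request_vector, vector))
        for community_id, vector in community_vectors.items()
    ]
    within = [pair for pair in distances if pair[1] < threshold]
    return sorted(within, key=lambda pair: pair[1])

# Hypothetical community vectors.
communities = {"A": (1.0, 1.0), "B": (2.0, 1.0), "C": (9.0, 9.0)}
predicted = predict_communities((1.0, 1.0), communities, threshold=2.0)
```

Returning the communities nearest first also supports starting the authentication comparison with the most similar communities.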
[0082] After determining the predicted subset of the plurality of communities 314, the authentication hub can initiate an authentication process for the access request 312 using the predicted subset of the plurality of communities.
[0083] For example, the authentication hub can obtain client authentication information 322 from the access request 312. The authentication hub can also determine a subset of registered authentication information corresponding to account identifiers included in the predicted subset of the plurality of communities. As described herein, the authentication process can be sped up by only comparing the client authentication information 322 to a subset of the registered authentication information 324 (instead of the entire set of registered authentication information). This process is useful when the access request includes biometric data, audio data, or image data as authentication information but does not include an identifier of a particular account or user. The authentication hub can determine an authentication outcome 326 (e.g., whether the access request has been authenticated) based on whether the client authentication information 322 matches registered authentication information included in the subset 324. The determination of the authentication outcome 326 using the predicted set of communities 314 is further described below.
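The candidate reduction in paragraph [0083] can be sketched as a comparison loop restricted to accounts from the predicted communities. The sketch assumes authentication information is represented as numeric feature vectors compared by cosine similarity against a threshold; real biometric matching is considerably more involved, and all names and values here are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def authenticate(sample, registered, candidate_accounts, threshold=0.95):
    """Compare the sample only to templates of the candidate accounts.

    Returns the best-matching account at or above the threshold, or None.
    """
    best_account, best_score = None, threshold
    for account in candidate_accounts:
        score = cosine_similarity(sample, registered[account])
        if score >= best_score:
            best_account, best_score = account, score
    return best_account

# Hypothetical registered templates for three accounts.
registered = {"u1": (1.0, 0.0), "u2": (0.0, 1.0), "u3": (0.7, 0.7)}

# Only accounts found in the predicted communities are checked.
outcome = authenticate((0.99, 0.05), registered, candidate_accounts=["u1", "u3"])
no_match = authenticate((0.99, 0.05), registered, candidate_accounts=["u2"])
```

The cost of the loop scales with the number of candidate accounts rather than with the full set of registered accounts.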
[0084] In another example, the authentication hub can obtain an account identifier from the access request 312. Then, the authentication hub can determine an account interaction profile based on previous access requests associated with the account identifier. The previous access requests for that account can be obtained from historical account information 332, which may be stored at the authentication hub. The data security hub can then determine an expected subset of the plurality of communities based on the account interaction profile 334. The authentication hub can then determine a risk score 336 for the access request based on a comparison of the expected subset of the plurality of communities to the predicted subset of the plurality of communities. The comparison of the expected subset to the predicted subset may be based on a vector distance between the two. The determination of the risk score 336 is further described below.
IV. COMMUNITY PREDICTION MODEL USING A TOPOLOGICAL GRAPH
[0085] FIG. 4 shows an exemplary topological graph 400 and communities, in accordance with some embodiments. As discussed herein, the authentication hub can store interaction data for a plurality of previous access requests. The interaction data can include information about the geolocation, date, and time of day that the previous access requests were made as well as the network address and network route used to send the access request. The interaction data can also include an account identifier, an identifier of the resource manager that the request is for, an identifier of a type of resource manager (“R.M. type”), an identifier of the resource, and an amount of the resource (e.g., a specific amount or a range of amounts). The interaction data may be determined from the plurality of previous access request messages.
[0086] The authentication hub may create a topological graph, such as the topological graph 400 of FIG. 4, based on the interaction data. In embodiments, the graph may be a topological graph comprising nodes (represented as circles in FIG. 4) and edges (represented as lines connecting the nodes in FIG. 4). A node for each distinct information element amongst the plurality of received interaction data may be generated and plotted on the topological graph. For example, a node for each account identifier, resource manager identifier, resource manager type (“R.M. type” in FIG. 4), IP Address, resource amount, etc. may be generated and plotted. The different types may be predetermined based on the different types of interaction data stored.
[0087] Then, the plotted nodes may be connected to one another via edges that represent the relationship/linkage between nodes. Each edge may be associated with a weight quantifying the interaction between the two nodes of the edge. For example, a node for an account identifier of a specific user may be connected to a node for a resource manager identifier of a specific resource manager. The weight of the connecting edge between the two nodes may reflect a quantity of interactions (access requests) between the specific resource manager and user. For example, the user may have conducted 5 access requests at the resource manager (based on the stored interaction data), which may result in the edge between the nodes having a weight of 5. Furthermore, the account identifier node and resource manager identifier node may be related to other distinct information elements included in the access request. For example, the node for the resource manager identifier may be connected to a node for a resource manager type relating to the resource manager (e.g. 5651 - Family Clothing Stores). In another example, the node for the account identifier may be connected to a node for a geolocation (e.g., zip code) at which the access request may have been initiated (e.g. 94110 - San Francisco). Thus, the nodes may represent data that are distinct from each other, and/or the nodes may characterize underlying data at specific levels of generality (e.g., a resource manager ID may represent a specific resource manager, but a resource manager type may represent a category that the specific resource manager is part of).
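The edge-weight counting described in paragraph [0087] can be sketched as follows. As a simplification, the sketch connects every pair of fields that appear in the same access request, whereas the description connects specific node types; the field names and the `field:value` node labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_edge_weights(interactions):
    """Count, for each pair of nodes, how many access requests contained both."""
    edge_weights = Counter()
    for record in interactions:
        # One node per distinct information element, labeled "field:value".
        nodes = sorted(f"{field}:{value}" for field, value in record.items())
        for a, b in combinations(nodes, 2):
            edge_weights[(a, b)] += 1
    return edge_weights

# User u1 made 5 access requests at resource manager 3; user u2 made 1.
interactions = [
    {"account": "u1", "resource_manager": "3", "zip": "94110"} for _ in range(5)
] + [{"account": "u2", "resource_manager": "3", "zip": "94110"}]

weights = build_edge_weights(interactions)
u1_to_rm3 = weights[("account:u1", "resource_manager:3")]
```

With five stored requests from user u1 at resource manager 3, the edge between those two nodes has a weight of 5, matching the example in the paragraph.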
[0088] The authentication hub may also determine and assign edge-weights to each edge of the topological graph based on the interaction data. The edge-weights may be related to vector distances between nodes, as the position of two nodes relative to one another can be expressed as a vector in which edges between nodes have a specific length quantifying their relationship. For example, the relationship between two nodes can either be measured as a weight, in which higher correlations are given by higher weights, or the relationship can be measured as a distance, in which higher correlations are given by shorter distances. In the latter case, highly connected nodes that interact frequently with each other may be densely populated in the graph (i.e. close to one another within a distinct region of the graph). Thus, the length of an edge can be inversely proportional to its edge-weight.
[0089] A path in the topological graph 400 may be defined as one or more edges that can be traversed to move from a first node to a second node. In one embodiment, the length of a path may be determined to be the total length of the edges included in the path. For example, a path may comprise edges [a,b], [b,c], and [c,d], which may have edge lengths of 2, 3, and 4, respectively. The length of the path may then be found by summing the edges along the path and may be equal to ‘2+3+4 = 9.’ In another embodiment, the length of a path may be determined to be the number of edges along the path. For example, a path from node a to d may comprise three edges: [a,b], [b,c], and [c,d], and the length of the path may then be equal to ‘3.’
[0090] The authentication hub may then determine a plurality of communities from the created topological graph. The communities may be groups of nodes that are highly connected (as given by greater weights and shorter distances), indicating that they have a high probability of interacting with one another based on the interaction data. These communities can then be used to classify incoming data (e.g., an incoming access request) and generate a predicted outcome (e.g. predicted interaction) for the incoming data. In an embodiment, the communities can be generated using an unsupervised learning algorithm that clusters nodes into distinct and densely populated groups (e.g., the IPCA clustering algorithm). Furthermore, the learning algorithm can be an algorithm that generates communities that overlap (i.e. where nodes can belong to more than one community). As shown in FIG. 4, Community A includes nodes that are also included in Community B (e.g., the node for “IP Address: 23.0.0.108” and the node for “Resource Provider ID: 5”). The graph learning algorithm for mapping groups of nodes in an interaction network can be used, as described in further detail below.
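The two path-length definitions given in paragraph [0089] can be sketched as follows, using the same edges and edge lengths as the example there. The function names and the representation of a path as a node sequence are assumptions.

```python
def path_length_weighted(path, edge_lengths):
    """First embodiment: sum the lengths of the edges along a path given as a node sequence."""
    return sum(edge_lengths[(a, b)] for a, b in zip(path, path[1:]))

def path_length_hops(path):
    """Second embodiment: count the edges along a path given as a node sequence."""
    return max(len(path) - 1, 0)

edge_lengths = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 4}
path = ["a", "b", "c", "d"]

total = path_length_weighted(path, edge_lengths)  # 2 + 3 + 4
hops = path_length_hops(path)                     # three edges
```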
[0091] Once the authentication hub has determined the communities (e.g., Community A, Community B, and Community C in FIG. 4), they may be accumulated and recorded in a graph database to form a predictive model. For example, each community can be identified by a unique community ID, which may further be associated with unique identifiers for each node included in the community. The community information may be stored in a graph database, and at a later point in time, data received in a request for a prediction can then be compared to the determined communities to classify the data and generate the requested prediction.
[0092] After generating the predictive model, the authentication hub can use the predictive model to determine predicted communities for incoming access requests received in real time. The authentication hub can determine interaction data for the incoming access request based on the data included in the access request and information gathered by the authentication hub at the time the access request is received (e.g., date, time of day, network address, network route, etc.). The interaction data for the access request can then be compared to the predictive model to determine a set of predicted communities, the set of predicted communities being a subset of the plurality of communities in the topological graph.
[0093] In one example, the interaction data for a particular incoming access request may comprise a location and a resource manager identifier. The authentication hub can apply the interaction data for the access request to the predictive model to identify a set of predicted communities that were formed based on interaction data similar to the interaction data in the incoming access request. For example, an incoming access request having interaction data indicating a location of “94110,” and a resource manager identifier of “3” may be determined to fall within “Community A” of FIG. 4 since “Community A” includes nodes for the location of “94110,” and the resource manager identifier of “3.” Thus, the authentication hub can determine that Community A is a predicted community for the incoming access request.
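The membership-based matching in the example above can be illustrated with a minimal sketch. The function name, the (type, value) node encoding, the community contents, and the rule that a community matches only when it contains every node derived from the request are all assumptions made for illustration; the specification does not fix a particular matching rule.

```python
# Hypothetical node encoding: (node type, value), e.g. ("location", "94110").

def predicted_communities(request_nodes, communities):
    """Return IDs of communities that contain every node derived from the request.

    Assumption: a community is "predicted" when the request's nodes are a
    subset of the community's nodes.
    """
    return sorted(
        community_id
        for community_id, members in communities.items()
        if request_nodes <= members
    )

# Illustrative community contents only, loosely following FIG. 4.
communities = {
    "A": {("location", "94110"), ("resource_manager_id", "3"), ("ip", "23.0.0.108")},
    "B": {("ip", "23.0.0.108"), ("resource_provider_id", "5")},
}
request_nodes = {("location", "94110"), ("resource_manager_id", "3")}

matches = predicted_communities(request_nodes, communities)
print(matches)
```

A stricter or looser rule (for example, requiring only a majority of the request's nodes to be present) could be substituted without changing the overall flow.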
[0094] A plurality of predicted communities can also be determined based on a vector distance between the access request and the communities of the predictive model. The authentication hub can vectorize the interaction data for the access request and the community structure (e.g., the nodes and edges) and determine a vector distance between the interaction data for the access request and the community structure. A particular community may be determined to be a predicted community based on the vector distance being below a similarity threshold, where a lower similarity threshold would result in fewer predicted communities and a higher similarity threshold would result in more predicted communities.
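The threshold behavior described above can be sketched as follows. The vectorization itself is not specified in the text, so the community vectors, the request vector, and the use of Euclidean distance here are illustrative assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_by_distance(request_vec, community_vecs, threshold):
    """Return IDs of communities whose vector distance to the request is below the threshold."""
    return sorted(
        community_id
        for community_id, vec in community_vecs.items()
        if euclidean(request_vec, vec) < threshold
    )

# Hypothetical two-dimensional vectors, for illustration only.
community_vecs = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [4.0, 4.0]}
request_vec = [1.0, 0.5]

tight = predict_by_distance(request_vec, community_vecs, threshold=1.0)
loose = predict_by_distance(request_vec, community_vecs, threshold=2.0)
print(tight, loose)
```

With these values the tighter threshold keeps only the nearest community, while the looser threshold admits a second one, matching the stated relationship between threshold and number of predicted communities.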
[0095] As discussed above, the “distance” between two nodes can be related to an edge weight of the edge connecting the two nodes. Nodes that frequently interact and/or have a strong level of correlation to one another may be connected by highly weighted/strong edges. For example, a node for a resource manager that is busiest during morning hours may be connected to a node for time “09:00-11:00” by a strong edge of weight 20, but may be connected to a node for time “18:00-22:00” by a weak edge of weight 1. As another example, a node for a resource manager that manages expensive resources may be connected to a node for a resource amount of “100-500” by a strong edge with high weight, and may be connected to a node for a resource amount of “30-50” by a weak edge of low weight.
[0096] As discussed above, the topological graph can include communities of nodes that are closely connected (e.g., the distance between the nodes is smaller compared to nodes in other communities). In one example, nodes of the graph, G, can be grouped into communities, K. Each distinct community, K, may comprise densely populated nodes that interact more frequently with one another than with nodes of a different community. Furthermore, a community may have a diameter, which may describe the scope of the community. In one embodiment, the diameter of a community may be denoted as SP(K), and may be defined to be the largest length of any shortest path between any two nodes in K. For example, a community, K, may comprise nodes A, B, C, and D, and the shortest possible path between each pair of nodes may be ‘AB: 1’, ‘AC: 3’, ‘AD: 4’, ‘BC: 2’, ‘BD: 3,’ and ‘CD: 1.’ In this example, the diameter, SP(K), would be equal to 4, as the longest shortest path is ‘AD: 4.’ In another embodiment, the diameter of a community, K, can be denoted by ASP(K), and may be defined to be the average length of all the shortest paths between each pair of nodes in K. For example, a community comprising nodes ‘A,’ ‘B,’ ‘C’, and ‘D’ may have shortest paths ‘AB: 1’, ‘AC: 3’, ‘AD: 4’, ‘BC: 2’, ‘BD: 3,’ and ‘CD: 1.’ Then ASP(K) may be calculated as ‘(1+3+4+2+3+1)/6 = 2.333’. Methods for determining shortest paths in a graph may be found in U.S. Patent Application No. 15/590,988, filed on May 9, 2017, which is herein incorporated by reference in its entirety for all purposes.
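The two diameter definitions can be checked with a short sketch. The chain graph A-B-C-D with edge lengths 1, 2, and 1 is an assumption chosen because it reproduces the pairwise shortest paths listed in the example; the Floyd-Warshall routine is a standard all-pairs method used here for illustration, not the method of the incorporated application.

```python
from itertools import combinations, product

def all_pairs_shortest(nodes, edge_lengths):
    """Floyd-Warshall over an undirected graph given as {(u, v): length}."""
    inf = float("inf")
    dist = {(u, v): (0 if u == v else inf) for u, v in product(nodes, repeat=2)}
    for (u, v), length in edge_lengths.items():
        dist[u, v] = dist[v, u] = min(dist[u, v], length)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist

def sp_diameter(nodes, dist):
    """SP(K): the largest shortest-path length over all node pairs."""
    return max(dist[u, v] for u, v in combinations(nodes, 2))

def asp_diameter(nodes, dist):
    """ASP(K): the average shortest-path length over all node pairs."""
    pairs = list(combinations(nodes, 2))
    return sum(dist[u, v] for u, v in pairs) / len(pairs)

# Assumed chain graph that yields AB:1, AC:3, AD:4, BC:2, BD:3, CD:1.
nodes = ["A", "B", "C", "D"]
edge_lengths = {("A", "B"): 1, ("B", "C"): 2, ("C", "D"): 1}
dist = all_pairs_shortest(nodes, edge_lengths)

print(sp_diameter(nodes, dist))             # largest shortest path
print(round(asp_diameter(nodes, dist), 3))  # average shortest path
```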
[0097] Communities with the same diameter can have very different topologies. To distinguish different topologies of communities with the same diameter, another control parameter can be defined. For a more dense community, a node may be connected to most of the other nodes in the community. On the other hand, for a more sparse community, a node may be connected to only a few nodes within the community. To measure how strongly a node, v, is connected to a community, K, the interaction probability INVK of a node, v, to a community, K, where v ∉ K, may be defined by the equation:
$$IN_{vK} = \frac{m_{vK}}{n_K}$$
where $m_{vK}$ is the number of edges shared between the node v and the nodes included in K, and where $n_K$ is the number of nodes included in K.
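A minimal sketch of this quantity follows, reading the interaction probability as the ratio of the number of edges between v and K to the number of nodes in K. That reading is an assumption based on the two definitions above and on the threshold Tin being a number between 0 and 1; the adjacency-set representation and the function name are likewise illustrative.

```python
def interaction_probability(v, community, adjacency):
    """Ratio of edges between v and the community to the community's node count.

    adjacency maps each node to the set of its neighbors (simple undirected graph).
    """
    if v in community:
        raise ValueError("v must not already belong to the community")
    m_vk = len(adjacency[v] & community)  # edges shared between v and K
    n_k = len(community)                  # number of nodes in K
    return m_vk / n_k

# Hypothetical graph: v touches two of the four community nodes.
adjacency = {
    "v": {"a", "b", "x"},
    "a": {"v", "b"},
    "b": {"v", "a"},
    "c": {"d"},
    "d": {"c"},
    "x": {"v"},
}
community = {"a", "b", "c", "d"}

print(interaction_probability("v", community, adjacency))
```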
A. Weighting Nodes
[0098] To determine a plurality of communities from a graph, for each community, each edge in the graph may be assigned a weight. A methodology similar to the IPCA clustering algorithm (identifying protein complexes algorithm) may be used to form communities. In one embodiment, for an input graph G = (V, E), the weight assigned to an edge between nodes u and v, [u, v], may be defined as the number of neighbors (adjacent nodes) shared by the nodes u and v. For example, node u may be connected to nodes a, b, c, x, y, and z. Meanwhile, node v may be connected to nodes x, y, and z. Therefore, the weight assigned to the edge [u,v] may be 3, as nodes u and v share three neighbors (nodes x, y, and z). In another embodiment, the weight of each edge may be computed based on a quantity of interactions comprising the two nodes connected by the edge. For example, the weight of an edge between nodes for an IP address and a resource manager may be 5, which may represent 5 access requests made by a client device using the IP address at the resource manager. As another example, the weight of an edge between a node for a resource manager and a node for resource manager type, which may be included in every access request conducted at the resource manager, may have a weight of 100. Meanwhile an edge connecting the resource manager node to a node for an hour of operation at which 10% of the resource manager’s access requests occur may have a weight of 10.
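The shared-neighbor weighting in the first embodiment above can be sketched in a few lines. The adjacency-set representation is an assumption; the node names follow the example in the text.

```python
def shared_neighbor_weight(u, v, adjacency):
    """Edge-weight of [u, v] as the number of neighbors common to u and v."""
    return len(adjacency[u] & adjacency[v])

# u is connected to a, b, c, x, y, z (and v); v is connected to x, y, z (and u).
adjacency = {
    "u": {"a", "b", "c", "x", "y", "z", "v"},
    "v": {"x", "y", "z", "u"},
}

print(shared_neighbor_weight("u", "v", adjacency))
```

Because neither node appears in its own neighbor set, the endpoints u and v themselves are never counted as shared neighbors.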
[0099] In embodiments, a node-weight for each node in the topological graph may also be computed. In one embodiment, the node-weight of each node may be computed as the sum of the edge-weights of its incident edges. For example, a node may be connected to 5 adjacent nodes and the edge-weights of the incident edges connecting the node to the 5 adjacent nodes may be ‘3, 3, 4, 6, 9.’ The node-weight of the node may then be computed as ‘3+3+4+6+9 = 25.’ In some embodiments, after all nodes have been assigned node-weights, the nodes may be sorted in decreasing order by weight, and stored in a queue, Sq.
[0100] In some embodiments, the authentication hub can normalize the node-weight for each node in the topological graph based on a maximum node-weight for its particular type of node. For example, if the topological graph includes 4 nodes having a node type of “IP address” and these 4 nodes have respective node-weights of 25, 21, 16, and 22, then the authentication hub can determine that the maximum node-weight for the nodes of type “IP address” is “25.” The authentication hub can then normalize the node-weights for each node of type “IP address” using the maximum node-weight of “25.” By normalizing the node-weights for each node type, the topological graph can include a plurality of different node types while maintaining consistency of vector distance between the nodes.
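The node-weight, queue, and normalization steps can be sketched as follows. Dividing each node-weight by the maximum for its type is an assumed form of normalization, since the text says only that the maximum is used; the node names are hypothetical.

```python
from collections import defaultdict

def node_weights(edge_weights):
    """Sum the edge-weights of the edges incident to each node."""
    totals = defaultdict(float)
    for (u, v), weight in edge_weights.items():
        totals[u] += weight
        totals[v] += weight
    return dict(totals)

def normalize_by_type(weights, node_type):
    """Divide each node-weight by the maximum node-weight of its node type (assumed rule)."""
    max_by_type = {}
    for node, weight in weights.items():
        kind = node_type[node]
        max_by_type[kind] = max(max_by_type.get(kind, 0.0), weight)
    return {
        node: (weight / max_by_type[node_type[node]] if max_by_type[node_type[node]] else 0.0)
        for node, weight in weights.items()
    }

# Node "n" with incident edge-weights 3, 3, 4, 6, 9, as in the example.
edges = {("n", "p1"): 3, ("n", "p2"): 3, ("n", "p3"): 4, ("n", "p4"): 6, ("n", "p5"): 9}
weights = node_weights(edges)

# Four "IP address" nodes with the node-weights from the example.
ip_weights = {"ip1": 25, "ip2": 21, "ip3": 16, "ip4": 22}
ip_types = {name: "IP address" for name in ip_weights}
normalized = normalize_by_type(ip_weights, ip_types)

# Queue Sq: nodes in decreasing order of node-weight.
queue_sq = sorted(ip_weights, key=ip_weights.get, reverse=True)
print(weights["n"], normalized, queue_sq)
```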
B. Selecting Seed
[0101] Each community that is to be created may originate from a seed node. The seed node may serve as a first node in a community that is being generated, and the community may be further built by extending the community from the first node based on whether or not nearby nodes meet predefined criteria. The predefined criteria for adding nearby nodes are further described below. In one embodiment, the highest weighted nodes in the queue Sq may be selected as the seed nodes of each community. In an embodiment, to begin the process of determining communities from the graph, the first node (i.e. highest weight node) in the queue Sq may be selected as a seed node to grow a new community.
C. Extending Community
[0102] According to embodiments, a new community may be built from a seed node by extending the community K to include nearby nodes (neighbors) that are connected to one or more nodes included in the community. In one embodiment, the new community K may be extended by adding nodes recursively from its neighbors according to priority. In one embodiment, the priority of a neighbor v of K may be determined by the value INVK, the interaction probability between v and the nodes of the new community K. In an embodiment, the node with the highest interaction probability against K may be selected as the neighboring node with the highest priority.
[0103] In embodiments, whether a high priority neighboring node v is added to the new community is determined by an Extend-judgment test that tests if v is a (K, Tin, d)-vertex. The predefined criteria for a (K, Tin, d)-vertex evaluated in the Extend-judgment test are described below. In an embodiment, a candidate node v may be added to the new community if the candidate node v is a (K, Tin, d)-vertex. Once the new node v is added to the community, the community may be updated, i.e., the neighbors of the new community may be re-constructed from the graph, G, and the priorities of the neighbors of the new community may be re-calculated.
D. Extend-judgment
[0104] In an embodiment, whether or not a candidate node (neighboring node) v is added to a community K may be determined by two conditions. First, the interaction probability INVK of the candidate node against the community may be calculated. In an embodiment, the candidate node will not be added to the community if the value INVK is less than a predetermined threshold, Tin. In one embodiment, the predetermined threshold, Tin, may be a predetermined number between 0 and 1. In embodiments, the predetermined threshold, Tin, may be chosen to control the number of nodes included in each community as well as the total number of communities generated. For example, a greater Tin value may result in a greater number of communities as well as fewer nodes in each community. Meanwhile, a lower Tin value may result in fewer communities, with each community comprising a greater number of nodes. This may further affect the outputted predictions of the model, as a model with more communities may have greater resolution and may result in more precise predictions (e.g. fewer false positives); however, a model comprising communities that include a large number of nodes may be capable of predicting interactions that would have otherwise been missed had the communities been any smaller (e.g. interactions with lower probability that can nevertheless occur). Accordingly, a Tin value may be selected based on the balance between these outcomes, and may be adjusted for desired results.
[0105] According to embodiments, if the candidate node v passes this first test, the diameter of the extended community K + v may be calculated. As described above, the diameter of a community can be calculated as the largest length (i.e. maximum possible length) of any shortest path between any two nodes in the community, SP(K), or can be calculated as the average length of all shortest possible paths between each pair of nodes in the community, ASP(K). In an embodiment, the diameter of the graph K + v may be calculated and compared to a parameter d, which may be a pre-established boundary for communities that are being built. If the computed value of the diameter of K + v is bounded by d, then the vertex v may be added to the community (i.e. K = K + v). The parameter d may be set based on the nature of the interaction data that is being used. For example, it may be determined that for an interaction network of users and resource managers, 95% of interactions occur between users and resource managers that are only 5 or fewer connections away from each other, and the parameter d may be set as ‘5.’ If the node v fails to meet either of the predefined criteria, then the next highest priority neighbor of the community is tested, and so on. Once all remaining neighbors of the community fail to meet the predefined criteria, then the community cannot be further extended, and the nodes of community K may be completely determined. Then, the nodes included in community K, as completely built, may be removed from the queue Sq before selecting the next node in the queue.
[0106] Once the building of a community has been completed, all of the nodes included in the community may be removed from the queue Sq, and the first node (highest weighted node) remaining in the queue Sq may be selected as the seed for the next community, which may then be extended according to the above process. The seed selection, community extension, and Extend-judgment processes may be repeated until the queue Sq has been completely emptied.
[0107] Accordingly, this approach may generate overlapping communities, as the nodes of the generated communities are only removed from the queue Sq, but not from the original graph G from which candidate nodes are selected during the extending community process. Furthermore, the process may guarantee that no two generated communities would be the same, as the seed node for a new community may be selected such that the seed node does not belong to any of the previously constructed communities. The technical advantages of the above mentioned features include the expression of multiple traits of any given node when making a prediction. This allows for more accurate predictions that can be tailored to specific locations, times of day, etc., and thus can account for a large range of qualities of any given node. For example, prior methods for predicting interactions could only classify nodes into a single community, whereas the currently presented method accounts for nodes belonging to multiple communities. This may be beneficial, for example, when predicting interactions between users and resource managers that belong to more than one community, and whose interactions vary as conditions change. Furthermore, the method allows for the mapping of interactions at multiple levels. This is of particular use for predicting interactions between users and resource managers, as correlations between users and resource managers are expressed in each community (i.e. by account number and name) as well as non-intuitive correlations between concepts relating to the users and resource managers, such as location and MCC code. Even further, correlations between concepts relating to interactions themselves may be expressed, such as the time and nature of the interaction that occurred (e.g. as expressed by resource amount and by the means by which an access request was made).
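The seed selection, community extension, and Extend-judgment steps described in this section can be tied together in a minimal sketch. It rests on several assumptions not fixed by the text: the graph is an unweighted adjacency map, node-weights are supplied by the caller, the diameter is SP(K) measured in hops inside the community, and ties between equally ranked nodes are broken alphabetically.

```python
from collections import deque

def hop_diameter(members, adjacency):
    """SP(K) in hops, using breadth-first search restricted to the community's nodes."""
    worst = 0
    for source in members:
        seen = {source: 0}
        frontier = deque([source])
        while frontier:
            node = frontier.popleft()
            for nxt in adjacency[node] & members:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    frontier.append(nxt)
        if len(seen) < len(members):
            return float("inf")  # community is not connected
        worst = max(worst, max(seen.values()))
    return worst

def build_communities(adjacency, node_weight, t_in, d):
    """Grow one community per seed until the queue Sq is empty."""
    queue = sorted(adjacency, key=lambda n: (-node_weight[n], n))  # Sq, heaviest first
    communities = []
    while queue:
        community = {queue[0]}  # seed: highest-weight node still in Sq
        while True:
            neighbors = set().union(*(adjacency[n] for n in community)) - community
            # Priority: highest interaction probability against the community first.
            ranked = sorted(
                neighbors,
                key=lambda v: (-len(adjacency[v] & community) / len(community), v),
            )
            for v in ranked:
                in_vk = len(adjacency[v] & community) / len(community)
                # Extend-judgment: both the Tin test and the diameter bound must hold.
                if in_vk >= t_in and hop_diameter(community | {v}, adjacency) <= d:
                    community.add(v)
                    break  # neighbors and priorities are recomputed after each addition
            else:
                break  # no neighbor qualifies; the community is complete
        communities.append(community)
        # Remove the community's nodes from Sq only, not from the graph.
        queue = [n for n in queue if n not in community]
    return communities

# Hypothetical graph: two triangles joined by the edge c-d; weights are node degrees.
adjacency = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
node_weight = {n: len(nbrs) for n, nbrs in adjacency.items()}
result = build_communities(adjacency, node_weight, t_in=0.6, d=2)
print(result)
```

Because finished communities are removed only from the queue, a later seed can still pull in nodes that already belong to an earlier community, which is how overlapping communities arise in this sketch.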
V. EXEMPLARY METHODS FOR PROCESSING ACCESS REQUESTS
[0108] FIG. 5 shows a flow chart of an exemplary method for processing access request messages through a data security hub, in accordance with some embodiments. The data security hub or authentication hub described above can perform the method for processing access request messages.
[0109] At step 501 of the method, the data security hub can create a topological graph based on the interaction data for a plurality of previous access requests. The interaction data can be stored at the data security hub. The topological graph includes nodes and edges connecting the nodes. Each node in the topological graph can correspond to a certain type of interaction data. For example, the types of interaction data can include a geolocation of the client device, a time of day of the access request (e.g., a specific time or a time range), a date of the access request, an identifier of a resource, a resource amount, an identifier of the resource manager of the resource, an identifier of a resource management computer of the resource manager, an IP address of the client device, and a network route or path used for sending the access request to the data security hub.
[0110] In creating the topological graph, the data security hub may also determine an edge-weight for each of the edges in the topological graph. The data security hub may also determine a node-weight of each of the nodes in the topological graph based on the edge-weights. The data security hub may also determine a maximum node-weight corresponding to each of a plurality of node types based on the node-weight of each of the nodes. Each of the nodes may be associated with a particular node type. The node type can correspond to the type of interaction data that the node represents in the topological graph. The data security hub may also normalize the node-weight of each of the nodes in the topological graph based on the maximum node-weight corresponding to its particular node type. In some embodiments, when the data security hub determines the plurality of communities from the topological graph to form the predictive model, the determination is based on the normalized node-weights.
[0111] At step 502 of the method, the data security hub can determine a plurality of communities from the topological graph to form a predictive model. The plurality of communities can be determined using a clustering algorithm as discussed above. Each community of the plurality of communities can include a subset of the nodes. The nodes included in a community can be based on a distance between the nodes.
[0112] In determining the plurality of communities from the topological graph, the data security hub may generate a queue including the nodes in decreasing order by node-weight. The data security hub may also select a first seed node from the queue, where the first seed node has the highest node-weight of the nodes in the queue. The data security hub may also create a first community that includes the first seed node. The data security hub may also calculate an interaction probability for each of a plurality of candidate nodes that are not included in the first community. The interaction probability can be the probability of a candidate node interacting with a community node that is included in the first community. The data security hub may also determine a highest priority candidate node based on the interaction probability for each of the plurality of candidate nodes that are not included in the first community. The data security hub may also determine whether the highest priority candidate node meets predefined criteria. The data security hub may also add the highest priority candidate node to the first community based on the determination of whether the highest priority candidate node meets the predefined criteria. The data security hub may also determine a next highest priority candidate node based on the interaction probability for each of the plurality of candidate nodes not included in the first community. The data security hub may also determine that the next highest priority candidate node does not meet the predefined criteria. The data security hub may then output the first community. The data security hub may assign a unique community-identifier to the first community and associate it with node-identifiers for the set of nodes that are included in the first community. Then, the data security hub may remove, from the queue, the set of nodes included in the first community. After creating the first community, the data security hub can create additional communities using a similar process. For example, the data security hub can select a second seed node from the queue, the queue not including the set of nodes included in the first community, and then create a second community including the second seed node. The plurality of communities of the topological graph includes the first community and the second community.
[0113] At step 503 of the method, the data security hub can receive, from a client device, an access request. The access request can include authentication information and other data.
[0114] At step 504 of the method, the data security hub can determine request interaction data for the access request. For example, the request interaction data can include a geolocation of the client device, a time of day of the access request (e.g., a specific time or a time range), a date of the access request, an identifier of a resource, a resource amount, an identifier of the resource manager of the resource, an identifier of a resource management computer of the resource manager, an IP address of the client device, and a network route or path used for sending the access request to the data security hub. The data security hub may gather a portion of the request interaction data. For example, the data security hub may perform a trace route procedure to determine the network route used to send the access request.
[0115] At step 505 of the method, the data security hub can determine a predicted subset of the plurality of communities by applying the request interaction data to the predictive model. Applying the request interaction data to the predictive model can include creating a topological graph structure based on the request interaction data, vectorizing the request interaction data, vectorizing the plurality of communities in the topological graph, and determining a vector distance between a request vector and a vector for each community of the plurality of communities. Communities having the shortest vector distance (e.g., below a similarity threshold) may be selected to be included in the predicted subset of the plurality of communities. In some embodiments, the data security hub can determine a network address of the client device and the predictive model can be based on the network addresses used for sending the plurality of previous access requests.
[0116] At step 506 of the method, the data security hub can initiate an authentication process for the access request using the predicted subset of the plurality of communities.
[0117] In some authentication processes, the data security hub can obtain client authentication information from the access request. The data security hub may also determine a subset of registered authentication information corresponding to account identifiers included in the predicted subset of the plurality of communities. In some embodiments, the client authentication information includes biometric data of a user of the client device and the subset of registered authentication information includes biometric data of registered users. The authentication process can further include comparing the client authentication information to the subset of registered authentication information. The data security hub may then determine an authentication outcome based on the comparing of the client authentication information to the subset of registered authentication information. An authentication response sent to the client device can be based on the authentication outcome. In some embodiments, the data security hub can determine that the client authentication information is not registered based on the comparison of the client authentication information to the subset of registered authentication information. Then the data security hub can initiate a registration process based on the determination that the client authentication information is not registered.
[0118] In some authentication processes, the data security hub may obtain an account identifier from the access request. Then, the data security hub can determine an account interaction profile based on previous access requests associated with the account identifier. The data security hub can determine an expected subset of the plurality of communities based on the account interaction profile. The data security hub can then determine a risk score for the access request based on a comparison of the expected subset of the plurality of communities to the predicted subset of the plurality of communities. An authentication response sent to the client device can be based on the risk score. The data security hub may also determine a vector for each of the plurality of communities based on nodes and edges of the topological graph corresponding to each respective community. The data security hub may also determine an expected vector based on the account interaction profile. The data security hub may then determine a vector distance between the expected vector and the vector representing each of the plurality of communities. The determination of the expected subset of the plurality of communities by the data security hub can be based on the vector distances between the expected vector and the vector representing each of the plurality of communities.
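The comparison of expected and predicted communities can be illustrated with a minimal sketch. The text does not say how the risk score is computed, so the Jaccard distance used here is purely an assumption chosen for illustration, as are the function name and the community IDs.

```python
def risk_score(expected, predicted):
    """Assumed scoring rule: Jaccard distance between two sets of community IDs.

    0.0 means the sets are identical; 1.0 means they share no community.
    """
    union = expected | predicted
    if not union:
        return 0.0  # nothing to compare
    return 1.0 - len(expected & predicted) / len(union)

# Hypothetical community IDs.
expected = {"A", "B"}   # derived from the account interaction profile
predicted = {"B", "C"}  # derived from the current access request

score = risk_score(expected, predicted)
print(round(score, 3))
```

Any monotone measure of disagreement between the two subsets could be substituted; the point is only that less overlap between expected and predicted communities yields a higher score.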
[0119] In some embodiments, the data security hub can restrict the authentication information of the access request based on the risk score to obtain a restricted set of authentication information. The data security hub may also send the restricted set of authentication information to a data processing server. In response, the data security hub may receive a second authentication response from the data processing server. The authentication response provided by the data security hub can be based on the second authentication response from the data processing server.
[0120] At step 507 of the method, the data security hub can provide, to the client device, an authentication response based on the authentication process for the access request. The authentication response can indicate whether or not the access request was authenticated.
VI. EXEMPLARY COMPUTER SYSTEM
[0121] The various participants and elements described herein may operate one or more computer apparatuses to facilitate the functions described herein. Any of the elements in the above-described figures, including any servers or databases, may use any suitable number of subsystems to facilitate the functions described herein.
[0122] Such subsystems or components are interconnected via a system bus. Subsystems may include a printer, a keyboard, a fixed disk (or other memory comprising computer readable media), a monitor, which is coupled to a display adapter, and others. Peripherals and input/output (I/O) devices, which couple to an I/O controller (which can be a processor or other suitable controller), can be connected to the computer system by any number of means known in the art, such as a serial port. For example, a serial port or an external interface can be used to connect the computer apparatus to a wide area network such as the Internet, a mouse input device, or a scanner. The interconnection via the system bus allows the central processor to communicate with each subsystem and to control the execution of instructions from system memory or the fixed disk, as well as the exchange of information between subsystems. The system memory and/or the fixed disk may embody a computer readable medium.
[0123] As described, the embodiments may involve implementing one or more functions, processes, operations or method steps. In some embodiments, the functions, processes, operations or method steps may be implemented as a result of the execution of a set of instructions or software code by a suitably-programmed computing device, microprocessor, data processor, or the like. The set of instructions or software code may be stored in a memory or other form of data storage element which is accessed by the computing device, microprocessor, etc. In other embodiments, the functions, processes, operations or method steps may be implemented by firmware or a dedicated processor, integrated circuit, etc.
[0124] It should be understood that any of the embodiments can be implemented in the form of control logic using hardware (e.g. an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner. As used herein, a processor includes a single-core processor, a multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.
[0125] Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or scripting language such as Perl or Python using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission. A suitable non-transitory computer readable medium can include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.
[0126] Storage media and computer-readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer-readable instructions, data structures, program modules, or other data, including RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, data signals, data transmissions, or any other medium which can be used to store or transmit the desired information and which can be accessed by the computer. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.
[0127] Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
[0128] Any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, steps of methods herein can be performed at the same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, units, circuits, or other means for performing these steps.
[0129] The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the invention. However, other embodiments of the invention may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.
[0130] The above description of example embodiments has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form described, and many modifications and variations are possible in light of the teaching above.
[0131] A recitation of “a,” “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary. The use of “or” is intended to mean an “inclusive or,” and not an “exclusive or” unless specifically indicated to the contrary. The use of the terms “first,” “second,” “third,” “fourth,” “fifth,” “sixth,” “seventh,” “eighth,” “ninth,” “tenth,” and so forth, does not necessarily indicate an ordering or a numbering of different elements and may simply be used for naming purposes to clarify distinct elements. The use of “client” computer and “server” computer does not necessarily indicate the intended use of the computers, but may simply be used for naming purposes.
[0132] All patents, patent applications, publications, and descriptions mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art.

Claims

WHAT IS CLAIMED IS:
1. A method for processing access request messages through a data security hub, the method comprising:
creating a topological graph based on the interaction data for a plurality of previous access requests, the topological graph including nodes and edges connecting the nodes;
determining a plurality of communities from the topological graph to form a predictive model, each community of the plurality of communities including a subset of the nodes;
receiving, from a client device, an access request;
determining request interaction data for the access request;
determining a predicted subset of the plurality of communities by applying the request interaction data to the predictive model;
initiating an authentication process for the access request using the predicted subset of the plurality of communities; and
providing, to the client device, an authentication response based on the authentication process for the access request.
2. The method of claim 1, further comprising:
obtaining client authentication information from the access request;
determining a subset of registered authentication information corresponding to account identifiers included in the predicted subset of the plurality of communities, wherein the authentication process includes comparing the client authentication information to the subset of registered authentication information; and
determining an authentication outcome based on the comparing of the client authentication information to the subset of registered authentication information, wherein the authentication response is further based on the authentication outcome.
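By way of a non-limiting illustration, the comparison recited in claim 2 might be sketched in Python as follows. The function name, the dictionary layouts, and the default exact-equality matcher are assumptions made for this sketch and are not taken from the disclosure; biometric data would normally need a similarity-based matcher supplied through the hypothetical `matches` parameter.

```python
from typing import Callable, Dict, Iterable, Optional, Set


def authenticate_in_communities(
    client_auth: str,
    predicted: Iterable[str],
    community_accounts: Dict[str, Set[str]],
    registered_auth: Dict[str, str],
    matches: Callable[[str, str], bool] = lambda a, b: a == b,
) -> Optional[str]:
    """Return the matching account identifier, or None if nothing matches.

    All parameter names and data layouts are hypothetical.
    """
    # Collect only the account identifiers that belong to the predicted communities.
    candidates: Set[str] = set()
    for community_id in predicted:
        candidates |= community_accounts.get(community_id, set())

    # Compare the client information against that reduced subset only,
    # rather than against every registered account.
    for account_id in sorted(candidates):
        template = registered_auth.get(account_id)
        if template is not None and matches(client_auth, template):
            return account_id
    return None
```

A result of `None` corresponds to the case in which the client authentication information is not found in the subset, which could, for example, lead to the registration process of claim 4.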
3. The method of claim 2, wherein the client authentication information includes biometric data of a user of the client device, wherein the subset of registered authentication information includes biometric data of registered users.
4. The method of claim 2, further comprising:
determining that the client authentication information is not registered based on the comparing of the client authentication information to the subset of registered authentication information; and
initiating a registration process based on the determining that the client authentication information is not registered.
5. The method of claim 1, further comprising:
obtaining an account identifier from the access request;
determining an account interaction profile based on previous access requests associated with the account identifier;
determining an expected subset of the plurality of communities based on the account interaction profile; and
determining a risk score based on comparing the expected subset of the plurality of communities to the predicted subset of the plurality of communities, wherein the authentication response is further based on the risk score.
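As one possible illustration of the comparison recited in claim 5, the sketch below scores risk as the Jaccard distance between the expected and predicted sets of community identifiers. The claim does not specify how the comparison is scored; the Jaccard distance and the function name `risk_score` are assumptions chosen only for this example.

```python
from typing import Iterable


def risk_score(expected: Iterable[str], predicted: Iterable[str]) -> float:
    """Return a score in [0, 1]; 0 means the two community subsets coincide.

    Jaccard distance is an assumed scoring rule, not one named in the claims.
    """
    expected_set, predicted_set = set(expected), set(predicted)
    union = expected_set | predicted_set
    if not union:
        # Nothing expected and nothing predicted: treat as no disagreement.
        return 0.0
    overlap = expected_set & predicted_set
    return 1.0 - len(overlap) / len(union)
```

Under this assumed rule, a request whose predicted communities are disjoint from those expected for the account receives the maximum score of 1.0.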
6. The method of claim 5, wherein the determining of the request interaction data for the access request includes determining a network address of the client device used for sending the access request, wherein the interaction data for the plurality of previous access requests includes network addresses used for sending the plurality of previous access requests, and wherein the predictive model is based on the network addresses used for sending the plurality of previous access requests.
7. The method of claim 5, further comprising:
determining a vector for each of the plurality of communities based on nodes and edges of the topological graph corresponding to each respective community; and
determining an expected vector based on the account interaction profile; and
determining a vector distance between the expected vector and the vector representing each of the plurality of communities, wherein the determining of the expected subset of the plurality of communities is based on the vector distance between the expected vector and the vector representing each of the plurality of communities.
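The vector comparison of claim 7 could be sketched, for example, as shown below. The use of Euclidean distance, the cut-off parameter `max_distance`, and the function name are assumptions for illustration; the claim requires only that the expected subset be based on the vector distance.

```python
import math
from typing import Dict, Sequence, Set


def expected_communities(
    expected_vector: Sequence[float],
    community_vectors: Dict[str, Sequence[float]],
    max_distance: float,
) -> Set[str]:
    """Select the communities whose vectors lie near the expected vector.

    Euclidean distance and the max_distance threshold are assumed choices.
    """
    selected: Set[str] = set()
    for community_id, vector in community_vectors.items():
        # math.dist computes the Euclidean distance between two points
        # of equal dimension (Python 3.8+).
        if math.dist(expected_vector, vector) <= max_distance:
            selected.add(community_id)
    return selected
```

Other distance measures, such as cosine distance, could be substituted by replacing the single `math.dist` call.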
8. The method of claim 5, further comprising:
restricting the client authentication information of the access request based on the risk score to obtain a restricted set of authentication information;
sending the restricted set of authentication information to a data processing server; and
receiving a second authentication response from the data processing server, wherein the authentication response is further based on the second authentication response.
9. The method of claim 1, further comprising:
determining an edge-weight for each of the edges in the topological graph;
determining a node-weight of each of the nodes in the topological graph based on the edge-weights;
determining a maximum node-weight corresponding to each of a plurality of node types based on the node-weight of each of the nodes, each of the nodes being of a particular node type; and
normalizing the node-weight of each of the nodes in the topological graph based on the maximum node-weight corresponding to its particular node type, wherein the determining of the plurality of communities from the topological graph to form the predictive model is based on the normalizing of the node-weight of each of the nodes in the topological graph.
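The weighting and normalizing steps of claim 9 might be sketched as follows. Taking a node-weight to be the sum of the weights of the edges incident on the node is an assumption made for this example, as are the function name and the input layout (edges as `(node, node, weight)` tuples and a mapping from node to node type).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def normalized_node_weights(
    edges: Iterable[Tuple[str, str, float]],
    node_types: Dict[str, str],
) -> Dict[str, float]:
    """Normalize each node-weight by the maximum node-weight of its node type."""
    # Node-weight: assumed here to be the sum of incident edge-weights.
    node_weight: Dict[str, float] = defaultdict(float)
    for u, v, weight in edges:
        node_weight[u] += weight
        node_weight[v] += weight

    # Maximum node-weight observed for each node type.
    max_by_type: Dict[str, float] = defaultdict(float)
    for node, weight in node_weight.items():
        node_type = node_types[node]
        max_by_type[node_type] = max(max_by_type[node_type], weight)

    # Divide by the per-type maximum so that weights of different node
    # types (for example accounts and network addresses) are comparable.
    normalized: Dict[str, float] = {}
    for node, weight in node_weight.items():
        type_max = max_by_type[node_types[node]]
        normalized[node] = weight / type_max if type_max else 0.0
    return normalized
```

With this scheme the heaviest node of each type has a normalized weight of 1.0, so that no single node type dominates the ordering solely because its raw weights are larger.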
10. The method of claim 9, further comprising:
generating a queue including the nodes in decreasing order by node-weight;
selecting a first seed node from the queue, the first seed node being a highest node-weighted node in the queue;
creating a first community including the first seed node;
calculating an interaction probability for each of a plurality of candidate nodes not included in the first community, the interaction probability being a probability of a candidate node interacting with a community node included in the first community;
determining a highest priority candidate node based on the interaction probability for each of the plurality of candidate nodes not included in the first community;
determining whether the highest priority candidate node meets predefined criteria;
adding the highest priority candidate node to the first community based on the determining of whether the highest priority candidate node meets predefined criteria;
determining a next highest priority candidate node based on the interaction probability for each of the plurality of candidate nodes not included in the first community;
determining that the next highest priority candidate node does not meet the predefined criteria;
outputting the first community, the first community comprising a unique community-identifier for the first community and node-identifiers for a set of nodes included in the first community;
removing, from the queue, the set of nodes included in the first community;
selecting a second seed node from the queue, the queue not including the set of nodes included in the first community; and
creating a second community including the second seed node, wherein the plurality of communities includes the first community and the second community.
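The seed-and-grow procedure of claim 10 could be sketched, in simplified form, as shown below. Several details are assumptions made for illustration and are not specified by the claim: the interaction probability is taken to be the fraction of a candidate's total edge-weight that connects to nodes already in the community, the predefined criteria are modeled as a minimum probability (`min_probability`), candidates are drawn only from nodes still in the queue, and community identifiers are generated as `community-1`, `community-2`, and so on.

```python
from typing import Dict, List, Set


def build_communities(
    node_weights: Dict[str, float],
    adjacency: Dict[str, Dict[str, float]],
    min_probability: float = 0.5,
) -> Dict[str, Set[str]]:
    """Greedily grow communities from the highest-weighted remaining node."""
    # Queue of nodes in decreasing order by node-weight (ties broken by name).
    queue: List[str] = sorted(node_weights, key=lambda n: (-node_weights[n], n))
    communities: Dict[str, Set[str]] = {}

    while queue:
        seed = queue[0]  # highest node-weighted node still in the queue
        community: Set[str] = {seed}

        while True:
            best, best_p = None, 0.0
            for candidate in queue:
                if candidate in community:
                    continue
                neighbors = adjacency.get(candidate, {})
                total = sum(neighbors.values())
                if total == 0:
                    continue
                inside = sum(w for n, w in neighbors.items() if n in community)
                probability = inside / total  # assumed interaction probability
                if probability > best_p:
                    best, best_p = candidate, probability
            # Stop growing once the best candidate fails the assumed criteria.
            if best is None or best_p < min_probability:
                break
            community.add(best)

        # Output the community under a unique community-identifier.
        communities[f"community-{len(communities) + 1}"] = community
        # Remove the community's nodes from the queue before reseeding.
        queue = [n for n in queue if n not in community]

    return communities
```

For a graph with two tightly linked pairs joined by one weak edge, this sketch outputs two communities, one for each pair.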
11. A data security hub for processing access request messages, the data security hub comprising:
one or more processors; and
a computer readable storage medium coupled to the one or more processors storing a plurality of instructions that, when executed by the one or more processors, cause the one or more processors to:
store interaction data for a plurality of previous access requests;
create a topological graph based on the interaction data, the topological graph including nodes and edges connecting the nodes;
determine a plurality of communities from the topological graph to form a predictive model, each community of the plurality of communities including a subset of the nodes;
receive, from a client device, an access request;
determine request interaction data for the access request;
determine a predicted subset of the plurality of communities by applying the request interaction data to the predictive model;
initiate an authentication process for the access request using the predicted subset of the plurality of communities; and
provide, to the client device, an authentication response based on the authentication process for the access request.
12. The data security hub of claim 11, wherein the plurality of instructions, when executed by the one or more processors, further cause the one or more processors to:
obtain client authentication information from the access request;
determine a subset of registered authentication information corresponding to account identifiers included in the predicted subset of the plurality of communities, wherein the authentication process includes comparing the client authentication information to the subset of registered authentication information; and
determine an authentication outcome based on the comparing of the client authentication information to the subset of registered authentication information, wherein the authentication response is further based on the authentication outcome.
13. The data security hub of claim 12, wherein the client authentication information includes biometric data of a user of the client device, wherein the subset of registered authentication information includes biometric data of registered users.
14. The data security hub of claim 12, wherein the plurality of instructions, when executed by the one or more processors, further cause the one or more processors to:
determine that the client authentication information is not registered based on the comparing of the client authentication information to the subset of registered authentication information; and
initiate a registration process based on the determining that the client authentication information is not registered.
15. The data security hub of claim 11, wherein the plurality of instructions, when executed by the one or more processors, further cause the one or more processors to:
obtain an account identifier from the access request;
determine an account interaction profile based on previous access requests associated with the account identifier;
determine an expected subset of the plurality of communities based on the account interaction profile; and
determine a risk score based on comparing the expected subset of the plurality of communities to the predicted subset of the plurality of communities, wherein the authentication response is further based on the risk score.
16. The data security hub of claim 15, wherein the determination of the request interaction data for the access request includes determining a network address of the client device used for sending the access request, wherein the interaction data for the plurality of previous access requests includes network addresses used for sending the plurality of previous access requests, and wherein the predictive model is based on the network addresses used for sending the plurality of previous access requests.
17. The data security hub of claim 15, wherein the plurality of instructions, when executed by the one or more processors, further cause the one or more processors to:
determine a vector for each of the plurality of communities based on nodes and edges of the topological graph corresponding to each respective community; and
determine an expected vector based on the account interaction profile; and
determine a vector distance between the expected vector and the vector representing each of the plurality of communities, wherein the determination of the expected subset of the plurality of communities is based on the vector distance between the expected vector and the vector representing each of the plurality of communities.
18. The data security hub of claim 15, wherein the plurality of instructions, when executed by the one or more processors, further cause the one or more processors to:
restrict the client authentication information of the access request based on the risk score to obtain a restricted set of authentication information;
send the restricted set of authentication information to a data processing server; and
receive a second authentication response from the data processing server, wherein the authentication response is further based on the second authentication response.
19. The data security hub of claim 11, wherein the plurality of instructions, when executed by the one or more processors, further cause the one or more processors to:
determine an edge-weight for each of the edges in the topological graph;
determine a node-weight of each of the nodes in the topological graph based on the edge-weights;
determine a maximum node-weight corresponding to each of a plurality of node types based on the node-weight of each of the nodes, each of the nodes being of a particular node type; and
normalize the node-weight of each of the nodes in the topological graph based on the maximum node-weight corresponding to its particular node type, wherein the determination of the plurality of communities from the topological graph to form the predictive model is based on the normalizing of the node-weight of each of the nodes in the topological graph.
20. The data security hub of claim 19, wherein the plurality of instructions, when executed by the one or more processors, further cause the one or more processors to:
generate a queue including the nodes in decreasing order by node-weight;
select a first seed node from the queue, the first seed node being a highest node-weighted node in the queue;
create a first community including the first seed node;
calculate an interaction probability for each of a plurality of candidate nodes not included in the first community, the interaction probability being a probability of a candidate node interacting with a community node included in the first community;
determine a highest priority candidate node based on the interaction probability for each of the plurality of candidate nodes not included in the first community;
determine whether the highest priority candidate node meets predefined criteria;
add the highest priority candidate node to the first community based on the determining of whether the highest priority candidate node meets predefined criteria;
determine a next highest priority candidate node based on the interaction probability for each of the plurality of candidate nodes not included in the first community;
determine that the next highest priority candidate node does not meet the predefined criteria;
output the first community, the first community comprising a unique community-identifier for the first community and node-identifiers for a set of nodes included in the first community;
remove, from the queue, the set of nodes included in the first community;
select a second seed node from the queue, the queue not including the set of nodes included in the first community; and
create a second community including the second seed node, wherein the plurality of communities includes the first community and the second community.
PCT/US2018/014550 2018-01-19 2018-01-19 Data security using graph communities WO2019143360A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2018/014550 WO2019143360A1 (en) 2018-01-19 2018-01-19 Data security using graph communities

Publications (1)

Publication Number Publication Date
WO2019143360A1 (en) 2019-07-25

Family

ID=67301055

Country Status (1)

Country Link
WO (1) WO2019143360A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120197758A1 (en) * 2011-01-27 2012-08-02 Ebay Inc. Computation of user reputation based on transaction graph
US20160044720A1 (en) * 2012-12-21 2016-02-11 Orange A method and device to connect to a wireless network
US20160364794A1 (en) * 2015-06-09 2016-12-15 International Business Machines Corporation Scoring transactional fraud using features of transaction payment relationship graphs
US20170331828A1 (en) * 2016-05-13 2017-11-16 Idm Global, Inc. Systems and methods to authenticate users and/or control access made by users on a computer network using identity services
WO2018013566A1 (en) * 2016-07-11 2018-01-18 Visa International Service Association Machine learning and prediction using graph communities

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559310A (en) * 2021-02-25 2021-03-26 北京芯盾时代科技有限公司 Object identification method and device based on dynamic graph
CN112559310B (en) * 2021-02-25 2021-06-08 北京芯盾时代科技有限公司 Object identification method and device based on dynamic graph
US20230015732A1 (en) * 2021-07-16 2023-01-19 Sony Interactive Entertainment Inc. Head-mountable display systems and methods

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18900833; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 18900833; Country of ref document: EP; Kind code of ref document: A1)