WO2025035116A1 - Controlling access using a risk indicator generated with alternative data - Google Patents


Info

Publication number
WO2025035116A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
risk indicator
target entity
machine learning
alternative data
Prior art date
Application number
PCT/US2024/041759
Other languages
French (fr)
Inventor
Joseph White
Felipe Alfonso Avila ROSALES
Lewis Jordan
Joji VARUGHESE
Matthew Turner
Howard HAMILTON
Jiawei Liu
Original Assignee
Equifax Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Equifax Inc. filed Critical Equifax Inc.
Publication of WO2025035116A1 publication Critical patent/WO2025035116A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present disclosure relates generally to artificial intelligence for risk prediction. More specifically, but not by way of limitation, this disclosure relates to controlling access to secure resources based on a risk assessment generated using a model trained on multi-data attributes.
  • a method includes one or more processing devices performing operations including accessing a machine learning model to determine a final risk indicator of a target entity from a baseline data associated with the target entity and an alternative data associated with the target entity.
  • the method can also include generating the final risk indicator of the target entity using the trained machine learning model and the baseline data associated with the target entity and the alternative data associated with the target entity.
  • the method can further include transmitting, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.
  • in another example, a system includes a processing device and a memory device in which instructions executable by the processing device are stored for causing the processing device to perform various operations.
  • the system can train a machine learning model to determine a final risk indicator for a target entity from a baseline data and an alternative data associated with the target entity.
  • the system can generate the final risk indicator of the target entity using the trained machine learning model and the baseline data associated with the target entity and the alternative data associated with the target entity.
  • the system can further transmit, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.
  • a non-transitory computer-readable storage medium has program code that is executable by a processor to cause a computing device to perform operations.
  • the operations can include training a machine learning model to determine a final risk indicator for a target entity from a baseline data and an alternative data associated with the target entity.
  • the operations can include generating the final risk indicator of the target entity using the trained machine learning model and the baseline data associated with the target entity and the alternative data associated with the target entity.
  • the operations can further include transmitting, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.
  • FIG. 1 is a block diagram depicting an example of an operating environment according to certain aspects of the present disclosure.
  • FIG. 2 is a flow chart depicting example processes for generating a risk indicator according to certain aspects of the present disclosure.
  • FIG. 3 is a graph comparing exemplary Kolmogorov-Smirnov (KS) statistics for the example processes according to certain aspects of the present disclosure.
  • FIG. 4 is a graph comparing results of the example processes for thin file entities according to certain aspects of the present disclosure.
  • FIG. 5 is a graph illustrating score alignment of a risk indicator according to certain aspects of the present disclosure.
  • FIG. 6 is a histogram illustrating a change in risk indicators with and without the use of alternative data according to certain aspects of the present disclosure.
  • FIG. 7 is a flow diagram depicting an example of a process for generating a risk indicator according to certain aspects of the present disclosure.
  • FIG. 8 is a block diagram depicting an example of a computing device suitable for implementing aspects of the techniques and technologies presented herein.
  • Certain aspects and features of the present disclosure are directed to controlling access to secure resources based on a risk assessment generated using a model trained on multi-data attributes.
  • a risk indicator (such as a credit score) can be used as a barometer for entity risk or entity trustworthiness.
  • a risk indicator cannot be generated for an entity because an entity can be invisible (e.g., the entity is not associated with any data indicative of risk) or the entity can have a thin file (e.g., the entity is associated with fewer than three accounts from which data can be drawn).
  • an entity may not have sufficient time of activity on an account from which data indicative of risk can be drawn.
  • invisible or thin file entities may be associated with other accounts that are not typically considered when generating a risk indicator.
  • Systems and methods disclosed herein can leverage alternative data associated with these other accounts to generate a risk indicator for invisible or thin file entities.
  • an invisible or thin file consumer may be associated with alternative data that is not typically captured during a credit report for credit scoring purposes.
  • Systems and methods described herein can leverage alternative data to generate risk indicators for these entities.
  • systems and methods described herein can facilitate risk decisions on entities for which typical credit scoring methods do not provide insights. This can give access to financial instruments to entities who would otherwise not have access to these particular instruments or services due to lack of data.
  • a risk assessment system can use alternative data as attributes computed on each data asset separately or as multi-data attributes.
  • systems can calculate specific attributes using alternative entity data and then combine this alternative data with existing data to create attributes by: 1) creating new attributes alongside existing attributes; 2) generating new attributes from the combined data (e.g., embedding a predictive risk indicator with the new attributes); or 3) generating a new risk indicator using the attributes and then combining this indicator with an existing indicator to create a fused risk indicator.
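The three combination strategies above can be sketched as follows. This is an illustrative outline only, not the claimed implementation: the synthetic attributes, labels, and logistic-regression models stand in for whatever attribute sets and model families a deployment would actually use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
baseline_attrs = rng.normal(size=(n, 3))    # e.g., credit-file attributes
alt_attrs = rng.normal(size=(n, 2))         # e.g., rental/utility payment attributes
labels = (rng.random(n) < 0.3).astype(int)  # 1 = became delinquent (synthetic)

# Strategy 1: create new attributes alongside existing attributes.
combined = np.hstack([baseline_attrs, alt_attrs])
model_1 = LogisticRegression().fit(combined, labels)

# Strategy 2: embed a predictive risk indicator with the new attributes.
baseline_model = LogisticRegression().fit(baseline_attrs, labels)
baseline_score = baseline_model.predict_proba(baseline_attrs)[:, 1:]
model_2 = LogisticRegression().fit(np.hstack([baseline_score, alt_attrs]), labels)

# Strategy 3: generate a separate alternative-data indicator, then fuse
# the two indicators into a single final risk indicator.
alt_model = LogisticRegression().fit(alt_attrs, labels)
alt_score = alt_model.predict_proba(alt_attrs)[:, 1:]
fusion_model = LogisticRegression().fit(np.hstack([baseline_score, alt_score]), labels)
```

Strategy 3 corresponds to the fusion model described below, strategy 2 to the embedded model, and strategy 1 to the multi-data attributes approach.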
  • Certain aspects described herein provide improvements to machine learning techniques for assessing risks, for example, in access control associated with entities.
  • systems and methods described herein leverage three models: a traditional risk assessment model; a fusion risk assessment model; and an embedded risk assessment model.
  • These three approaches can provide improvements over traditional risk assessment systems. For example, these approaches can yield higher performing risk indicators that have greater accuracy than traditional risk indicators, and that facilitate risk predictions for entities for whom a risk indicator could not previously be generated.
  • the system can also transmit the risk indicator for the target entity to a remote computing system. In some examples, this may be the system from which the risk indicator was requested.
  • the risk indicator can be used to control interactions of the target entity with an interactive computing environment.
  • the risk indicator can be included in a responsive message to a request for evaluating the target entity such that the responsive message can be used to allow, challenge, or deny some operation to the target entity. For example, if the risk indicator is below a predefined threshold, an interaction by the target with the interactive computing environment may be automatically denied or flagged for manual review.
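As an illustration of the allow/challenge/deny logic above, the following sketch maps a risk indicator to a decision. The function name, the normalized [0, 1] score scale, and the threshold values are assumptions for illustration; real deployments would set thresholds per policy and per score scale.

```python
def access_decision(risk_indicator: float,
                    deny_below: float = 0.40,
                    review_below: float = 0.60) -> str:
    """Map a final risk indicator in [0, 1] to an access decision.

    Below deny_below the interaction is automatically denied; between
    the two thresholds it is challenged (flagged for manual review);
    otherwise it is allowed.
    """
    if risk_indicator < deny_below:
        return "deny"
    if risk_indicator < review_below:
        return "review"
    return "allow"
```

For example, `access_decision(0.2)` denies the interaction, while `access_decision(0.9)` allows it.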
  • certain aspects described herein, which can include generating one or more risk indicators associated with target entities and providing a responsive message using the risk indicator, can improve at least the technical fields of controlling interactions between computing environments, access control for a computing environment, or a combination thereof. Further, the risk assessment computing system leverages distinctive components of the risk indicator to create a robust and easily implemented framework.
  • FIG. 1 is a block diagram depicting an example of an operating environment 100 in which a risk assessment computing system 130 builds and trains a risk assessment model 120 that can be trained to predict risk indicators based on training data, which includes multi-data attributes.
  • FIG. 1 depicts examples of hardware components of a risk assessment computing system 130, according to some aspects.
  • the risk assessment computing system 130 is a specialized computing system that may be used for processing large amounts of data using a large number of computer processing cycles.
  • the risk assessment computing system 130 can include a model training server 110 for building and training a standard data model 119 (i.e., a baseline data model) used to predict a baseline data risk indicator associated with an entity accessing controlled resources and an alternative data model 121 used to predict alternative data risk indicators associated with an entity based on alternative data from alternative data source 123.
  • the model training server 110 can also be used for building and training a risk assessment model 120 for combining the baseline data risk indicator and the alternative data indicator into a final risk indicator.
  • the risk assessment computing system 130 can further include a risk assessment server 118 for performing a risk assessment for given predictor variables 124, or features, using the trained risk assessment model 120, the trained alternative data model 121, and the trained standard data model 119.
  • the model training server 110 can include one or more processing devices that execute program code, such as a model training application 112.
  • the program code is stored on a non-transitory computer-readable medium.
  • the model training application 112 can execute one or more processes or applications to develop, train, and optimize a standard data model 119 (i.e., a baseline data model 119) for predicting the baseline risk indicator based on the predictor variables 124 or other data stored in the risk data repository 122. Additionally, the model training application 112 can execute one or more processes or applications to develop, train, and optimize an alternative data model 121 for predicting alternative data indicators based on the data from the alternative data source 123.
  • the model training application 112 can execute one or more processes or applications to develop, train, and optimize a risk assessment model 120 for predicting final risk indicators by combining the baseline data risk indicator output by the standard data model 119 and the alternative data indicator output by alternative data model 121.
  • the model training application 112 can build and train a risk assessment model 120 using risk assessment training data 126 in a training process.
  • the risk assessment training data 126 can include multiple training vectors including training predictor variables and training risk indicator outputs corresponding to the training vectors.
  • the risk assessment training data 126 may include differing subsets of data sources.
  • the alternative data model 121 can be trained to predict an alternative data risk indicator using data from the alternative data source 123. The alternative data risk indicator can then be combined with the output of the standard data model 119 (e.g., the baseline data risk indicator) to generate a final risk indicator.
  • the final risk indicator can improve the accuracy of access decisions from the risk assessment computing system 130, particularly for entities associated with low amounts of data in the data repository 122.
  • the risk assessment training data 126 can be stored in one or more network-attached storage units on which various repositories, databases, or other structures are stored. An example of these data structures is the risk data repository 122.
  • Network-attached storage units may store a variety of different types of data organized in a variety of different ways and from a variety of different sources.
  • the network-attached storage unit may include storage other than primary storage located within the model training server 110 that is directly accessible by processors located therein.
  • the network-attached storage unit may include secondary, tertiary, or auxiliary storage, such as large hard drives, servers, virtual memory, among other types.
  • Storage devices may include portable or non-portable storage devices, optical storage devices, and various other mediums capable of storing and containing data.
  • a machine-readable storage medium or computer-readable storage medium may include a non-transitory medium in which data can be stored and that does not include carrier waves or transitory electronic signals.
  • Examples of a non-transitory medium may include, for example, a magnetic disk or tape, optical storage media such as a compact disk or digital versatile disk, flash memory, memory, or memory devices.
  • the risk assessment server 118 can include one or more processing devices that execute program code, such as a risk assessment application 114.
  • the program code is stored on a non-transitory computer-readable medium.
  • the risk assessment application 114 can execute one or more processes to use the risk assessment model 120, the standard data model 119, and/or the alternative data model 121 trained during execution of the model training application 112 to predict risk indicators based on input predictor variables 124.
  • the risk indicators can be used to protect or allocate computing resources of the risk assessment computing system 130.
  • the risk assessment computing system 130 can communicate with various other computing systems, such as client computing systems 104.
  • client computing systems 104 may send risk assessment queries to the risk assessment server 118 for risk assessment or may send signals to the risk assessment server 118 that control or otherwise influence different aspects of the risk assessment computing system 130.
  • the client computing systems 104 may also interact with user computing systems 106 via one or more public data networks 108 to facilitate interactions between users of the user computing systems 106 and interactive computing environments provided by the client computing systems 104.
  • Each client computing system 104 may include one or more third-party devices, such as individual servers or groups of servers operating in a distributed manner.
  • a client computing system 104 can include any computing device or group of computing devices operated by a seller, lender, or other providers of products or services.
  • the client computing system 104 can include one or more server devices.
  • the one or more server devices can include or can otherwise access one or more non-transitory computer-readable media.
  • the client computing system 104 can also execute instructions that provide an interactive computing environment accessible to user computing systems 106. Examples of the interactive computing environment include a mobile application specific to a particular client computing system 104, a web-based application accessible via a mobile device, etc.
  • the executable instructions are stored in one or more non-transitory computer-readable media.
  • the client computing system 104 can further include one or more processing devices that are capable of providing the interactive computing environment to perform operations described herein.
  • the interactive computing environment can include executable instructions stored in one or more non-transitory computer-readable media.
  • the instructions providing the interactive computing environment can configure one or more processing devices to perform operations described herein.
  • the executable instructions for the interactive computing environment can include instructions that provide one or more graphical interfaces.
  • the graphical interfaces are used by a user computing system 106 to access various functions of the interactive computing environment. For instance, the interactive computing environment may transmit data to and receive data from a user computing system 106 to shift between different states of the interactive computing environment, where the different states allow one or more electronic transactions between the user computing system 106 and the client computing system 104 to be performed.
  • a client computing system 104 may have other computing resources associated therewith (not shown in FIG. 1), such as server computers hosting and managing virtual machine instances for providing cloud computing services, server computers hosting and managing online storage resources for users, server computers for providing database services, and others.
  • the interaction between the user computing system 106 and the client computing system 104 may be performed through graphical user interfaces presented by the client computing system 104 to the user computing system 106, or through application programming interface (API) calls or web service calls.
  • a user computing system 106 can include any computing device or other communication device operated by an entity, such as a user, an organization, or a company.
  • the user computing system 106 can include one or more computing devices, such as laptops, smartphones, and other personal computing devices.
  • a user computing system 106 can include executable instructions stored in one or more non-transitory computer-readable media.
  • the user computing system 106 can also include one or more processing devices that are capable of executing program code to perform operations described herein.
  • the user computing system 106 can allow a user to access certain online services from a client computing system 104 or other computing resources, to engage in mobile commerce with a client computing system 104, to obtain controlled access to electronic content hosted by the client computing system 104, etc.
  • the user can use the user computing system 106 to engage in an electronic transaction with a client computing system 104 via an interactive computing environment.
  • An electronic transaction between the user computing system 106 and the client computing system 104 can include, for example, the user computing system 106 being used to request online storage resources managed by the client computing system 104, acquire cloud computing resources (e.g., virtual machine instances), and so on.
  • An electronic transaction between the user computing system 106 and the client computing system 104 can also include, for example, querying a set of sensitive or other controlled data, accessing online financial services provided via the interactive computing environment, submitting an online credit card application or other digital application to the client computing system 104 via the interactive computing environment, operating an electronic tool within an interactive computing environment hosted by the client computing system (e.g., a content-modification feature, an application-processing feature, etc.).
  • an interactive computing environment implemented through a client computing system 104 can be used to provide access to various online functions.
  • a website or other interactive computing environment provided by an online resource provider can include electronic functions for requesting computing resources, online storage resources, network resources, database resources, or other types of resources.
  • a website or other interactive computing environment provided by a financial institution can include electronic functions for obtaining one or more financial services, such as loan application and management tools, credit card application and transaction management workflows, electronic fund transfers, etc.
  • a user computing system 106 can be used to request access to the interactive computing environment provided by the client computing system 104, which can selectively grant or deny access to various electronic functions.
  • the client computing system 104 can collect data associated with the user and communicate with the risk assessment server 118 for risk assessment. Based on the risk indicator predicted by the risk assessment server 118, the client computing system 104 can determine whether to grant the access request of the user computing system 106 to certain features of the interactive computing environment.
  • the system depicted in FIG. 1 can train the standard data model 119 to determine baseline data risk indicators, such as credit scores, using predictor variables 124, or data stored in the risk data repository 122.
  • the system depicted in FIG. 1 can train the alternative data model 121 based on alternative data from the alternative data source 123 to generate an alternative data risk indicator based on alternative data associated with the target entity.
  • the risk assessment model 120 can be trained on data stored in the risk data repository 122 to combine the baseline data risk indicator and the alternative data indicator into a final risk indicator.
  • a predictor variable 124 can be any variable predictive of risk that is associated with an entity. Any suitable predictor variable that is authorized for use by an appropriate legal or regulatory framework may be used.
  • Alternative data can include, for example, data not typically stored in or collected by the risk data repository 122.
  • alternative data can include rental payments, payday loan payments, rent-to-own payments, utility use behavior, and utility bill payments.
  • Examples of predictor variables 124 used for predicting the risk associated with an entity accessing online resources include, but are not limited to, variables indicating the demographic characteristics of the entity (e.g., name of the entity, the network or physical address of the company, the identification of the company, the revenue of the company), variables indicative of prior actions or transactions involving the entity (e.g., past requests of online resources submitted by the entity, the amount of online resource currently held by the entity, and so on.), variables indicative of one or more behavioral traits of an entity (e.g., the timeliness of the entity releasing the online resources), etc.
  • examples of predictor variables 124 used for predicting the risk associated with an entity accessing services provided by a financial institute include, but are not limited to, variables indicative of one or more demographic characteristics of an entity (e.g., age, gender, income, etc.), variables indicative of prior actions or transactions involving the entity (e.g., information that can be obtained from credit files or records, financial records, consumer records, or other data about the activities or characteristics of the entity), variables indicative of one or more behavioral traits of an entity, etc.
  • An additional risk indicator (e.g., the combination of the risk indicator and alternative data indicator) can be used by the service provider (e.g., the service provider controlling the interactive computing environment) to determine the risk associated with the entity accessing a service provided by the service provider, thereby granting or denying access by the entity to an interactive computing environment implementing the service. For example, if the service provider determines that the final risk indicator is lower than a threshold risk indicator value, then the client computing system 104 associated with the service provider can generate or otherwise provide access permission to the user computing system 106 that requested the access.
  • the access permission can include, for example, cryptographic keys used to generate valid access credentials or decryption keys used to decrypt access credentials.
  • the client computing system 104 associated with the service provider can also allocate resources to the user and provide a dedicated web address for the allocated resources to the user computing system 106, for example, by including it in the access permission. With the obtained access credentials and/or the dedicated web address, the user computing system 106 can establish a secure network connection to the computing environment hosted by the client computing system 104 and access the resources by invoking API calls, web service calls, HTTP requests, or other proper mechanisms.
  • Each communication within the operating environment 100 may occur over one or more data networks, such as a public data network 108, a network 116 such as a private data network, or some combination thereof.
  • a data network may include one or more of a variety of different types of networks, including a wireless network, a wired network, or a combination of a wired and wireless network. Examples of suitable networks include the Internet, a personal area network, a local area network (“LAN”), a wide area network (“WAN”), or a wireless local area network (“WLAN”).
  • a wireless network may include a wireless interface or a combination of wireless interfaces.
  • a wired network may include a wired interface. The wired or wireless networks may be implemented using routers, access points, bridges, gateways, or the like, to connect devices in the data network.
  • The number of devices depicted in FIG. 1 is provided for illustrative purposes. Different numbers of devices may be used. For example, while certain devices or systems are shown as single devices in FIG. 1, multiple devices may instead be used to implement these devices or systems. Similarly, devices or systems that are shown as separate, such as the model training server 110 and the risk assessment server 118, may instead be implemented in a single device or system.
  • FIG. 2 illustrates process diagrams for the three methods of including alternative data in a risk indicator.
  • Each of the processes (e.g., the fusion model 202, the embedded model 204, and the multi-data attributes model 206) can be implemented by one or more components of the risk assessment computing system 130 as described with reference to FIG. 1.
  • data attributes 208 can be used to generate a standard data risk indicator 210 (i.e., a baseline data risk indicator) for a particular entity. That is, data attributes 208 can be stored in the risk data repository 122. These data attributes 208 can be used to train the standard data model 119 to generate the standard data risk indicator 210.
  • the standard data risk indicator 210 can indicate a predicted amount of risk associated with a particular entity.
  • the model training application 112 can also access alternative data attributes 212 from the alternative data source 123. These alternative data attributes 212 can be used to train the alternative data model 121 to generate the alternative risk indicator 214.
  • the trained risk assessment model 120 can be used to combine the standard data risk indicator 210 with the alternative risk indicator 214.
  • data attributes 208 can be used to train the standard data model 119 to output a standard data risk indicator 210.
  • the alternative data attributes 212 can be directly fed into the risk assessment model 120 to be used in conjunction with the risk indicator to generate the final risk indicator 216.
  • the alternative data attributes 212 are combined with the risk indicator 210 into a single model (e.g., the risk assessment model 120).
  • the alternative data can be used to complement the predictive power of the standard data machine learning model 119.
  • in the multi-data attributes model 206, multi-data attributes 218 can be computed by aligning data 220 from the risk data repository 122 with alternative data 222 from the alternative data source 123. The multi-data attributes 218 can then be used to train the risk assessment model 120 to determine the final risk indicator 216. In this example, the multi-data attributes model 206 can improve efficiency of providing a final risk indicator and can be used with or without the inclusion of the alternative data.
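The alignment step in the multi-data attributes approach can be pictured as a per-entity join of the two data sources before attributes are computed over the combined records. The column names, the entity key, and the outer-join choice below are illustrative assumptions, not details from the disclosure.

```python
import pandas as pd

# Baseline data (e.g., drawn from the risk data repository).
baseline = pd.DataFrame({
    "entity_id": [1, 2, 3],
    "open_accounts": [4, 0, 1],
})

# Alternative data (e.g., rental and utility payment history).
alternative = pd.DataFrame({
    "entity_id": [2, 3, 4],
    "on_time_rent_payments": [11, 6, 12],
})

# Align the two sources on the entity identifier; an outer join keeps
# invisible or thin-file entities that appear in only one source.
multi = baseline.merge(alternative, on="entity_id", how="outer")

# Multi-data attributes can then be computed over the combined columns.
multi["any_payment_history"] = (
    multi[["open_accounts", "on_time_rent_payments"]].notna().any(axis=1)
)
```

Here entity 4 has no baseline record at all, yet still receives a computed attribute from its alternative data, which is the thin-file case the disclosure targets.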
  • four models may be built to generate risk indicators using alternative data. These four models can include: a credit-only model, a fusion model, an embedded model, and a multi-data attribute model that leverages the purposed view.
  • all three alternative approaches may provide lift over the baseline credit model (e.g., over the baseline risk assessment model).
  • FIG. 3 is a comparison of the results of a Kolmogorov-Smirnov (KS) statistics test for the above-listed approaches to incorporating alternative data into a risk indicator or risk score, such as a credit score.
  • graph 300 shows the KS test statistic performance of each model, including a model that leverages alternative consumer data only.
  • KS is a performance statistic used to measure the separation between score distributions of consumers who paid as agreed versus those who became delinquent.
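Concretely, the KS statistic is the maximum vertical distance between the empirical cumulative distribution functions of the two score samples; larger values indicate better separation between the groups. A minimal, dependency-free sketch (illustrative only, not part of the patent):

```python
def ks_statistic(scores_good, scores_bad):
    """Maximum vertical distance between the empirical CDFs of scores for
    consumers who paid as agreed versus those who became delinquent."""
    all_scores = sorted(set(scores_good) | set(scores_bad))
    n_good, n_bad = len(scores_good), len(scores_bad)
    best = 0.0
    for s in all_scores:
        cdf_good = sum(1 for x in scores_good if x <= s) / n_good
        cdf_bad = sum(1 for x in scores_bad if x <= s) / n_bad
        best = max(best, abs(cdf_good - cdf_bad))
    return best
```

Library implementations such as `scipy.stats.ks_2samp` compute the same statistic more efficiently for large samples.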
  • all three approaches showed substantial improvements in all metrics.
  • the multi-data attributes model provided the most lift overall in all metrics. For example, for the segment overall, multi-data attributes provided a 6.6 percent overall lift over a credit-only benchmark KS value of 46.6.
  • Graph 400, shown in FIG. 4, summarizes the KS and Gini shifts as well as the shifts in delinquent capture rates.
  • All three approaches provided a lift in delinquent capture rates through the bottom three deciles, with the multi-data attributes model providing the most lift in the bottom 30 percent of scores (P30).
  • the only exception was in the utility segment, where the embedded model approach provided the most lift (with the multi-data attributes model providing the least lift).
  • the multi-data attributes improved overall delinquent capture rates by 4.6 percent over the baseline credit model (representing 293 BPs of improvement).
  • FIG. 5 illustrates that each of the examined models is aligned, and thus policy changes do not need to be made to account for the inclusion of the alternative data in credit scores.
  • Graph 500 shows interval delinquency rates and demonstrates that the models are aligned. If similar scores yield significant differences in the delinquency rate, then risk indicator users would have to revise policies to modify thresholds for both acceptance and pricing bands. In the case of different thresholds for each model, the user experience would be negatively impacted. Thus, FIG. 5 shows that all multi-data models are aligned and, consequently, the user policies do not need to be altered for each model. Accordingly, disclosed systems and methods facilitate seamless inclusion of alternative data in risk assessment models.
  • risk indicator users may want to have the flexibility to turn on and turn off the use of the alternative data source. Two specific examples are when a risk indicator user desires to know the impact that the alternative data has on the consumer's score, or when the risk indicator user does not want to pay for, acquire, or use the alternative data for certain consumer decisions.
  • FIG. 6 shows the change in scores with the alternative data and without the alternative data in graph 600.
  • Graph 600 shows the change in scores for the multi-data model with and without alternative consumer data.
  • the median change on a 1000-point scale is 6.1, indicating that including alternative consumer data accounts results in an increased score relative to using only credit accounts in the multi-data attributes model.
  • the three modeling techniques: fusion, embedded, and multi-data attributes
  • the multi-data attribute approach provided lift in terms of KS, Gini, and delinquent capture rates.
  • the embedded model approach showed the highest lift in the thin file consumer segment in terms of KS, Gini, and delinquent capture rates.
  • the fusion modeling technique may facilitate three product options (e.g., credit-only model, alternative consumer data only model, and the fused model), the embedded model may facilitate two product options (credit-only model and the embedded model), and the multi-data model may also facilitate a plurality of product options (e.g., the multi-data attribute model using all available data sources, the multi-data attribute model using credit-only data, and the multi-data attribute model using any other combination of two or more data sources).
  • the embedded model allows each alternative data attribute to be estimated simultaneously to complement predictive power already captured in the credit model. Consequently, in some examples, the embedded model may result in the alternative data being more impactful than the fusion model, depending on the application.
  • An additional advantage of the fusion model is that there are individual scores to waterfall back to if a particular score is unscorable or unavailable for a consumer. In this case, the models may need to be aligned to properly waterfall back to the individual scores.
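The waterfall behavior described above can be sketched as a simple fallback chain. This is an illustrative sketch (not the patent's implementation); `None` is used here as a hypothetical marker for an unscorable or unavailable score.

```python
def score_with_waterfall(fused_score, baseline_score, alternative_score):
    """Return the fused score when available; otherwise 'waterfall' back
    to the first available individual score. None marks unscorable."""
    for score in (fused_score, baseline_score, alternative_score):
        if score is not None:
            return score
    return None  # consumer is unscorable under every model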
  • An advantage of the embedded model is that the new data source (e.g., the source providing the alternative data) can complement the predictive power of the original risk assessment model.
  • FIG. 7 is a flow chart depicting an example of a process 700 for using a risk assessment model 120 to generate a final risk indicator for a target entity.
  • the risk assessment model 120 can generate the final risk indicator for a target entity based on predictor variables 124 associated with the target entity.
  • One or more computing devices (e.g., the risk assessment server 118) can implement the process 700 by executing suitable program code (e.g., the risk assessment application 114).
  • the process 700 is described with reference to certain examples depicted in the figures. Other implementations, however, are possible. While the blocks of the process 700 are described in the temporal order below for illustrative purposes, it may be appreciated that the blocks can occur in any order and some blocks may occur simultaneously.
  • the process 700 involves accessing a risk assessment query associated with a target entity.
  • the risk assessment computing system 130 (e.g., the risk assessment server 118)
  • the risk assessment query may be received from the target entity requesting the risk assessment.
  • the risk assessment query may be received from a remote computing device associated with an entity authorized to request risk assessment of the target entity.
  • the risk assessment request can include alternative data or can request the inclusion of alternative data from the alternative data source 123 in the final risk indicator.
  • the process 700 involves training a machine learning model to determine a final risk indicator from a baseline data and an alternative data.
  • the risk assessment computing system 130 can train the machine learning model 120 to determine the final risk indicator based on a baseline data risk indicator determined by the standard data model 119 and an alternative data risk indicator determined by the alternative data model 121.
  • the machine learning model may determine the final risk indicator based on only the baseline data or only the alternative data dependent upon an indication that some of the data (e.g., either the baseline or alternative data) should be suppressed.
  • the process 700 can implement the fusion model 202 in which the standard data model 119 and the alternative data model 121 generate the baseline data risk indicator and the alternative risk indicator, respectively. Then, the machine learning model 120 can combine these risk indicators into the final risk indicator.
  • the process 700 can include implementing the embedded model 204. In the embedded model 204, the standard data model 119 can generate the baseline data risk indicator. In this example, the machine learning model 120 can receive the baseline data risk indicator and alternative data attributes as input and can output the final risk indicator.
  • the process 700 can include aligning standard data or predictor variables 124, stored by the risk data repository 122, with the alternative data stored by the alternative data source 123. Once the data is aligned, the risk assessment computing system 130 can generate multi-data attributes. These multi-data attributes can be used to train the machine learning model 120 to output the final risk indicator.
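The alignment-and-attribute-generation step can be sketched as a join on entity identifiers. The record layout below is a hypothetical simplification (dictionaries keyed by entity ID), not the patent's data format:

```python
def build_multi_data_attributes(standard_records, alternative_records):
    """Align standard data (e.g., from risk data repository 122) with
    alternative data (e.g., from alternative data source 123) by entity ID,
    then form combined multi-data attribute records for entities present
    in both sources."""
    aligned = {}
    for entity_id, standard_attrs in standard_records.items():
        alternative_attrs = alternative_records.get(entity_id)
        if alternative_attrs is None:
            continue  # no alternative data for this entity
        aligned[entity_id] = {**standard_attrs, **alternative_attrs}
    return aligned
```

A production system could instead use an outer join to retain entities that appear in only one source, which is what allows scoring consumers with thin or invisible credit files.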
  • a credit only baseline data risk indicator can be delivered by suppressing or otherwise turning off the alternative data.
  • an alternative data only risk indicator may be delivered by suppressing or otherwise turning off the baseline data.
  • a multi-data risk indicator for the final risk indicator can be delivered by keeping both the baseline data and the alternative data active or otherwise turned on.
  • a single risk indicator can serve the purpose of baseline risk indicator, alternative data risk indicator, or final risk indicator.
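The suppression behavior of the preceding bullets — one model serving a credit-only, alternative-only, or combined product — can be sketched with simple on/off flags. This is an illustrative sketch; the `model` callable stands in for the trained machine learning model 120.

```python
def configurable_indicator(baseline_attrs, alternative_attrs, model,
                           use_baseline=True, use_alternative=True):
    """One model serving three products by turning data sources on or off:
    credit-only, alternative-data-only, or the full multi-data indicator."""
    features = {}
    if use_baseline:
        features.update(baseline_attrs)
    if use_alternative:
        features.update(alternative_attrs)
    return model(features)
```

Suppressing a source here simply withholds its attributes from the feature set, so the same trained model can deliver any of the three indicators.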
  • the process 700 can include training the standard data model 119.
  • the standard data model 119 can be trained on the predictor variables 124 stored in the risk data repository 122.
  • the alternative data model 121 can be trained on alternative data stored in the alternative data source 123.
  • the risk assessment computing system 130 can access the alternative data source 123 and use the alternative data to generate alternative data attributes for use as input to the machine learning model 120.
  • the process 700 involves generating the final risk indicator for the target entity.
  • the machine learning model 120 can receive as input a baseline data risk indicator associated with the target entity and an alternative data risk indicator associated with the target entity.
  • the machine learning model 120 can generate the final risk indicator for the target entity based on these respective risk indicators.
  • the machine learning model 120 can receive a baseline data risk indicator associated with the target entity (e.g., as determined by the trained standard data model 119) and alternative data attributes associated with the target entity from which to generate the final indicator.
  • the process 700 can involve accessing data and alternative data associated with the target entity, aligning the data, and generating multi-data attributes associated with the target entity. These multi-data attributes can be used as input to the machine learning model 120 to generate the final risk indicator for the target entity.
  • the process 700 includes transmitting, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.
  • the risk assessment server 118 can return a responsive message to the client computing system 104.
  • the responsive message can include at least the final risk indicator and explanatory data associated with the risk indicator.
  • the explanatory data can indicate relationships between changes in the risk indicator and changes in at least some of the predictor variables 124 associated with the target entity.
  • the responsive message can include the baseline data risk indicator and the alternative data risk indicator.
  • the final risk indicator (as well as the baseline data risk indicator and the alternative data risk indicator) can correspond to a level of risk associated with the target entity, for example with respect to accessing protected computing resources.
  • the final risk indicator can be used for one or more operations that involve performing an operation with respect to the target entity based on a predicted risk associated with the target entity.
  • the final risk indicator can be utilized to control access to one or more interactive computing environments by the target entity.
  • the risk assessment computing system 130 can communicate with client computing systems 104, which may send risk assessment queries to the risk assessment server 118 to request risk assessment.
  • the client computing systems 104 may be associated with technological providers, such as cloud computing providers, online storage providers, or financial institutions such as banks, credit unions, credit-card companies, insurance companies, or other types of organizations.
  • the client computing systems 104 may be implemented to provide interactive computing environments for customers to access various services offered by these service providers.
  • Customers can use user computing systems 106 to access the interactive computing environments thereby accessing the services provided by these providers.
  • the client computing system 104 can determine whether to grant the customer access to the interactive computing environment. If the client computing system 104 determines that the level of risk associated with the customer accessing the interactive computing environment and the associated technical or financial service is too high, the client computing system 104 can deny access by the customer to the interactive computing environment. Conversely, if the client computing system 104 determines that the level of risk associated with the customer is acceptable, e.g., is below a predetermined threshold, the client computing system 104 can grant access to the interactive computing environment by the customer and the customer would be able to utilize the various services provided by the service providers.
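The grant/deny decision described above can be sketched as threshold comparisons on the risk indicator. This is an illustrative sketch in which higher scores indicate lower risk (as with a credit score); the threshold values and the intermediate "review" band are hypothetical policy choices, not values given in the disclosure.

```python
def access_decision(final_risk_indicator, deny_threshold, review_threshold):
    """Map a risk indicator to an access-control outcome for an
    interactive computing environment: deny, flag for manual review,
    or grant."""
    if final_risk_indicator < deny_threshold:
        return "deny"       # risk too high: block access
    if final_risk_indicator < review_threshold:
        return "review"     # borderline: challenge or review manually
    return "grant"          # acceptable risk: allow access
```

Because FIG. 5 shows the models are score-aligned, the same thresholds could be applied regardless of which model produced the indicator.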
  • the customer can utilize the user computing system 106 to access cloud computing resources, online storage resources, web pages or other user interfaces provided by the client computing system 104 to execute applications, store data, query data, submit an online digital application, operate electronic tools, or perform various other operations within the interactive computing environment hosted by the client computing system 104.
  • FIG. 8 is a block diagram depicting an example of a computing device 800, which can be used to implement the risk assessment server 118 or the model training server 110.
  • the computing device 800 can include various devices for communicating with other devices in the operating environment 100, as described with respect to FIG. 1.
  • the computing device 800 can include various devices for performing one or more transformation operations described above with respect to FIGS. 1-7.
  • the computing device 800 can include a processor 802 that is communicatively coupled to a memory 804.
  • the processor 802 executes computer-executable program code stored in the memory 804, accesses information stored in the memory 804, or both.
  • Program code may include machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements.
  • a code segment may be coupled to another code segment or a hardware circuit by passing or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, among others.
  • Examples of a processor 802 include a microprocessor, an application-specific integrated circuit, a field-programmable gate array, or any other suitable processing device.
  • the processor 802 can include any number of processing devices, including one.
  • the processor 802 can include or communicate with a memory 804.
  • the memory 804 stores program code that, when executed by the processor 802, causes the processor to perform the operations described in this disclosure.
  • the memory 804 can include any suitable non-transitory computer-readable storage medium.
  • the computer-readable medium can include any electronic, optical, magnetic, or other storage device capable of providing a processor with computer-readable program code or other program code.
  • Non-limiting examples of a computer-readable medium include a magnetic disk, memory chip, optical storage, flash memory, storage class memory, ROM, RAM, an ASIC, magnetic storage, or any other medium from which a computer processor can read and execute program code.
  • the program code may include processor-specific program code generated by a compiler or an interpreter from code written in any suitable computer-programming language. Examples of suitable programming languages include Hadoop, C, C++, C#, Visual Basic, Java, Python, Perl, JavaScript, ActionScript, etc.
  • the computing device 800 may also include a number of external or internal devices such as input or output devices.
  • the computing device 800 is shown with an input/output interface 808 that can receive input from input devices or provide output to output devices.
  • a bus 806 can also be included in the computing device 800. The bus 806 can communicatively couple one or more components of the computing device 800.
  • the computing device 800 can execute program code 814 that includes the risk assessment application 114 and/or the model training application 112.
  • the program code 814 for the risk assessment application 114 and/or the model training application 112 may be resident in any suitable computer-readable medium and may be executed on any suitable processing device.
  • the program code 814 for the risk assessment application 114 and/or the model training application 112 can reside in the memory 804 at the computing device 800 along with the program data 816 associated with the program code 814, such as the predictor variables 124 and/or the model training samples. Executing the risk assessment application 114 or the model training application 112 can configure the processor 802 to perform the operations described herein.
  • the computing device 800 can include one or more output devices.
  • One example of an output device is the network interface device 810 depicted in FIG. 8.
  • a network interface device 810 can include any device or group of devices suitable for establishing a wired or wireless data connection to one or more data networks described herein.
  • Non-limiting examples of the network interface device 810 include an Ethernet network adapter, a modem, etc.
  • a presentation device 812 can include any device or group of devices suitable for providing visual, auditory, or other suitable sensory output.
  • Non-limiting examples of the presentation device 812 include a touchscreen, a monitor, a speaker, a separate mobile computing device, etc.
  • the presentation device 812 can include a remote client-computing device that communicates with the computing device 800 using one or more data networks described herein. In other aspects, the presentation device 812 can be omitted.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

In some aspects, a computing system can train a machine learning model for risk assessment. For example, the system can access a trained machine learning model to determine a final risk indicator of a target entity from a baseline data associated with the target entity and an alternative data associated with the target entity. The computing system can generate the final risk indicator of the target entity using the baseline data associated with the target entity and the alternative data associated with the target entity. The computing system can also transmit, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.

Description

CONTROLLING ACCESS USING A RISK INDICATOR GENERATED WITH
ALTERNATIVE DATA
Cross-Reference to Related Application
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 63/518,793, filed August 10, 2023, and entitled “CONTROLLING ACCESS USING A RISK INDICATOR GENERATED WITH ALTERNATIVE DATA,” the contents of which are hereby incorporated by reference in their entirety for all purposes.
Technical Field
[0002] The present disclosure relates generally to artificial intelligence for risk prediction. More specifically, but not by way of limitation, this disclosure relates to controlling access to secure resources based on a risk assessment generated using a model trained on multi-data attributes.
Background
[0003] In risk assessment, alternative consumer data and non-traditional financial data are increasingly being leveraged to increase inclusion of consumers in financial services. Alternative data can capture consumer behavior, while providing insights on credit-invisible, thin file, and young consumers. Data surrounding utility, mobile phone, and cable repayment history has been shown to be powerful in predicting behavior and risk associated with thin file and credit-invisible consumers. But systems face challenges in applying this alternative data in generating risk predictions.
Summary
[0004] Various aspects of the present disclosure provide systems and methods for generating constraint compliant training data and constraint compliant machine learning models for use in risk assessment. In one example, a method includes one or more processing devices performing operations including accessing a machine learning model to determine a final risk indicator of a target entity from a baseline data associated with the target entity and an alternative data associated with the target entity. The method can also include generating the final risk indicator of the target entity using the trained machine learning model and the baseline data associated with the target entity and the alternative data associated with the target entity. The method can further include transmitting, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.
[0005] In another example, a system includes a processing device; and a memory device in which instructions executable by the processing device are stored for causing the processing device to perform various operations. The system can train a machine learning model to determine a final risk indicator for a target entity from a baseline data and an alternative data associated with the target entity. The system can generate the final risk indicator of the target entity using the trained machine learning model and the baseline data associated with the target entity and the alternative data associated with the target entity. The system can further transmit, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.
[0006] In yet another example, a non-transitory computer-readable storage medium has program code that is executable by a processor to cause a computing device to perform operations. The operations can include training a machine learning model to determine a final risk indicator for a target entity from a baseline data and an alternative data associated with the target entity. The operations can include generating the final risk indicator of the target entity using the trained machine learning model and the baseline data associated with the target entity and the alternative data associated with the target entity. The operations can further include transmitting, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.
[0007] This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification, any or all drawings, and each claim.
[0008] The foregoing, together with other features and examples, will become more apparent upon referring to the following specification, claims, and accompanying drawings.
Brief Description of the Drawings
[0009] FIG. 1 is a block diagram depicting an example of an operating environment according to certain aspects of the present disclosure.
[0010] FIG. 2 is a flow chart depicting example processes for generating a risk indicator according to certain aspects of the present disclosure.
[0011] FIG. 3 is a graph comparing exemplary Kolmogorov-Smirnov (KS) statistics for the example processes according to certain aspects of the present disclosure.
[0012] FIG. 4 is a graph comparing results of the example processes for thin file entities according to certain aspects of the present disclosure.
[0013] FIG. 5 is a graph illustrating score alignment of a risk indicator according to certain aspects of the present disclosure.
[0014] FIG. 6 is a histogram illustrating a change in risk indicators with and without the use of alternative data according to certain aspects of the present disclosure.
[0015] FIG. 7 is a flow diagram depicting an example of a process for generating a risk indicator according to certain aspects of the present disclosure.
[0016] FIG. 8 is a block diagram depicting an example of a computing device suitable for implementing aspects of the techniques and technologies presented herein.
Detailed Description
[0017] Certain aspects and features of the present disclosure are directed to controlling access to secure resources based on a risk assessment generated using a model trained on multi-data attributes. For example, a risk indicator (such as a credit score) can be used as a barometer for entity risk or entity trustworthiness. In some cases, a risk indicator cannot be generated for an entity because the entity can be invisible (e.g., the entity is not associated with any data indicative of risk) or the entity can have a thin file (e.g., the entity is associated with fewer than three accounts from which data can be drawn). In another example, an entity may not have sufficient time of activity on an account from which data indicative of risk can be drawn. However, invisible or thin file entities may be associated with other accounts that are not typically considered when generating a risk indicator. Systems and methods disclosed herein can leverage alternative data associated with these other accounts to generate a risk indicator for invisible or thin file entities.
[0018] As an example, in credit scoring, an invisible or thin file consumer may be associated with alternative data that is not typically captured during a credit report for credit scoring purposes. Systems and methods described herein can leverage alternative data to generate risk indicators for these entities. Thus, systems and methods described herein can facilitate risk decisions on entities for which typical credit scoring methods do not provide insights. This can give access to financial instruments to entities who would otherwise not have access to these particular instruments or services due to lack of data.
[0019] Systems and methods disclosed herein use multiple approaches to leverage alternative data to generate a risk indicator for a target entity. For example, a risk assessment system can use alternative data as attributes computed on each data asset separately or as multi-data attributes. In some examples, systems can calculate specific attributes using alternative entity data and then combine these alternative data with existing data to create attributes by: 1) creating new attributes alongside existing attributes; 2) generating new attributes from the combined data (e.g., embedding a predictive risk indicator with the new attributes); or 3) generating a new risk indicator using the attributes and then combining this indicator with an existing indicator to create a fused risk indicator.
[0020] Certain aspects described herein provide improvements to machine learning techniques for assessing risks, for example, in access control associated with entities. For example, systems and methods described herein leverage three models: a traditional risk assessment model; a fusion risk assessment model; and an embedded risk assessment model. These three approaches can provide improvements over traditional risk assessment systems. For example, these approaches can yield higher performing risk indicators that have greater accuracy than traditional risk indicators, and that facilitate risk predictions for entities for whom a risk indicator could not previously be generated.
[0021] The system can also transmit the risk indicator for the target entity to a remote computing system. In some examples, this may be the system from which the risk indicator was requested. The risk indicator can be used to control interactions of the target entity with an interactive computing environment. For example, the risk indicator can be included in a responsive message to a request for evaluating the target entity such that the responsive message can be used to allow, challenge, or deny some operation to the target entity. For example, if the risk indicator is below a predefined threshold, an interaction by the target with the interactive computing environment may be automatically denied or flagged for manual review. In this manner, certain aspects described herein, which can include generating one or more risk indicators associated with target entities and providing a responsive message using the risk indicator, can improve at least the technical fields of controlling interactions between computing environments, access control for a computing environment, or a combination thereof. Further, the risk assessment computing system leverages distinctive components of the risk indicator to create a robust and easily implemented framework.
[0022] These illustrative examples are given to introduce the reader to the general subject matter discussed here and are not intended to limit the scope of the disclosed concepts. The following sections describe various additional features and examples with reference to the drawings in which like numerals indicate like elements, and directional descriptions are used to describe the illustrative examples but, like the illustrative examples, should not be used to limit the present disclosure.
Operating Environment Example for Machine-Learning Operations
[0023] Referring now to the drawings, FIG. 1 is a block diagram depicting an example of an operating environment 100 in which a risk assessment computing system 130 builds and trains a risk assessment model 120 that can be trained to predict risk indicators based on training data, which includes multi-data attributes. FIG. 1 depicts examples of hardware components of a risk assessment computing system 130, according to some aspects. The risk assessment computing system 130 is a specialized computing system that may be used for processing large amounts of data using a large number of computer processing cycles. The risk assessment computing system 130 can include a model training server 110 for building and training a standard data model 119 (i.e., a baseline data model) used to predict a baseline data risk indicator associated with an entity accessing controlled resources and an alternative data model 121 used to predict alternative data risk indicators associated with an entity based on alternative data from alternative data source 123. The model training server 110 can also be used for building and training a risk assessment model 120 for combining the baseline data risk indicator and the alternative data indicator into a final risk indicator. The risk assessment computing system 130 can further include a risk assessment server 118 for performing a risk assessment for given predictor variables 124, or features, using the trained risk assessment model 120, the trained alternative data model 121, and the trained standard data model 119.
[0024] The model training server 110 can include one or more processing devices that execute program code, such as a model training application 112. The program code is stored on a non-transitory computer-readable medium. The model training application 112 can execute one or more processes or applications to develop, train, and optimize a standard data model 119 (i.e., a baseline data model 119) for predicting the baseline risk indicator based on the predictor variables 124 or other data stored in the risk data repository 122. Additionally, the model training application 112 can execute one or more processes or applications to develop, train, and optimize an alternative data model 121 for predicting alternative data indicators based on the data from the alternative data source 123. The model training application 112 can execute one or more processes or applications to develop, train, and optimize a risk assessment model 120 for predicting final risk indicators by combining the baseline data risk indicator output by the standard data model 119 and the alternative data indicator output by alternative data model 121.
[0025] In some aspects, the model training application 112 can build and train a risk assessment model 120 using risk assessment training data 126 in a training process. The risk assessment training data 126 can include multiple training vectors including training predictor variables and training risk indicator outputs corresponding to the training vectors. In some cases, the risk assessment training data 126 may include differing subsets of data sources. In some examples, the alternative data model 121 can be trained to predict an alternative data risk indicator using data from the alternative data source 123. The alternative data risk indicator can then be combined with the output of the standard data model 119 (e.g., the baseline data risk indicator) to generate a final risk indicator. The final risk indicator can improve the accuracy of access decisions from the risk assessment computing system 130, particularly for entities associated with low amounts of data in the data repository 122. The risk assessment training data 126 can be stored in one or more network-attached storage units on which various repositories, databases, or other structures are stored. An example of these data structures is the risk data repository 122.
[0026] Network-attached storage units can include the risk data repository 122. Network-attached storage units may store a variety of different types of data organized in a variety of different ways and from a variety of different sources. For example, the network-attached storage unit may include storage other than primary storage located within the model training server 110 that is directly accessible by processors located therein. In some aspects, the network-attached storage unit may include secondary, tertiary, or auxiliary storage, such as large hard drives, servers, and virtual memory, among other types. Storage devices may include portable or non-portable storage devices, optical storage devices, and various other mediums capable of storing and containing data. A machine-readable storage medium or computer-readable storage medium may include a non-transitory medium in which data can be stored and that does not include carrier waves or transitory electronic signals. Examples of a non-transitory medium may include, for example, a magnetic disk or tape, optical storage media such as a compact disk or digital versatile disk, flash memory, memory, or memory devices.
[0027] The risk assessment server 118 can include one or more processing devices that execute program code, such as a risk assessment application 114. The program code is stored on a non-transitory computer-readable medium. The risk assessment application 114 can execute one or more processes to use the risk assessment model 120, the standard data model 119, and/or the alternative data model 121 trained during execution of the model training application 112 to predict risk indicators based on input predictor variables 124. The risk indicators can be used to protect or allocate computing resources of the risk assessment computing system 130.
[0028] Furthermore, the risk assessment computing system 130 can communicate with various other computing systems, such as client computing systems 104. For example, client computing systems 104 may send risk assessment queries to the risk assessment server 118 for risk assessment or may send signals to the risk assessment server 118 that control or otherwise influence different aspects of the risk assessment computing system 130. The client computing systems 104 may also interact with user computing systems 106 via one or more public data networks 108 to facilitate interactions between users of the user computing systems 106 and interactive computing environments provided by the client computing systems 104.
[0029] Each client computing system 104 may include one or more third-party devices, such as individual servers or groups of servers operating in a distributed manner. A client computing system 104 can include any computing device or group of computing devices operated by a seller, lender, or other providers of products or services. The client computing system 104 can include one or more server devices. The one or more server devices can include or can otherwise access one or more non-transitory computer-readable media. The client computing system 104 can also execute instructions that provide an interactive computing environment accessible to user computing systems 106. Examples of the interactive computing environment include a mobile application specific to a particular client computing system 104, a web-based application accessible via a mobile device, etc. The executable instructions are stored in one or more non-transitory computer-readable media.
[0030] The client computing system 104 can further include one or more processing devices that are capable of providing the interactive computing environment to perform operations described herein. The interactive computing environment can include executable instructions stored in one or more non-transitory computer-readable media. The instructions providing the interactive computing environment can configure one or more processing devices to perform operations described herein. In some aspects, the executable instructions for the interactive computing environment can include instructions that provide one or more graphical interfaces. The graphical interfaces are used by a user computing system 106 to access various functions of the interactive computing environment. For instance, the interactive computing environment may transmit data to and receive data from a user computing system 106 to shift between different states of the interactive computing environment, where the different states allow one or more electronic transactions between the user computing system 106 and the client computing system 104 to be performed.
[0031] In some examples, a client computing system 104 may have other computing resources associated therewith (not shown in FIG. 1), such as server computers hosting and managing virtual machine instances for providing cloud computing services, server computers hosting and managing online storage resources for users, server computers for providing database services, and others. The interaction between the user computing system 106 and the client computing system 104 may be performed through graphical user interfaces presented by the client computing system 104 to the user computing system 106, or through application programming interface (API) calls or web service calls.
[0032] A user computing system 106 can include any computing device or other communication device operated by an entity, such as a user, an organization, or a company. The user computing system 106 can include one or more computing devices, such as laptops, smartphones, and other personal computing devices. A user computing system 106 can include executable instructions stored in one or more non-transitory computer-readable media. The user computing system 106 can also include one or more processing devices that are capable of executing program code to perform operations described herein. In various examples, the user computing system 106 can allow a user to access certain online services from a client computing system 104 or other computing resources, to engage in mobile commerce with a client computing system 104, to obtain controlled access to electronic content hosted by the client computing system 104, etc.
[0033] For instance, the user can use the user computing system 106 to engage in an electronic transaction with a client computing system 104 via an interactive computing environment. An electronic transaction between the user computing system 106 and the client computing system 104 can include, for example, the user computing system 106 being used to request online storage resources managed by the client computing system 104, acquire cloud computing resources (e.g., virtual machine instances), and so on. An electronic transaction between the user computing system 106 and the client computing system 104 can also include, for example, querying a set of sensitive or other controlled data, accessing online financial services provided via the interactive computing environment, submitting an online credit card application or other digital application to the client computing system 104 via the interactive computing environment, operating an electronic tool within an interactive computing environment hosted by the client computing system (e.g., a content-modification feature, an application-processing feature, etc.).
[0034] In some aspects, an interactive computing environment implemented through a client computing system 104 can be used to provide access to various online functions. As a simplified example, a website or other interactive computing environment provided by an online resource provider can include electronic functions for requesting computing resources, online storage resources, network resources, database resources, or other types of resources. In another example, a website or other interactive computing environment provided by a financial institution can include electronic functions for obtaining one or more financial services, such as loan application and management tools, credit card application and transaction management workflows, electronic fund transfers, etc. A user computing system 106 can be used to request access to the interactive computing environment provided by the client computing system 104, which can selectively grant or deny access to various electronic functions. Based on the request, the client computing system 104 can collect data associated with the user and communicate with the risk assessment server 118 for risk assessment. Based on the risk indicator predicted by the risk assessment server 118, the client computing system 104 can determine whether to grant the access request of the user computing system 106 to certain features of the interactive computing environment.
[0035] In a simplified example, the system depicted in FIG. 1 can train the standard data model 119 to determine baseline data risk indicators, such as credit scores, using predictor variables 124, or data stored in the risk data repository 122. In additional or alternative examples, the system depicted in FIG. 1 can train the alternative data model 121 based on alternative data from the alternative data source 123 to generate an alternative data risk indicator based on alternative data associated with the target entity. The risk assessment model 120 can be trained on data stored in the risk data repository 122 to combine the baseline data risk indicator and the alternative data risk indicator into a final risk indicator. A predictor variable 124 can be any variable predictive of risk that is associated with an entity. Any suitable predictor variable that is authorized for use by an appropriate legal or regulatory framework may be used. Alternative data can include, for example, data not typically stored in or maintained by the risk data repository 122. For example, alternative data can include rental payments, payday loan payments, rent-to-own payments, utility use behavior, and utility bill payments.
[0036] Examples of predictor variables 124 used for predicting the risk associated with an entity accessing online resources include, but are not limited to, variables indicating the demographic characteristics of the entity (e.g., name of the entity, the network or physical address of the company, the identification of the company, the revenue of the company), variables indicative of prior actions or transactions involving the entity (e.g., past requests for online resources submitted by the entity, the amount of online resources currently held by the entity, and so on), variables indicative of one or more behavioral traits of an entity (e.g., the timeliness of the entity releasing the online resources), etc. Similarly, examples of predictor variables 124 used for predicting the risk associated with an entity accessing services provided by a financial institution include, but are not limited to, variables indicative of one or more demographic characteristics of an entity (e.g., age, gender, income, etc.), variables indicative of prior actions or transactions involving the entity (e.g., information that can be obtained from credit files or records, financial records, consumer records, or other data about the activities or characteristics of the entity), variables indicative of one or more behavioral traits of an entity, etc.
[0037] The final risk indicator (e.g., the combination of the baseline data risk indicator and the alternative data risk indicator) can be used by the service provider (e.g., the service provider controlling the interactive computing environment) to determine the risk associated with the entity accessing a service provided by the service provider, thereby granting or denying access by the entity to an interactive computing environment implementing the service. For example, if the service provider determines that the final risk indicator is lower than a threshold risk indicator value, then the client computing system 104 associated with the service provider can generate or otherwise provide access permission to the user computing system 106 that requested the access. The access permission can include, for example, cryptographic keys used to generate valid access credentials or decryption keys used to decrypt access credentials. The client computing system 104 associated with the service provider can also allocate resources to the user and provide a dedicated web address for the allocated resources to the user computing system 106, for example, by including it in the access permission. With the obtained access credentials and/or the dedicated web address, the user computing system 106 can establish a secure network connection to the computing environment hosted by the client computing system 104 and access the resources by invoking API calls, web service calls, HTTP requests, or other suitable mechanisms.
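The threshold comparison described above can be sketched as a simple decision function. The function name, the returned payload, and the convention that lower scores indicate lower risk are illustrative assumptions, not part of the disclosed system.

```python
def access_decision(final_risk_indicator, risk_threshold):
    """Grant access when the predicted risk falls below the provider's
    threshold; deny otherwise. The returned dictionary is a simplified,
    hypothetical stand-in for issuing access credentials."""
    if final_risk_indicator < risk_threshold:
        return {"access": "granted", "credential": "issue-access-key"}
    return {"access": "denied"}

# A low-risk entity is granted access; a high-risk entity is denied.
low_risk = access_decision(0.32, risk_threshold=0.50)
high_risk = access_decision(0.78, risk_threshold=0.50)
```

In a real deployment, the "credential" entry would be replaced by the cryptographic keys or dedicated web address described above.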
[0038] Each communication within the operating environment 100 may occur over one or more data networks, such as a public data network 108, a network 116 such as a private data network, or some combination thereof. A data network may include one or more of a variety of different types of networks, including a wireless network, a wired network, or a combination of a wired and wireless network. Examples of suitable networks include the Internet, a personal area network, a local area network (“LAN”), a wide area network (“WAN”), or a wireless local area network (“WLAN”). A wireless network may include a wireless interface or a combination of wireless interfaces. A wired network may include a wired interface. The wired or wireless networks may be implemented using routers, access points, bridges, gateways, or the like, to connect devices in the data network.
[0039] The number of devices depicted in FIG. 1 is provided for illustrative purposes. Different numbers of devices may be used. For example, while certain devices or systems are shown as single devices in FIG. 1, multiple devices may instead be used to implement these devices or systems. Similarly, devices or systems that are shown as separate, such as the model training server 110 and the risk assessment server 118, may be instead implemented in a single device or system.
Example Approaches to Including Alternative Data in a Risk Indicator
[0040] FIG. 2 illustrates process diagrams for the three methods of including alternative data in a risk indicator. Each of the processes (e.g., fusion model 202, embedded model 204, and multi-data attributes model 206) can be implemented by one or more components of the risk assessment computing system 130 as described with reference to FIG. 1.
[0041] In the fusion model 202, data attributes 208 can be used to generate a standard data risk indicator 210 (i.e., a baseline data risk indicator) for a particular entity. That is, data attributes 208 can be stored in the risk data repository 122. These data attributes 208 can be used to train the standard data model 119 to generate the standard data risk indicator 210. The standard data risk indicator 210 can indicate a predicted amount of risk associated with a particular entity. The model training application 112 can also access alternative data attributes 212 from the alternative data source 123. These alternative data attributes 212 can be used to train the alternative data model 121 to generate the alternative risk indicator 214. Finally, the trained risk assessment model 120 can be used to combine the standard data risk indicator 210 with the alternative risk indicator 214 into the final risk indicator 216.
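A minimal sketch of the fusion flow follows. The two inner scoring functions and the fixed combiner weights are illustrative stand-ins for the trained standard data model 119, alternative data model 121, and risk assessment model 120; in practice, all of their parameters would be learned from labeled outcome data.

```python
def standard_data_risk_indicator(data_attributes):
    """Stand-in for the trained standard (baseline) data model.
    Illustrative only: averages normalized repository attributes."""
    return sum(data_attributes) / len(data_attributes)

def alternative_risk_indicator(alternative_attributes):
    """Stand-in for the trained alternative data model."""
    return sum(alternative_attributes) / len(alternative_attributes)

def fuse_risk_indicators(baseline_score, alt_score, w_baseline=0.7, w_alt=0.3):
    """Stand-in for the fusion combiner: each component model emits its
    own risk indicator, and a third model merges the two scores. A fixed
    weighted average is used here purely for illustration."""
    return w_baseline * baseline_score + w_alt * alt_score

baseline = standard_data_risk_indicator([0.62, 0.58, 0.66])
alternative = alternative_risk_indicator([0.40, 0.50])
final = fuse_risk_indicators(baseline, alternative)
```

Note that the component scores remain available on their own, which is what enables the waterfall behavior discussed later for entities missing one data source.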
[0042] In the embedded model 204, data attributes 208 can be used to train the standard data model 119 to output a standard data risk indicator 210. In this example, the alternative data attributes 212 can be directly fed into the risk assessment model 120 to be used in conjunction with the standard data risk indicator 210 to generate the final risk indicator 216. In this example, the alternative data attributes 212 are combined with the standard data risk indicator 210 in a single model (e.g., the risk assessment model 120). In such an example, the alternative data can be used to complement the predictive power of the standard data model 119.
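The embedded approach can be sketched in the same spirit: a single scoring function consumes the baseline risk indicator alongside the raw alternative data attributes, rather than a second pre-computed score. The linear form and the weights below are assumptions for illustration; a trained model would estimate them jointly.

```python
def embedded_final_risk(baseline_score, alternative_attributes, weights):
    """Single-model sketch of the embedded approach: the baseline risk
    indicator and the raw alternative data attributes enter one scoring
    function together, so the alternative attributes are estimated to
    complement what the baseline score already captures."""
    features = [baseline_score] + list(alternative_attributes)
    return sum(w * f for w, f in zip(weights, features))

# Baseline score 0.62 plus two raw alternative attributes.
final = embedded_final_risk(0.62, [0.40, 0.50], weights=[0.6, 0.2, 0.2])
```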
[0043] In the multi-data attributes model 206, multi-data attributes 218 can be computed by aligning data 220 from the risk data repository 122 with alternative data 222 from the alternative data source 123. The multi-data attributes 218 can then be used to train the risk assessment model 120 to determine the final risk indicator 216. In this example, the multi-data attributes model 206 can improve the efficiency of providing a final risk indicator and can be used with or without the inclusion of the alternative data.
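The alignment step might look like the following sketch, which joins the two sources on a shared entity identifier and emits combined attribute records. All field names here are hypothetical; the disclosure does not specify a schema.

```python
def build_multi_data_attributes(repository_rows, alternative_rows):
    """Align records from the risk data repository and the alternative
    data source on a shared entity identifier, then emit combined
    multi-data attribute records. Field names are illustrative."""
    alt_by_id = {row["entity_id"]: row for row in alternative_rows}
    multi = []
    for row in repository_rows:
        alt = alt_by_id.get(row["entity_id"], {})
        multi.append({
            "entity_id": row["entity_id"],
            "credit_utilization": row["utilization"],
            # None when the entity has no alternative data on file,
            # which supports using the model with or without that source.
            "rent_on_time_ratio": alt.get("rent_on_time_ratio"),
        })
    return multi

attributes = build_multi_data_attributes(
    [{"entity_id": 1, "utilization": 0.35}, {"entity_id": 2, "utilization": 0.80}],
    [{"entity_id": 1, "rent_on_time_ratio": 0.95}],
)
```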
Example Application of Alternative Data
[0044] The below example is discussed with respect to a risk indicator generated based on credit data. However, systems and methods disclosed herein can be used in a number of applications for incorporating alternative data to supplement existing data sets.
[0045] As discussed above, four models may be built to generate risk indicators using alternative data. These four models can include: a credit-only model, a fusion model, an embedded model, and a multi-data attribute model that leverages the purposed view. In an example, when using alternative consumer data, all three alternative approaches may provide lift over the baseline credit model (e.g., over the baseline risk assessment model).
[0046] FIG. 3 is a comparison of the results of a Kolmogorov-Smirnov (KS) statistics test for the above-listed approaches to incorporating alternative data into a risk indicator or risk score, such as a credit score. In FIG. 3, graph 300 shows the KS test statistic performance of each model, including a model that leverages alternative consumer data only. KS is a performance statistic used to measure the separation between score distributions of consumers who paid as agreed versus those who became delinquent. For thin-file consumers, all three approaches showed substantial improvements in all metrics. In this segment, the multi-data attributes model provided the most lift overall in all metrics. For example, for the segment overall, the multi-data attributes model provided a 6.6 percent overall lift over a credit-only benchmark KS value of 46.6.
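The KS separation described above can be computed over two score samples as the maximum gap between their empirical distribution functions. This is a simplified sketch of the standard two-sample statistic, not the production evaluation pipeline.

```python
def ks_statistic(scores_paid, scores_delinquent):
    """Maximum vertical distance between the empirical CDFs of scores
    for accounts that paid as agreed and accounts that became
    delinquent. Larger values indicate better score separation."""
    thresholds = sorted(set(scores_paid) | set(scores_delinquent))
    best = 0.0
    for t in thresholds:
        cdf_paid = sum(s <= t for s in scores_paid) / len(scores_paid)
        cdf_delinquent = sum(s <= t for s in scores_delinquent) / len(scores_delinquent)
        best = max(best, abs(cdf_paid - cdf_delinquent))
    return best

# Perfectly separated score samples yield the maximum KS value of 1.0.
separated = ks_statistic([700, 720, 740], [500, 520])
identical = ks_statistic([600, 650, 700], [600, 650, 700])
```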
[0047] Graph 400, shown in FIG. 4, summarizes the KS and Gini shifts as well as the shifts in delinquent capture rates. As shown in FIG. 4, all three approaches provided a lift in delinquent capture rates through the bottom three deciles, with the multi-data attributes model providing the most lift in the bottom 30 percent of scores (P30). The only exception was in the utility segment, where the embedded model approach provided the most lift (with the multi-data attributes model providing the least lift). In the bottom 30 percent, the multi-data attributes model improved overall delinquent capture rates by 4.6 percent over the baseline credit model (representing 293 basis points of improvement).
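A delinquent capture rate of the kind summarized in FIG. 4 might be computed as follows: the share of all delinquent accounts whose score falls in the bottom fraction of the score distribution. This is a hypothetical sketch of the metric, with an assumed data layout.

```python
def delinquent_capture_rate(scores, is_delinquent, bottom_fraction=0.30):
    """Fraction of all delinquent accounts captured in the bottom
    `bottom_fraction` of scores (e.g., 0.30 corresponds to P30)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    cutoff = max(1, int(len(scores) * bottom_fraction))
    bottom = set(order[:cutoff])
    total_delinquent = sum(is_delinquent)
    captured = sum(1 for i in bottom if is_delinquent[i])
    return captured / total_delinquent

# Ten accounts; four delinquent (flag 1). Three of the four delinquent
# accounts sit in the bottom 30 percent of scores.
scores = [300, 400, 500, 600, 700, 800, 900, 920, 940, 960]
flags = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
p30_capture = delinquent_capture_rate(scores, flags, bottom_fraction=0.30)
```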
[0048] FIG. 5 illustrates that each of the examined models is aligned, and thus policy changes do not need to be made to account for the inclusion of the alternative data in credit scores. Graph 500 shows interval delinquency rates and demonstrates that the models are aligned. If similar scores yield significant differences in the delinquency rate, then risk indicator users would have to revise policies to modify thresholds for both acceptance and pricing bands. In the case of different thresholds for each model, the user experience would be negatively impacted. Thus, FIG. 5 shows that all multi-data models are aligned and, consequently, the user policies do not need to be altered for each model. Accordingly, disclosed systems and methods facilitate seamless inclusion of alternative data in risk assessment models.
[0049] Additional advantages of including alternative data in risk indicators can be seen in how the inclusion of such alternative data changes consumers' credit scores. In some examples, risk indicator users may want the flexibility to turn the use of the alternative data source on and off. Two specific examples are when a risk indicator user desires to know the impact that the alternative data has on the consumer's score, or when the risk indicator user may not want to pay for, acquire, or use the alternative data for certain consumer decisions.
[0050] To address the impact that the alternative data has on consumers, FIG. 6 shows the change in scores with the alternative data and without the alternative data in graph 600. Graph 600 shows the change in scores for the multi-data model with and without alternative consumer data. The median change on a 1000-point scale is 6.1, indicating that using alternative consumer data accounts results in an increased score relative to using only credit accounts in the multi-data attributes model.
[0051] As described above, the three modeling techniques (fusion, embedded, and multi-data attributes) with alternative data can provide lift over baseline credit models. For example, in a test case for new auto originations and predicting 60 days delinquent or worse over the subsequent 24 months, the multi-data attribute approach provided lift in terms of KS, Gini, and delinquent capture rates. However, the embedded model approach showed the highest lift in the thin-file consumer segment in terms of KS, Gini, and delinquent capture rates.
[0052] Accordingly, each approach has advantages over a standard credit score. The fusion modeling technique may facilitate three product options (e.g., a credit-only model, an alternative consumer data only model, and the fused model), the embedded model may facilitate two product options (a credit-only model and the embedded model), and the multi-data model may also facilitate a plurality of product options (e.g., the multi-data attribute model using all available data sources, the multi-data attribute model using credit-only data, and the multi-data attribute model using any other combination of two or more data sources). Unlike the fusion model, the embedded model allows each alternative data attribute to be estimated simultaneously to complement predictive power already captured in the credit model. Consequently, in some examples, the embedded model may result in the alternative data being more impactful than in the fusion model, depending on the application.
[0053] An additional advantage of the fusion model is that there are individual scores to waterfall back to if a particular score is unscorable or unavailable for a consumer. In this case, the models may need to be aligned to properly waterfall back to the individual scores. An advantage of the embedded model is that the new data source (e.g., the source providing the alternative data) can complement the predictive power of the original risk assessment model.
Example of Generating a Risk Indicator Using a Risk Assessment Model and Alternative Data
[0054] FIG. 7 is a flow chart depicting an example of a process 700 for using a risk assessment model 120 to generate a final risk indicator for a target entity. Once trained, the risk assessment model 120 can generate the final risk indicator for a target entity based on predictor variables 124 associated with the target entity. One or more computing devices (e.g., the risk assessment server 118) implement operations depicted in FIG. 7 by executing suitable program code (e.g., the risk assessment application 114). For illustrative purposes, the process 700 is described with reference to certain examples depicted in the figures. Other implementations, however, are possible. While the blocks of the process 700 are described in the temporal order below for illustrative purposes, it may be appreciated that the blocks can occur in any order and some blocks may occur simultaneously.
[0055] At block 702, the process 700 involves accessing a risk assessment query associated with a target entity. In some cases, the risk assessment computing system 130 (e.g., the risk assessment server 118) may receive a risk assessment query associated with a particular entity (e.g., the target entity). The risk assessment query may be received from the target entity requesting the risk assessment. In additional or alternative implementations, the risk assessment query may be received from a remote computing device associated with an entity authorized to request risk assessment of the target entity. In some examples, the risk assessment request can include alternative data or can request the inclusion of alternative data from the alternative data source 123 in the final risk indicator.
[0056] At block 704, the process 700 involves training a machine learning model to determine a final risk indicator from baseline data and alternative data. In an example, the risk assessment computing system 130 can train the machine learning model 120 to determine the final risk indicator based on a baseline data risk indicator determined by the standard data model 119 and an alternative data risk indicator determined by the alternative data model 121. In additional examples, the machine learning model may determine the final risk indicator based on only the baseline data or only the alternative data, dependent upon an indication that some of the data (e.g., either the baseline or alternative data) should be suppressed.
[0057] As discussed above, varying methods can be used to generate the final risk indicator. For example, the process 700 can implement the fusion model 202 in which the standard data model 119 and the alternative data model 121 generate the baseline data risk indicator and the alternative risk indicator, respectively. Then, the machine learning model 120 can combine these risk indicators into the final risk indicator. In another example, the process 700 can include implementing the embedded model 204. In the embedded model 204, the standard data model 119 can generate the baseline data risk indicator. In this example, the machine learning model 120 can receive the baseline data risk indicator and alternative data attributes as input and can output the final risk indicator. In another example, the process 700 can include aligning standard data or predictor variables 124, stored by the risk data repository 122, with the alternative data stored by the alternative data source 123. Once the data is aligned, the risk assessment computing system 130 can generate multi-data attributes. These multi-data attributes can be used to train the machine learning model 120 to output the final risk indicator.
[0058] In an example, a credit-only baseline data risk indicator can be delivered by suppressing or otherwise turning off the alternative data. Additionally, an alternative data only risk indicator may be delivered by suppressing or otherwise turning off the baseline data. Moreover, a multi-data risk indicator for the final risk indicator can be delivered by keeping both the baseline data and the alternative data active or otherwise turned on. Thus, a single risk indicator can serve the purpose of a baseline risk indicator, an alternative data risk indicator, or a final risk indicator.
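The suppression behavior described above might be sketched as a single scoring path with per-source switches, so one code path serves the credit-only, alternative-only, and combined products. The inner averaging function is an illustrative stand-in for the trained component models.

```python
def toggled_risk_indicator(baseline_attrs, alt_attrs,
                           use_baseline=True, use_alt=True):
    """One scoring path serving three products by suppressing data
    sources: credit-only, alternative-only, or the combined indicator.
    The inner score() is a hypothetical stand-in for trained models."""
    def score(attrs):
        return sum(attrs) / len(attrs)

    if not (use_baseline or use_alt):
        raise ValueError("at least one data source must remain active")
    parts = []
    if use_baseline:
        parts.append(score(baseline_attrs))
    if use_alt:
        parts.append(score(alt_attrs))
    return sum(parts) / len(parts)

credit_only = toggled_risk_indicator([0.6, 0.8], [0.2], use_alt=False)
combined = toggled_risk_indicator([0.6, 0.8], [0.2])
```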
[0059] In some examples, the process 700 can include training the standard data model 119 and the alternative data model 121. For example, the standard data model 119 can be trained on the predictor variables 124 stored in the risk data repository 122. The alternative data model 121 can be trained on alternative data stored in the alternative data source 123. In other examples, the risk assessment computing system 130 can access the alternative data source 123 and use the alternative data to generate alternative data attributes for use as input to the machine learning model 120.
[0060] At block 706, the process 700 involves generating the final risk indicator for the target entity. For example, using the fusion model 202, the machine learning model 120 can receive as input a baseline data risk indicator associated with the target entity and an alternative data risk indicator associated with the target entity. The machine learning model 120 can generate the final risk indicator for the target entity based on these respective risk indicators. In another example, e.g., using the embedded model 204, the machine learning model 120 can receive a baseline data risk indicator associated with the target entity (e.g., as determined by the trained standard data model 119) and alternative data attributes associated with the target entity from which to generate the final risk indicator. In an additional example, the process 700 can involve accessing data and alternative data associated with the target entity, aligning the data, and generating multi-data attributes associated with the target entity. These multi-data attributes can be used as input to the machine learning model 120 to generate the final risk indicator for the target entity.
[0061] At block 708, the process 700 includes transmitting, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments. For example, the risk assessment server 118 can return a responsive message to the client computing system 104. The responsive message can include at least the final risk indicator and explanatory data associated with the risk indicator. The explanatory data can indicate relationships between changes in the risk indicator and changes in at least some of the predictor variables 124 associated with the target entity. In some examples, the responsive message can include the baseline data risk indicator and the alternative data risk indicator.
[0062] The final risk indicator (as well as the baseline data risk indicator and the alternative data risk indicator) can correspond to a level of risk associated with the target entity, for example with respect to accessing protected computing resources. The final risk indicator can be used to perform one or more operations with respect to the target entity based on a predicted risk associated with the target entity. In one example, the final risk indicator can be utilized to control access to one or more interactive computing environments by the target entity. The risk assessment computing system 130 can communicate with client computing systems 104, which may send risk assessment queries to the risk assessment server 118 to request risk assessment. The client computing systems 104 may be associated with technological providers, such as cloud computing providers, online storage providers, or financial institutions such as banks, credit unions, credit-card companies, insurance companies, or other types of organizations. The client computing systems 104 may be implemented to provide interactive computing environments for customers to access various services offered by these service providers. Customers can use user computing systems 106 to access the interactive computing environments, thereby accessing the services provided by these providers.
[0063] Based on the received final risk indicator, the client computing system 104 can determine whether to grant the customer access to the interactive computing environment. If the client computing system 104 determines that the level of risk associated with the customer accessing the interactive computing environment and the associated technical or financial service is too high, the client computing system 104 can deny access by the customer to the interactive computing environment. Conversely, if the client computing system 104 determines that the level of risk associated with the customer is acceptable, e.g., is below a predetermined threshold, the client computing system 104 can grant access to the interactive computing environment by the customer, and the customer would be able to utilize the various services provided by the service providers. For example, with the granted access, the customer can utilize the user computing system 106 to access cloud computing resources, online storage resources, web pages or other user interfaces provided by the client computing system 104 to execute applications, store data, query data, submit an online digital application, operate electronic tools, or perform various other operations within the interactive computing environment hosted by the client computing system 104.
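The threshold-based access decision described above reduces to a simple comparison. This is a minimal sketch; the threshold value is chosen by the client computing system and the function name is illustrative:

```python
def grant_access(final_risk_indicator: float, risk_threshold: float) -> bool:
    """Client-side access decision: grant access to the interactive
    computing environment only when the predicted risk falls below
    the client's predetermined threshold; otherwise deny."""
    return final_risk_indicator < risk_threshold
```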
Example of Computing System for Machine-Learning Operations
[0064] Any suitable computing system or group of computing systems can be used to perform the machine-learning operations described herein. For example, FIG. 8 is a block diagram depicting an example of a computing device 800, which can be used to implement the risk assessment server 118 or the model training server 110. The computing device 800 can include various devices for communicating with other devices in the operating environment 100, as described with respect to FIG. 1. The computing device 800 can include various devices for performing one or more transformation operations described above with respect to FIGS. 1-7.
[0065] The computing device 800 can include a processor 802 that is communicatively coupled to a memory 804. The processor 802 executes computer-executable program code stored in the memory 804, accesses information stored in the memory 804, or both. Program code may include machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, among others.
[0066] Examples of a processor 802 include a microprocessor, an application-specific integrated circuit, a field-programmable gate array, or any other suitable processing device. The processor 802 can include any number of processing devices, including one. The processor 802 can include or communicate with a memory 804. The memory 804 stores program code that, when executed by the processor 802, causes the processor to perform the operations described in this disclosure.
[0067] The memory 804 can include any suitable non-transitory computer-readable storage medium. The computer-readable medium can include any electronic, optical, magnetic, or other storage device capable of providing a processor with computer-readable program code or other program code. Non-limiting examples of a computer-readable medium include a magnetic disk, memory chip, optical storage, flash memory, storage class memory, ROM, RAM, an ASIC, magnetic storage, or any other medium from which a computer processor can read and execute program code. The program code may include processor-specific program code generated by a compiler or an interpreter from code written in any suitable computer-programming language. Examples of suitable programming languages include Hadoop, C, C++, C#, Visual Basic, Java, Python, Perl, JavaScript, ActionScript, etc.
[0068] The computing device 800 may also include a number of external or internal devices such as input or output devices. For example, the computing device 800 is shown with an input/output interface 808 that can receive input from input devices or provide output to output devices. A bus 806 can also be included in the computing device 800. The bus 806 can communicatively couple one or more components of the computing device 800.
[0069] The computing device 800 can execute program code 814 that includes the risk assessment application 114 and/or the model training application 112. The program code 814 for the risk assessment application 114 and/or the model training application 112 may be resident in any suitable computer-readable medium and may be executed on any suitable processing device. For example, as depicted in FIG. 8, the program code 814 for the risk assessment application 114 and/or the model training application 112 can reside in the memory 804 at the computing device 800 along with the program data 816 associated with the program code 814, such as the predictor variables 124 and/or the model training samples. Executing the risk assessment application 114 or the model training application 112 can configure the processor 802 to perform the operations described herein.
[0070] In some aspects, the computing device 800 can include one or more output devices. One example of an output device is the network interface device 810 depicted in FIG. 8. A network interface device 810 can include any device or group of devices suitable for establishing a wired or wireless data connection to one or more data networks described herein. Non-limiting examples of the network interface device 810 include an Ethernet network adapter, a modem, etc.
[0071] Another example of an output device is the presentation device 812 depicted in FIG. 8. A presentation device 812 can include any device or group of devices suitable for providing visual, auditory, or other suitable sensory output. Non-limiting examples of the presentation device 812 include a touchscreen, a monitor, a speaker, a separate mobile computing device, etc. In some aspects, the presentation device 812 can include a remote client-computing device that communicates with the computing device 800 using one or more data networks described herein. In other aspects, the presentation device 812 can be omitted.
[0072] The foregoing description of some examples has been presented only for the purpose of illustration and description and is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Numerous modifications and adaptations thereof will be apparent to those skilled in the art without departing from the spirit and scope of the disclosure.

Claims
1. A method that includes one or more processing devices performing operations comprising: accessing a risk assessment query including an indication of a target entity; accessing a trained machine learning model that is trained to determine a final risk indicator of the target entity from a baseline data associated with the target entity and an alternative data associated with the target entity; generating the final risk indicator of the target entity using the trained machine learning model, the baseline data associated with the target entity, and the alternative data associated with the target entity; and transmitting, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.
2. The method of claim 1, the method further comprising: accessing a baseline data machine learning model trained to determine a baseline data risk indicator of the target entity from predictor variables associated with the baseline data of the target entity; and generating the baseline data risk indicator using the baseline data machine learning model and the predictor variables associated with the target entity, wherein the final risk indicator of the target entity depends on the baseline data risk indicator.
3. The method of claim 2, the method further comprising: accessing an alternative data machine learning model trained to determine an alternative data risk indicator of the target entity from the alternative data associated with the target entity; and generating the alternative data risk indicator using the alternative data machine learning model and the alternative data associated with the target entity, wherein the final risk indicator of the target entity further depends on the alternative data risk indicator.
4. The method of claim 3, wherein the alternative data machine learning model is trained to determine the alternative data risk indicator by: generating a set of alternative data attributes based on the alternative data; and training the alternative data machine learning model on the set of alternative data attributes.
5. The method of claim 2, the method further comprising: generating a set of alternative data attributes based on the alternative data; and determining, using the trained machine learning model, the final risk indicator based on the baseline data risk indicator and the set of alternative data attributes.
6. The method of claim 1, wherein the trained machine learning model is trained to generate the final risk indicator by: generating a set of multi-data attributes by aligning a set of training baseline data and a set of training alternative data; and training an untrained machine learning model using the set of multi-data attributes.
7. The method of claim 1, wherein generating the final risk indicator of the target entity using the trained machine learning model further comprises: generating a baseline risk indicator by suppressing availability of the alternative data to the trained machine learning model; generating an alternative risk indicator by suppressing availability of the baseline data to the trained machine learning model; and generating a multi-data risk indicator by enabling availability of the baseline data and the alternative data to the trained machine learning model.
8. A system comprising: a processing device; and a memory device in which instructions executable by the processing device are stored for causing the processing device to perform operations comprising: accessing a trained machine learning model to determine a final risk indicator of a target entity from a baseline data associated with the target entity and an alternative data associated with the target entity; generating the final risk indicator of the target entity using the trained machine learning model, the baseline data associated with the target entity, and the alternative data associated with the target entity; and transmitting, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.
9. The system of claim 8, wherein the operations further comprise: accessing a baseline data machine learning model trained to determine a baseline data risk indicator of the target entity from predictor variables associated with the baseline data of the target entity; and generating the baseline data risk indicator using the baseline data machine learning model and the predictor variables associated with the target entity, wherein the final risk indicator of the target entity depends on the baseline data risk indicator.
10. The system of claim 9, wherein the operations further comprise: accessing an alternative data machine learning model trained to determine an alternative data risk indicator of the target entity from the alternative data associated with the target entity; and generating the alternative data risk indicator using the alternative data machine learning model and the alternative data associated with the target entity, wherein the final risk indicator of the target entity further depends on the alternative data risk indicator.
11. The system of claim 10, wherein the alternative data machine learning model is trained to determine the alternative data risk indicator by: generating a set of alternative data attributes based on the alternative data; and training the alternative data machine learning model on the set of alternative data attributes.
12. The system of claim 9, wherein the operations further comprise: generating a set of alternative data attributes based on the alternative data; and determining, using the trained machine learning model, the final risk indicator based on the baseline data risk indicator and the set of alternative data attributes.
13. The system of claim 8, wherein the trained machine learning model is trained to generate the final risk indicator by: generating a set of multi-data attributes by aligning a set of training baseline data and a set of alternative training data; and training an untrained machine learning model using the set of multi-data attributes.
14. The system of claim 8, wherein generating the final risk indicator of the target entity using the trained machine learning model further comprises: generating a baseline risk indicator by suppressing availability of the alternative data to the trained machine learning model; generating an alternative risk indicator by suppressing availability of the baseline data to the trained machine learning model; and generating a multi-data risk indicator by enabling availability of the baseline data and the alternative data to the trained machine learning model.
15. A non-transitory computer-readable storage medium having program code that is executable by a processor to cause a computing device to perform operations, the operations comprising: accessing a trained machine learning model to determine a final risk indicator of a target entity from a baseline data associated with the target entity and an alternative data associated with the target entity; generating the final risk indicator of the target entity using the trained machine learning model, the baseline data associated with the target entity, and the alternative data associated with the target entity; and transmitting, to a remote computing device, a responsive message comprising at least the final risk indicator for use in controlling access of the target entity to one or more computing environments.
16. The non-transitory computer-readable storage medium of claim 15, wherein the operations further comprise: accessing a baseline data machine learning model trained to determine a baseline data risk indicator of the target entity from predictor variables associated with the baseline data of the target entity; and generating the baseline data risk indicator using the baseline data machine learning model and the predictor variables associated with the target entity, wherein the final risk indicator of the target entity depends on the baseline data risk indicator.
17. The non-transitory computer-readable storage medium of claim 16, wherein the operations further comprise: accessing an alternative data machine learning model trained to determine an alternative data risk indicator of the target entity from the alternative data associated with the target entity; and generating the alternative data risk indicator using the alternative data machine learning model and the alternative data associated with the target entity, wherein the final risk indicator of the target entity further depends on the alternative data risk indicator.
18. The non-transitory computer-readable storage medium of claim 17, wherein the alternative data machine learning model is trained to determine the alternative data risk indicator by: generating a set of alternative data attributes based on the alternative data; and training the alternative data machine learning model on the set of alternative data attributes.
19. The non-transitory computer-readable storage medium of claim 16, wherein the operations further comprise: generating a set of alternative data attributes based on the alternative data; and determining, using the trained machine learning model, the final risk indicator based on the baseline data risk indicator and the set of alternative data attributes.
20. The non-transitory computer-readable storage medium of claim 15, wherein the trained machine learning model is trained to generate the final risk indicator by: generating a set of multi-data attributes by aligning a set of training baseline data and a set of alternative training data; and training an untrained machine learning model using the set of multi-data attributes.
PCT/US2024/041759 2023-08-10 2024-08-09 Controlling access using a risk indicator generated with alternative data WO2025035116A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363518793P 2023-08-10 2023-08-10
US63/518,793 2023-08-10

Publications (1)

Publication Number Publication Date
WO2025035116A1 true WO2025035116A1 (en) 2025-02-13

Family

ID=92672251

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/041759 WO2025035116A1 (en) 2023-08-10 2024-08-09 Controlling access using a risk indicator generated with alternative data

Country Status (1)

Country Link
WO (1) WO2025035116A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230113118A1 (en) * 2021-10-07 2023-04-13 Equifax Inc. Data compression techniques for machine learning models
US20230196147A1 (en) * 2021-12-22 2023-06-22 Equifax Inc. Unified explainable machine learning for segmented risk



Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 24765797; country of ref document: EP; kind code of ref document: A1)