US20130339218A1 - Computer-Implemented Data Storage Systems and Methods for Use with Predictive Model Systems - Google Patents
- Publication number
- US20130339218A1 (application Ser. No. 13/905,524)
- Authority
- US
- United States
- Prior art keywords
- entity
- raw data
- fraud
- analysis
- selection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- All of the following fall under G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR:
- G06Q40/025—
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q40/03—Credit; Loans; Processing thereof
- G06Q20/4016—Transaction verification involving fraud or risk level assessment in transaction processing
- G06Q30/0185—Product, service or business identity fraud
- G06Q30/0202—Market predictions or forecasting for commercial activities
- G06Q30/06—Buying, selling or leasing transactions
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
- G06Q40/06—Asset management; Financial planning or analysis
- G06Q40/12—Accounting
Definitions
- This document relates generally to computer predictive models and more particularly to constructing and using computer predictive models.
- A system and method can be configured to contain a raw data repository for storing raw data related to financial transactions.
- A data store contains rules indicating how many generations of data items, or the time period within which data items, are to be stored in the raw data repository. Data items stored in the raw data repository are then accessed by a predictive model in order to perform fraud detection.
- FIG. 1 is a block diagram depicting a computer-implemented system for generating and using predictive models to assess whether fraudulent activity may have occurred.
- FIG. 2 is a block diagram depicting examples of input data.
- FIG. 3 is a graph showing an account compromise period.
- FIG. 4 is a block diagram depicting use of non-monetary information.
- FIG. 5 is a block diagram depicting a system being configured to produce a score even in the absence of a current or new transaction on the account.
- FIGS. 6 and 7 are time line graphs showing a transaction time line and a scoring trigger time line.
- FIG. 8 is a block diagram depicting examples of client-defined events.
- FIG. 9 is a block diagram depicting a system for storing information for use in fraud detection.
- FIG. 10 is a block diagram depicting storage of fields within a data storage system.
- FIG. 11 is a block diagram depicting a determination of the number of generations to store for a field.
- FIG. 12 is a block diagram depicting an approach to determine storage rules for a system.
- FIG. 13 is a block diagram depicting another approach to determine storage rules for a system.
- FIG. 14 is a block diagram depicting storage of information in its raw/unprocessed form.
- FIGS. 15-18 are block diagrams depicting systems configured with missing value imputation processing capability.
- FIG. 19 is a flowchart depicting a training approach that addresses fraud in an account-level fashion.
- FIG. 20 illustrates a data partitioning example.
- FIG. 21 is a block diagram depicting an iterative training approach.
- FIG. 22 illustrates example scoring results.
- FIGS. 23-25 provide another example for training a model.
- FIG. 26 is a block diagram depicting a reason code determination process that can be used to create reason codes for a scoring system/predictive model.
- FIG. 27 is a block diagram depicting construction of reason codes.
- FIG. 28 is a flowchart depicting construction of reason codes.
- FIG. 29 is a flowchart depicting the importance of a reason factor to a score.
- FIGS. 30-32 are block diagrams depicting a view selector module that allows a user or computer program to select an entity or type of entity for analysis.
- FIGS. 33 and 34 are block diagrams depicting an integrated system for fraud analysis.
- FIG. 1 depicts at 30 a computer-implemented system for generating and using predictive models 34 to assess whether fraudulent activity may have occurred.
- Accurate detection of fraud is important because it allows action to be taken earlier and more effectively to address the fraudulent activity.
- An action could include, for example, a credit card company investigating whether fraud may have occurred with respect to a particular credit card holder.
- The system 30 can be configured to process one entity or many entities.
- Input data 32 is used during a development phase to create/train one or more predictive models 34.
- The predictive models 34 are then used during a production phase to receive input 32, such as from a financial institution, and to generate fraud analysis results 36.
- The input data 32 can be of many different types; examples of such input data 32 are shown in FIG. 2.
- A fraud detection model 100 can receive fraud information 102 and other input information 104, such as posting/transaction information, authorization information, cycle cut information, etc.
- An example of fraud data could be the date of the first fraud as reported by a customer.
- A customer may call a financial institution to indicate that one or more transactions that appeared on their credit card statement represent a fraudulent use.
- An example of fraud is when a person steals a credit card number and uses it to purchase items.
- The input fraud data can include several dates, such as the date on which fraud was first suspected to have occurred and a block date, which is the date on which no further transactions should be authorized.
- A predictive model 100 can be trained to detect fraud (e.g., whether an entity has been compromised, as shown at 110) within this account compromise period as early as possible.
- The fraud data can be one record per account or multiple records per account. For example, and as illustrated at 150 in FIG. 3, the data could have one record for each compromised account that identifies the beginning and the end of the compromised period.
- The compromised period may include both fraudulent and non-fraudulent transactions. This mixture is acceptable because the predictive model is trained to detect not whether a particular transaction is fraudulent, but whether an account should be deemed compromised.
- Account-level fraud detection is preferred over a transaction-based approach because most financial institutions are more interested in whether an account has been compromised, in order to stop the "bleeding" (e.g., reduce the amount of fraud), than in whether a particular transaction is fraudulent.
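The account-level training tag described above can be sketched as follows. This is a minimal illustration assuming a hypothetical record layout (`account_id`, `period_start`, `period_end`); the patent does not prescribe field names.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CompromiseRecord:
    # Hypothetical layout: one record per compromised account,
    # marking the beginning and end of the compromise period.
    account_id: str
    period_start: datetime
    period_end: datetime

def account_label(account_id, compromise_records):
    """Return 1 if the account has a compromise record, else 0.

    The model is trained on this account-level tag rather than on
    per-transaction fraud flags, so a compromise period may contain
    a mix of fraudulent and legitimate transactions.
    """
    return int(any(r.account_id == account_id for r in compromise_records))

records = [CompromiseRecord("A-100", datetime(2013, 1, 5), datetime(2013, 1, 9))]
print(account_label("A-100", records))  # 1
print(account_label("A-200", records))  # 0
```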
- The system can also utilize payment information 106 and/or non-monetary information 108 to detect whether the account is in a fraud state.
- An example of payment information is the credit card payment information received on a monthly or other periodic basis.
- Examples of non-monetary data are an address change, phone change, mother's name change, or credit line change.
- Another data feed could be postings, the process of recording debits and credits to individual cardholder account balances.
- Non-monetary information 108 is provided regarding an entity 202 that has a relationship with a financial institution such as a bank 204.
- The entity 202 itself can be at different levels. These levels could be, but are not limited to, the card level 210, customer level 212, or account level 214.
- The fraud detection process determines whether the entity 202 has been compromised (e.g., whether fraud has been detected).
- FIG. 5 depicts at 250 that a system can be configured to produce a score 276 even in the absence of a current or new transaction on the account (e.g., independent of whether a transaction-type event has occurred), which is an aid in the efficient use of resources to manage fraud cases.
- This is in contrast to most of today's fraud detection systems, which only produce a score when a transaction (typically an authorization) comes through the system. That is not particularly useful for managing case queues efficiently, because the fact that no additional transaction occurred during a certain time period can itself be information that is very useful in actively managing fraud.
- The system can also be configured such that it can generate, via process 254, a fraud score 252 at any time.
- This account-level score indicates whether an account is in a compromised state or not.
- FIG. 6 illustrates at 300 that with scoring on demand, a different score (e.g., at “S2”) might be produced even though only the passage of time had occurred with no new transactional information being received (e.g., “S2” was generated despite a new transaction “T2” not occurring until later).
- The trigger 260 is asynchronous with respect to an incremental transaction 262 (e.g., an authorization transaction 290, non-monetary transaction 292, payment transaction 294, etc.).
- A trigger 260 provides an indicator that records should be retrieved in process 270 from a repository 272.
- The records are provided to scoring process 274 for determining a score 276 as to whether an entity (e.g., an account) has been compromised.
- The records can be "raw" data (e.g., the actual transaction data received over time) from which features can be derived on the fly for use by the predictive model. However, it should be understood that the retrieved records could also include derived data.
- The repository 272 is updated via process 280 with every transaction, but a score-on-demand trigger 260 for deriving features is independent of receipt of an incremental transaction 262 and instead depends upon receipt of non-incremental information, such as date and time information.
- Account information is provided in order to specify which account should be scored. Date and time information is provided because just the passage of time may result in a different score. The date and time information can be provided as part of the input by the requestor or can be obtained by other means, such as from a computer system clock. It should be understood that similar to the other processing flows described herein, the steps and the order of the processing steps described herein may be altered, modified and/or augmented and still achieve the desired outcome.
- An example of a time trigger would be to score one or more accounts at a periodic interval (e.g., every 48 hours) irrespective of whether an incremental transaction has occurred.
- A random trigger may also be used to randomly detect whether fraud has occurred with respect to one or more accounts.
- The score for an account may be 900, but after only a passage of time and with no new transactions, the system might generate a different score, such as 723. Such a situation might arise if a legitimate but highly suspicious transaction had occurred; because no further transaction occurred over a period of time, the likelihood of the account being in a compromised state is lowered.
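The drop from 900 to 723 above can be sketched with a toy score-on-demand function. The decay formula and rate are purely illustrative assumptions; the patent does not specify how quiet time should be folded into the score.

```python
from datetime import datetime, timedelta

def score_on_demand(last_score, last_txn_time, as_of, daily_decay=0.97):
    """Illustrative only: recompute an account's compromise score at
    an arbitrary requested time. Quiet time after a suspicious but
    possibly legitimate transaction lowers the likelihood that the
    account is compromised, so the score drifts down even though no
    new transaction has arrived."""
    quiet_days = max((as_of - last_txn_time).days, 0)
    return round(last_score * daily_decay ** quiet_days)

t0 = datetime(2013, 6, 1)
print(score_on_demand(900, t0, t0))                      # 900
print(score_on_demand(900, t0, t0 + timedelta(days=7)))  # 727 (lower)
```

The same function serves both a transaction-triggered request and a purely time-triggered one, since it takes the scoring time as an explicit input rather than assuming a new transaction.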
- Previous systems would have problems generating scores asynchronously with respect to a transaction occurrence because they generally store aggregated/derived data rather than raw data, and thus lose relevant account history needed to perform asynchronous scoring. In previous systems, the aggregated/derived data is specifically suited to the scoring application and thus lacks the flexibility to readily perform other types of scoring beyond its immediate purpose.
- As an example, the analysis process may have detected that three different credit cards were used at the same restaurant at about the same time, and one of the credit cards has been determined to be in a compromised state.
- The scoring process can then be triggered for the other two credit card accounts, and the scoring process will factor in the other credit card accounts' information when generating a score for each card. Accordingly, whenever fraud is detected with respect to a first card, the scoring process can be triggered for any other card issued by the financial institution that was used at the same place or places as the first card.
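The cross-entity trigger described above can be sketched as follows. The flat `(account, merchant)` transaction-log layout is an assumption for illustration.

```python
def cotransacting_accounts(transactions, flagged_account):
    """Given (account_id, merchant_id) pairs, return the other accounts
    that transacted at any merchant the flagged (compromised) account
    also used, so those accounts can be queued for rescoring."""
    merchants = {m for a, m in transactions if a == flagged_account}
    return sorted({a for a, m in transactions
                   if m in merchants and a != flagged_account})

# Three cards used at the same restaurant; card1 is flagged as compromised.
txns = [("card1", "restaurantX"), ("card2", "restaurantX"),
        ("card3", "restaurantX"), ("card4", "storeY")]
print(cotransacting_accounts(txns, "card1"))  # ['card2', 'card3']
```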
- A client can define an event that would trigger scoring of an account, as shown at "S2."
- FIG. 8 depicts examples at 400 of client-defined events, which could include an account's password being changed, as shown at 402, or a customer's car being stolen, as shown at 404.
- A monitoring process 406 can determine when one of these triggers has occurred with respect to an account. These triggers indicate when data is to be extracted from a repository 272. For the different types of triggers, a financial institution can select whether all accounts are to be processed or only certain accounts.
- The updating via process 280 of the repository 272 with incremental transaction information 262 occurs asynchronously with respect to a trigger 260 for generating, via process 274, a score 276 on demand.
- The scoring can also occur based upon the receipt of a new transaction for an account.
- An incremental transaction indicates that a transaction has occurred that increases the amount of information with respect to an account (e.g., increases information resolution).
- An example of this type of transaction could be a purchase event wherein an authorization is requested for money to be subtracted from an account.
- A non-incremental event is one where no additional information is available relative to an entity other than that there has been a passage of time. A non-incremental event can then act as a trigger that is asynchronous with respect to whether an incremental transaction has occurred.
- This time-passage-only type of trigger is useful for an account that may be considered on the cusp or edge (e.g., whether the entity will turn out to be fraudulent or non-fraudulent). For example, a cardholder's automobile is reported as stolen. In such situations a credit card or debit card may also have been stolen, and large-dollar-amount transactions are usually recorded within the first couple of hours after the vehicle is stolen.
- The system can generate a trigger every fifteen minutes for the next three hours to score the account irrespective of whether a purchase transaction has occurred. The first scoring may produce a higher score because it is closer in time to when the car was reported stolen, but each subsequent scoring within the three-hour window in which no incremental transactions have occurred can produce lower scores.
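The fifteen-minute/three-hour trigger pattern above can be sketched as a simple schedule generator; the function name and defaults are hypothetical.

```python
from datetime import datetime, timedelta

def trigger_schedule(start, interval_minutes=15, window_hours=3):
    """Emit score-on-demand trigger times at a fixed interval after an
    event (e.g., a car reported stolen), irrespective of whether any
    purchase transaction arrives during the window."""
    end = start + timedelta(hours=window_hours)
    times, t = [], start + timedelta(minutes=interval_minutes)
    while t <= end:
        times.append(t)
        t += timedelta(minutes=interval_minutes)
    return times

schedule = trigger_schedule(datetime(2013, 6, 1, 9, 0))
print(len(schedule))  # 12 triggers over the three-hour window
```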
- A fraud analyst arrives at work in the morning with a queue of accounts to analyze.
- The question confronting the analyst is which account the analyst should consider first.
- The analyst sees that scoring on these accounts has not occurred since last night.
- The analyst then sends a request that these accounts be scored again.
- The new scoring may reorder the queue (which would alter the order of accounts the analyst is to process, such as by calling customers).
- FIG. 9 depicts at 450 a system for storing information for use in fraud detection 480 .
- The system of FIG. 9 stores the raw data 452 instead of the derived feature information used in a typical current system.
- The typical current system's storage approach creates problems because there may be a need to view recent transactions in the context of the account's or card's past history. Ideally, a significant portion of the raw historical transactions could be included for each score. However, for real-time systems, this has proven to have an unacceptable impact on throughput and response time.
- Alternative schemes involve saving only summarized information. While this does address the throughput problem, it also limits the types of variables and the level of detail available to the model.
- The system of FIG. 9 contains a repository of historical data. This is not aggregate or derived data but raw data 452. For example, no summaries or averages of raw transactional data are stored in the repository 470.
- Raw data 452 is processed and stored via process 460 and then retrieved (e.g., by fraud detection process 480) in order to determine whether an entity has been compromised.
- A combination of raw data and derived data can also be stored.
- Storage rules 454 specify how many generations of raw data 452 should be stored in the repository 470. This determination could include how many raw payment amounts should be stored. The determination of how many generations should be stored is based upon the type of transaction as well as the transaction fields. This may result in fields of varying lengths being stored in the repository 470, as illustrated at 500 in FIG. 10. For example, the payment amounts for the last seven transactions may be stored in the repository, whereas for another type of information only the previous five values need to be stored. Thus one field might be seven generations in length, whereas another field might be only five generations in length.
- A storage rule can specify how many authorization amounts should be stored for an entity in the raw state (e.g., without any aggregation or other type of transformation into a derived variable).
- The data can be stored in a circular list (e.g., a doubly linked list) for each field. The circular lists for the different data fields can have varying lengths.
- One data field may have the previous three generations stored, whereas another data field may have the previous eight generations stored.
- The circular lists are stored in an indexed file. However, it should be understood that other storage mechanisms may be utilized, such as storage in a relational database.
- The system can still operate even if not all of the generations for a particular data field have been stored. For example, a relatively new card may have only enough raw data to store three generations of payment authorization amounts, although the storage rules for this data field may allow storage of up to fifteen generations. A predictive model can still operate even though a particular data field does not have all of the generations specified by the storage rules.
- The storage of raw data in the repository reflects a compromise between an ideal situation, where all historical information that can be obtained for an entity (and that is used to make a prediction) is stored, and the physical constraints of storage capacity and/or performance. In reaching that compromise, it should be noted that a less-than-optimal choice might be made in determining what timeframe or number of generations should be stored for one or more data fields. It should also be noted that storage rules can use the number of generations (e.g., the previous four generations) and/or a particular timeframe (e.g., only the previous three weeks) in determining how much raw data for a particular data field should be stored. For situations where more generations or longer timeframes are needed for a particular data field, a multi-resolution scheme can be used: the storage can retain only every k-th event/transaction, where k varies based on the recency of the transactions/events.
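The per-field circular lists with rule-driven generation counts can be sketched as follows. The rule table, field names, and use of Python's `deque` (in place of the indexed-file circular lists the patent describes) are illustrative assumptions.

```python
from collections import deque

# Hypothetical storage rules: how many generations of raw values to
# retain, keyed by (transaction type, field). Different fields -- and
# the same field under different transaction types -- can differ.
STORAGE_RULES = {
    ("authorization", "amount"): 10,
    ("authorization", "merchant_id"): 5,
    ("payment", "amount"): 7,
}

class RawDataRepository:
    """Stores raw (non-aggregated) values in fixed-length circular
    buffers, one per (account, transaction type, field)."""
    def __init__(self, rules):
        self.rules = rules
        self.fields = {}  # (account, txn_type, field) -> deque

    def store(self, account, txn_type, record):
        for field, value in record.items():
            maxlen = self.rules.get((txn_type, field))
            if maxlen is None:
                continue  # no rule: this field is not retained raw
            key = (account, txn_type, field)
            buf = self.fields.setdefault(key, deque(maxlen=maxlen))
            buf.append(value)  # oldest generation drops off automatically

    def history(self, account, txn_type, field):
        return list(self.fields.get((account, txn_type, field), ()))

repo = RawDataRepository(STORAGE_RULES)
for i in range(12):
    repo.store("A-100", "authorization",
               {"amount": 10.0 + i, "merchant_id": f"M{i}"})
print(len(repo.history("A-100", "authorization", "amount")))       # 10
print(len(repo.history("A-100", "authorization", "merchant_id")))  # 5
```

A new account simply has shorter buffers until enough transactions arrive, matching the note that the model can still operate with fewer generations than the rules allow.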
- Storage rules dictate how far back in history data should be stored.
- The history can be kept at different levels, such as at the transaction level or at an individual field level.
- For an authorization, the system may receive an authorization amount, a merchant identifier, and a date-time stamp.
- The system might decide that it does not need the same history for all of these different pieces of data, so based upon the storage rules the system stores the past ten transaction amounts but only the previous five merchant identifiers.
- The buffered lengths for the different data types can therefore vary.
- Even the same field (e.g., the date-time stamp field) for two different transaction types may have different storage rules. For example, for one type of transaction five generations of date-time stamps may be needed, while for another type of transaction eight generations may need to be stored.
- The system stores information about different entities and uses the information from multiple entities to determine whether a particular account has been compromised.
- An entity could be a card, and another entity could be an account comprising multiple cards.
- Another entity could be a ZIP code.
- A scoring process could be performed for each entity or for combinations of entities. For example, scoring could be performed for the card and a separate scoring process performed for the account comprising multiple cards. Still further, a scoring process could be performed for a ZIP code location (e.g., generating a fraud score for all of the credit card transactions that have occurred within a ZIP location).
- The multi-entity repository may or may not have a hierarchical structure.
- One example of a hierarchy is multiple cards associated with an account; another example is multiple terminals associated with a single merchant.
- The system could look at all of those hierarchies at once. By examining different entities within a hierarchy in this manner, fraud at different levels can be examined at the same time. For example, a bank can determine whether fraud is localized to a particular card or is more pervasive, extending to the merchant or to the customer's other financial instruments such as the customer's checking account.
- Signatures can be used within the system in order to help store a detailed, unaltered history of the account/entity.
- The signatures provide a complete picture of the account, allowing on-demand scoring and not just transaction-triggered scoring.
- The signature allows real-time use of variables that depend upon detailed information for a number of previous transactions, for example, distances (e.g., Mahalanobis distances) between recent and past transactions.
- Signatures may look different for one person versus another. For example, for a particular type of information, fifteen generations might be stored for a first person whereas only six generations of the same type of information might be stored for a second person. This could occur, for example, if the first person uses their card many more times per month than the second person.
- Signature records can be retrieved for one or more entities depending upon which entities need to be scored as well as which signature records are needed for scoring a particular entity.
- A scoring process may be configured to score a credit card holder's account using only the one or more signature records associated with that credit card holder.
- Another scoring process could be configured to score a credit card holder's account based not only upon that entity's signature records but also upon one or more other entities' signature records (e.g., a merchant or terminal ID signature record).
- FIG. 11 shows at 550 that the determination of the number of generations (e.g., the relevant time periods) to store for a particular field for a type of transaction can be based upon statistical analysis 560 .
- Statistical analysis 560 can analyze test raw data 562 and determine how much history of raw data is needed (e.g., an optimal amount or an approximation thereof) for the application to perform well. For example, a history of three months can be selected for a particular field for a particular transaction type. Analysis can be performed on the historical data to determine whether a significant change occurred in the data over the previous week versus over the previous three months.
- Statistical analysis techniques that help analyze the variability include using means, standard deviations, skewness, statistical distances, correlations between fields, etc.
- The analysis techniques can also be more sophisticated, such as creating models that examine variability.
- FIG. 12 depicts at 600 an approach to determine storage rules for a system.
- Statistical analysis 610 is performed upon the entire test raw data set 612, and analysis results 614 are generated thereby.
- Statistical analysis 620 is performed upon a candidate subset 622 of the test raw data (e.g., only the previous two weeks of raw data instead of the entire six months). Analysis results 624 from the candidate subset 622 are compared, via process 630, with the results generated from the full set. If the difference between the two sets of results is acceptable, as determined at 640, then the storage rule is generated and stored at 650 with the time period information associated with the candidate subset. If it is not acceptable, then another candidate subset can be examined at 660.
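The subset-versus-full-set comparison of FIG. 12 can be sketched as follows. Using only the mean and standard deviation, and a 5% relative tolerance, are simplifying assumptions; the patent also lists skewness, statistical distances, and correlations between fields.

```python
from statistics import mean, stdev

def acceptable_window(full_history, candidate_windows, tol=0.05):
    """Compare summary statistics of each candidate retention window
    against the full test data set; accept the shortest window whose
    mean and standard deviation stay within a relative tolerance of
    the full-set values. Falls back to keeping everything if no
    candidate is acceptable."""
    full_m, full_s = mean(full_history), stdev(full_history)
    for window in sorted(candidate_windows):
        recent = full_history[-window:]
        if (abs(mean(recent) - full_m) <= tol * abs(full_m)
                and abs(stdev(recent) - full_s) <= tol * full_s):
            return window  # store this many generations
    return len(full_history)

# A field whose distribution is stable over time: a short window
# reproduces the full-history statistics, so few generations suffice.
data = [100 + (i % 7) for i in range(180)]
print(acceptable_window(data, [14, 30, 60]))  # 14
```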
- The analysis techniques can be supplemented by any experience that a person has with respect to one or more data fields.
- A person may recognize from experience that storing more than six months of data for a particular data field does not provide any greater predictive capability.
- A domain expert can provide an initial estimate of the longest period of time for which the data or a data field should be kept, and could also indicate an initial estimate of the number of generations that should be stored in the raw data repository.
- For example, an expert may believe, based upon his or her experience, that only three months of information is needed for a particular data field. The expert in this situation can indicate that the statistical analysis technique or techniques should evaluate six months of data for that data field and should evaluate whether storing three months of data might be a good or optimal point.
- FIG. 14 illustrates that the storage of information in its raw form makes the system much less application-specific.
- FIG. 14 shows that in addition to fraud detection 480 , information in the raw data repository 470 can also be used by other applications, such as by a loan risk analysis application 700 or an application 710 that examines the revenue expected to be generated from this account holder over a prolonged period of time. In this way a financial institution only has to provide the information to the analysis system once instead of having to provide the same information multiple times for each of the different applications.
- the data that is retrieved from a data store for use in an application such as fraud detection may have missing data values.
- a system 750 can be configured that has missing value imputation processing capability 760 .
- Process 760 can fill in values that are missing from a retrieved data set 762 .
- Missing values can occur because in practice an entity (e.g., a financial institution) supplying the raw data may not have all of the information regarding the transaction. As an illustration, one or more data items associated with a point-of-sale transaction might be missing.
- the missing value imputation process 760 attempts to determine the missing value(s) at 764 .
- Current approaches typically substitute a measure of central tendency (e.g., mean, mode, median) for a missing value.
- the system of FIG. 15 uses a closed form equation 770 to determine what value (e.g., optimal value) with respect to a target should be used for a missing value.
- the optimal value provides more information with respect to whether fraud has occurred or not. It should be noted that this approach can be utilized for many applications other than fraud detection, such as determining creditworthiness for a loan applicant. If the system is configured with a raw data repository, the optimal values can be determined for different applications because the raw data is stored in the repository.
- missing values can also occur and thus a closed form equation or lookup table (whose values are based upon the closed form equation) can be used to supply missing values.
- the system can use an approach wherein, irrespective of (e.g., independent of) the feature, an equation is used to calculate the missing value. For example, and as illustrated in FIG. 16 , if the transaction amount is missing, then a closed form equation is used in the model building phase 780 to determine the missing transaction amount value for use in building model 782 . In the production phase, a lookup table 792 created via process 790 is used to supply the missing transaction amount value. It should be understood that any value type can be supplied, such as continuous values (e.g., a numeric transaction amount).
- the missing value determination process uses the tag information that accompanies the missing value in order to determine the missing value.
- the tag information would indicate whether there was fraud or no fraud associated with the input data.
- the values that are supplied for the missing values are used to help train or test one or more models 782 that are under construction.
- the values supplied for the missing values are used as part of the input layer to the predictive model for the application at hand.
- a closed form equation 770 is generated via process 800 for a data feature based upon historical data by using an optimality criterion involving the tag information.
- the correlation can be examined via a linear or nonlinear relationship.
- FIG. 18 provides at 850 an illustration wherein there are six values, G is assigned a value of one, and B is assigned a value of zero. If one or more values are missing from the input data set, the optimality criterion could be used to determine what value of “x” would maximize the correlation with respect to the tag information.
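The optimality criterion can be illustrated as follows. This sketch uses a numeric grid search over the criterion rather than the closed form equation 770 the text describes, and the values and function names are hypothetical.

```python
import numpy as np

def optimal_missing_value(values, tags, missing_tag, grid=None):
    """Choose the fill-in value x that maximizes the Pearson correlation
    between the completed feature vector and the 0/1 tag vector
    (G -> 1, B -> 0). A grid search stands in for the closed form."""
    if grid is None:
        grid = np.linspace(min(values), max(values), 1001)
    best_x, best_corr = None, -np.inf
    for x in grid:
        v = np.append(values, x)
        t = np.append(tags, missing_tag)
        c = np.corrcoef(v, t)[0, 1]
        if c > best_corr:
            best_corr, best_x = c, x
    return best_x, best_corr

# Five observed transaction amounts with tags; the sixth value is
# missing, and its accompanying tag indicates G (fraud-free = 1 here).
observed = [120.0, 300.0, 80.0, 450.0, 60.0]
tags = [0, 1, 0, 1, 0]   # B = 0, G = 1
x, corr = optimal_missing_value(observed, tags, missing_tag=1)
```

Because the missing observation is tagged G and the G-tagged amounts are the larger ones, the criterion selects a high fill-in value, which carries more information with respect to the target than a mean or mode would.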
- FIG. 19 depicts a training approach which addresses fraud on an account-level basis, thereby allowing account-level fraud scores to be generated independent of the existence (or lack thereof) of transactions.
- the approach provides a holistic view of the account/customer and identifies when an account is in a compromised state, as opposed to merely detecting some fraudulent transactions.
- This can provide value to card issuers as account-level fraud (e.g., account takeover, identity theft) has ramped up in the recent past, as compared to traditional lost/stolen card fraud.
- a family of models can be generated simultaneously that optimally balance the additional benefit due to the added complexity and the added computational/operational cost.
- training data is received at process 900 for training a predictive model.
- a model can be trained with the entire data set and thus not partitioned by process 900 .
- the trained model then is scored at process 902 and evaluated at process 904 . If the evaluated trained model has performed satisfactorily as determined at 906 , then the model as defined through the training is made available at 908 .
- the training data set is partitioned by the generator process block 900 .
- the generator process block 900 determines how the training data should be split and modeled by separate and distinct predictive models.
- FIG. 20 illustrates at 950 a partitioning that could occur via the generator process block.
- a data set 952 (e.g., an initial data set) is partitioned into multiple data subsets 954 (e.g., data subset A and data subset B).
- the partitioning can be performed such that the combination of data subset A and data subset B equals the initial data set. If another iteration is required, then further partitioning can be performed, such as generating data subsets C and D and, if needed, E and F (as shown at 956 ). It should be understood that these generated subsets can themselves be further partitioned, such as partitioning F into data subsets G, H, and I.
- the training could be performed in many different ways, such as the approach depicted at 1000 in FIG. 21 .
- a mathematical model is constructed iteratively by training and combining multiple, potentially heterogeneous, learning machines.
- the individual “learners” are trained with emphasis on different and overlapping regions of interest. These regions, which can be constantly evolving, are determined by a partition generator.
- an initial model is trained at 1030 using exemplars that are partitioned from the entire training space.
- a data set 1010 is partitioned at 1020 in accordance with partitioning criteria 1012 .
- the partitioning criteria 1012 are based upon minimization of a ranking violations metric.
- An objective function based upon minimization of a ranking violations metric is manipulated so as to maximize the area under the ROC (receiver operating characteristic) curve.
- An ROC curve is a graph of the true detection rate versus the false alarm rate (e.g., it plots the percentage of bad correctly flagged against the percentage of good incorrectly flagged). The area under the curve is what is to be maximized.
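The connection between ranking violations and the area under the ROC curve can be made concrete: the AUC equals the fraction of (fraud, non-fraud) score pairs that are ranked correctly, so minimizing violations maximizes the AUC. The function names and example scores below are hypothetical; note that this simple count treats a tie as a full violation, whereas AUC conventions often give ties half credit.

```python
from itertools import product

def ranking_violations(fraud_scores, nonfraud_scores):
    """Count (fraud, non-fraud) score pairs in which the non-fraud item
    is scored at or above the fraud item -- each is a ranking violation."""
    return sum(1 for f, n in product(fraud_scores, nonfraud_scores) if n >= f)

def auc(fraud_scores, nonfraud_scores):
    """Area under the ROC curve as the fraction of correctly ranked pairs."""
    pairs = len(fraud_scores) * len(nonfraud_scores)
    return 1.0 - ranking_violations(fraud_scores, nonfraud_scores) / pairs

fraud = [0.9, 0.7, 0.6]
nonfraud = [0.8, 0.3, 0.2, 0.1]
v = ranking_violations(fraud, nonfraud)   # the 0.8 non-fraud score outranks two frauds
a = auc(fraud, nonfraud)
```

Driving `v` to zero would raise `a` to 1.0, which is the sense in which an objective that minimizes ranking violations maximizes the area under the curve.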
- a learning machine that yields distinctive rankings for each exemplar such as a neural network, a support vector machine, a classification tree, or a Bayesian network, can be used.
- a partition generator then applies a selection function to the training exemplars in order to direct attention to a subset of accounts and transactions that could improve the decision performance in the operational region of interest.
- the selection function can be shaped so as to further emphasize the region of interest.
- One or more learning machines are then trained and combined to form a final model as explained below.
- one or multiple learning machines of various types, as well as one or multiple instances of each type of learning machine with different initial conditions are trained with emphasis on potentially different subsets of exemplars.
- the generator searches through possible transformations on the resulting machines and selects at 1040 the one that performs best for account-level fraud detection when combined with the previously selected models; the selected models are then weighted at 1050 .
- the generator directs its attention to the selection of a different subset of exemplars.
- the degree of attention/emphasis to each exemplar is determined by how the exemplar assists in or hurts the detection of the account being in a fraudulent state as compared to the new learning machine being disabled (i.e., not included in the overall system).
- a weight of zero or one can be assigned to each exemplar and thus form a “hard” region.
- a continuous range of weights can be used (e.g., all real values between 0 and 1) to create a “soft” region which could avoid training instabilities.
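The hard and soft regions described above can be sketched as a weighting rule driven by how badly the current ensemble handles each exemplar. All names, the thresholding rule, and the min-max normalization are illustrative assumptions, not the patent's specific formulation.

```python
import numpy as np

def exemplar_weights(errors, mode="soft", threshold=0.5):
    """Emphasis weights for the next learner given per-exemplar errors
    (in [0, 1]) of the current ensemble. 'hard' regions use 0/1
    membership; 'soft' regions use a continuous weight in [0, 1],
    which can avoid training instabilities."""
    errors = np.asarray(errors, dtype=float)
    if mode == "hard":
        return (errors >= threshold).astype(float)
    # soft: min-max normalize errors into [0, 1]
    spread = errors.max() - errors.min()
    return (errors - errors.min()) / spread if spread else np.ones_like(errors)

errs = [0.9, 0.2, 0.6, 0.1]
hard = exemplar_weights(errs, mode="hard")   # 0/1 membership
soft = exemplar_weights(errs, mode="soft")   # continuous emphasis
```

With the hard rule, an exemplar is either inside the region of interest or not; with the soft rule, the emphasis degrades gradually, so small changes in error do not flip an exemplar in and out of the region between iterations.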
- the entire system including the individual “learners” as well as their corresponding regions of interest evolves and changes with each iteration.
- a ranking violations metric can take into account that a model should produce scores that result in non-fraudulent accounts or transactions being ranked lower than fraudulent accounts or transactions.
- FIG. 22 shows at 1100 example scoring results.
- the model(s) that generated the scores on the left side of the table contain multiple ranking violations.
- the model(s) that generated the scores on the right side of the table signify an improvement in predictive capability because the non-fraudulent accounts or transactions are ranked lower than fraud accounts or transactions.
- FIGS. 23-25 provide another example for training a model.
- the entire set of training data is retrieved at 1200 .
- a predictive model is selected from a set of candidate predictive models 1212 (e.g., a neural network model, decision tree model, linear algorithm model, etc.) and is trained with the training data.
- the trained one or more predictive models are scored and evaluated at 1230 .
- the evaluation can be based on a cost/error function. Examples include the area under the ROC curve, dollar savings, etc.
- the evaluation can be done at the account level. For example, an evaluation can examine the extent to which non-fraudulent accounts were ranked higher than fraudulent accounts and to what degree. However it should be understood that other levels could be used, such as having the ranking at the observation/transaction level, at the business problem level, etc.
- a test for convergence is performed at 1240 . If there is convergence, then the model is considered defined (e.g., the parameters of the one or more predictive models are fixed) at 1242 . However if convergence has not occurred, then processing continues on FIG. 24 so that the generator process can determine at 1250 how to split one or more data sets in order to train one or more new predictive models.
- the generator process determines whether the training data set 1252 should be split and modeled by separate and distinct predictive models. This allows for an automatic determination as to whether the entire training data set should be used or whether it is better to use subsets of the data for the training.
- the generator process can determine which data subset(s) (e.g., 1254 , 1256 ) have been most problematic in being trained.
- a new predictive model will be trained to focus upon that problematic data set.
- One way to perform this is for the generator to assign greater importance to the problematic data and lesser importance to the data that can be adequately explained with the other predictive models that have already been trained.
- the weighting of importance for data subsets is done at the account level.
- a second predictive model is selected at 1260 from candidate predictive models 1212 and trained at 1270 .
- the second model and the first model are combined at 1280 , and evaluation of the combined models' results 1292 occurs at 1290 . If the combined models do converge as determined at 1300 , then the combined models are provided as output at 1302 . If the combined models do not converge as determined at 1300 , then the data is further split as shown in FIG. 25 so that another model can be trained.
- the generator process determines at 1306 how to split one or more data sets in order to train one or more new predictive models. For example, the generator process can determine whether the training data set 1256 should be split and modeled by separate and distinct predictive models. More specifically, the generator process can determine which data subset(s) (e.g., 1308 , 1310 ) have been most problematic in being trained, and a new predictive model will be trained to focus upon that problematic data set. One way to perform this is for the generator to assign greater importance to the problematic data and lesser importance to the data that can be adequately explained with the other predictive models that have already been trained. The weighting of importance for data subsets is done at the account level.
- a third predictive model is selected at 1312 from candidate predictive models 1212 and trained at 1320 .
- the third model and the other models are combined at 1330 , and evaluation of the combined models' results 1342 occurs at 1340 . If the combined models do converge as determined at 1350 , then the combined models are provided as output at 1360 . If the combined models do not converge as determined at 1350 , then the data is further split.
- the system can examine how many ranking violations have occurred.
- the evaluation also examines whether there is any improvement (e.g., decrease) in the number of ranking violations from the previous iteration.
- the convergence decision step determines whether the number of ranking violations is at an acceptable level. If it is, then the defined models are made available for any subsequent further model development or are made available for the production phase. However if the number of ranking violations is not at an acceptable level, then further partitioning occurs at the partition process block.
- the resultant model may be a single predictive model and/or multiple predictive models that have been combined.
- combined predictive models resulting from the training sessions can be homogenous or heterogeneous.
- for a homogenous combined set of predictive models, two or more neural networks from the training sessions can be combined.
- for a heterogeneous combined set of predictive models, a neural network model can be combined with a decision tree model; as another illustration, multiple genetic algorithm models can be combined with one or more decision tree models as well as with one or more linear regression models.
- the evaluation block assesses the performance of the combined models. The models are placed in parallel and their outputs are combined.
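Placing the models in parallel and combining their outputs can be sketched as a weighted sum of the individual scores. The weights here are the ones selected at the combination/weighting step; the specific numbers and function name are hypothetical.

```python
def combine_scores(model_scores, weights):
    """Parallel combination: each trained model scores the same input,
    and the ensemble output is the weighted sum of those scores."""
    assert len(model_scores) == len(weights)
    return sum(s * w for s, w in zip(model_scores, weights))

# E.g. a neural network, a decision tree, and a linear model in
# parallel, each producing a fraud score for the same account.
scores = [0.80, 0.60, 0.90]
weights = [0.5, 0.3, 0.2]
ensemble_score = combine_scores(scores, weights)
```

Because the members run in parallel on the same input, a heterogeneous set (e.g., a neural network alongside a decision tree) combines exactly the same way as a homogeneous one.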
- a predictive system can be configured to generate reason codes which aid in understanding the scores.
- Current methodology for producing reason codes to explain the scores is not very useful, as it typically identifies variables that are similar in nature as the top three reason codes. This does not provide valuable insight into why a particular item scored high. This becomes important as the industry moves away from transaction-level scores for fraud detection toward account-level scores identifying the account's compromise. Due to the more complex underlying phenomena, more refined and meaningful reason codes become paramount.
- FIG. 26 depicts a reason code determination process 1430 that is used to create reason codes 1440 for a scoring system/predictive model 1410 .
- reason codes can be used for other applications other than fraud detection.
- the reason code determination process 1430 can be configured having reason code technology that is based on risk factors/groups rather than individual variables used in the models.
- the reason codes 1440 for a scoring system provide insight to end users with respect to the score generated by the predictive model 1410 (e.g., the fraud analysis results/scores 1420 ). They provide guidance in reviewing the scored entity and taking appropriate actions and decisions. In the case of fraud detection models, the reason codes provide direction for the initial investigation/review of suspect cases.
- the process 1430 can provide statistically-sound explanations of the results 1420 (e.g., score) from a scoring system/predictive model 1410 that is analyzing certain input data 1400 . Also, the explanations are factor based and can be used within a scoring system, such as to satisfy requirements for credit scoring models under Regulation B.
- FIG. 27 illustrates how reason codes can be built. Instead of using individual input variables as reasons, reason factors are first generated via process 1450 by grouping variables that represent a similar concept. Analytic techniques are used to generate these reason factors, which can then be reviewed and refined by domain experts. Once the reason factors are formed, the importance of each reason factor to the score can be constructed, and the reason codes can be generated by rank ordering the “importance” of each reason factor. Finally, the performance of the reason codes is evaluated; based on the results, the reason generator can be revised by iterating the process. The generated reasons are provided as reason configuration data 1460 to a reason determination process 1430 for use in the scoring/predicting process.
- FIG. 28 illustrates creation of reason codes.
- individual variables are not used as reasons, in order to avoid the top reasons providing the same information. Instead, the system first groups correlated variables that represent a similar concept into different reason factors. As an illustration, the variables that relate to the time when the transaction occurred can be grouped separately from variables that do not. Many statistical techniques can be used to group the variables, and each variable group (e.g., a reason factor) can represent a reason code.
- Principal Component Analysis (PCA) is one statistical technique that can be used for this grouping.
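The grouping step can be sketched with a simple correlation-threshold rule standing in for techniques such as PCA or variable clustering. The variable names, the greedy rule, and the 0.7 threshold are all illustrative assumptions.

```python
import numpy as np

def group_variables(X, names, threshold=0.7):
    """Greedy grouping of correlated variables into reason factors:
    a variable joins an existing factor if its absolute correlation
    with the factor's first member meets the threshold; otherwise it
    starts a new factor."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    factors = []
    for j, name in enumerate(names):
        for factor in factors:
            if corr[j, factor[0][1]] >= threshold:
                factor.append((name, j))
                break
        else:
            factors.append([(name, j)])
    return [[n for n, _ in f] for f in factors]

# Toy data: two time-related variables and two amount-related variables.
rng = np.random.default_rng(1)
base_time = rng.normal(size=500)
base_amt = rng.normal(size=500)
X = np.column_stack([
    base_time, base_time + 0.1 * rng.normal(size=500),  # time-related pair
    base_amt, base_amt + 0.1 * rng.normal(size=500),    # amount-related pair
])
factors = group_variables(X, ["hour", "weekday_bin", "amount", "avg_amount"])
```

Here the two time-related variables land in one reason factor and the two amount-related variables in another, so the top reasons reported to an end user would not repeat the same underlying concept.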
- Manual review and refinement of the reason factors can be performed in many ways. For example, this could include sanity checking the variable grouping as well as creating user-friendly description for each variable factor.
- the groupings may be revised/refined based on domain expertise and previous results, and the configuration file is prepared for the reason code generator step.
- a reason code generator is created at 1520 by constructing and measuring the importance for each reason factor to the score.
- the importance can be defined in many ways. It can be (1) the strength of the reason factor in explaining the current score; (2) the strength of the reason factor in making the score high; etc. The importance of these reason factors is then rank-ordered, and the most important factors are reported as the score reasons. The number of reasons reported is based on the business needs.
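The rank-ordering step itself is straightforward; a minimal sketch, with hypothetical factor names and importance values, follows.

```python
def top_reason_codes(factor_importance, n=3):
    """Rank-order reason factors by importance and report the top n
    as the score reasons; n is driven by business needs."""
    ranked = sorted(factor_importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

# Hypothetical importance measures for four reason factors.
importance = {
    "time_of_transaction": 0.12,
    "transaction_amount": 0.45,
    "merchant_category": 0.30,
    "geographic_risk": 0.13,
}
reasons = top_reason_codes(importance, n=3)
```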
- the importance of each reason factor to the score could be measured by a tree model at 1540 .
- These tree models are constructed at 1550 to extract the correlations between the values of the variables in the given reason factor and the score, and they can be built at 1560 using a SAS Enterprise Miner tree procedure/SAS PROC ARBORETUM available from SAS Institute located in Cary, N.C.
- the performance of the reason code is then analyzed at 1530 .
- the general performance of the reason code generator is reviewed.
- Other items could also be used in analyzing the performance of the reason codes.
- the systems and methods disclosed herein may be used for many different purposes. Because the repository stores raw data (as opposed to derived or imputed information), different types of analysis other than credit card fraud detection can be performed. These additional analyses could include using the same data that is stored in the repository for credit card analysis to also be utilized for detection of merchant fraud or for some other application. Derived or imputed information from raw data does not generalize well from one application (e.g., cardholder account fraud prediction) to a different application (e.g., merchant fraud prediction).
- FIGS. 30 and 31 depict that a view selector module 1600 can be provided that allows a user or computer program to select which entity (e.g., a particular merchant) or type of entity (e.g., all merchants) upon which they would like fraud analysis to be performed. For example a user or a computer application can shift the view from whether a cardholder account has been compromised to whether fraud has occurred at one or more merchants. The raw data that is used to predict whether fraud has occurred with respect to a cardholders account can also be utilized to predict whether fraud has occurred at a merchant location.
- Other types of analysis can also be performed because the data is stored in its raw form, such as merchant attrition.
- Merchant attrition occurs when an institution loses merchants for one or more reasons.
- a merchant attrition score is an indicator of how likely a relationship between a merchant and an institution will be severed, such as due to bankruptcy of the merchant or the merchant porting its account to another institution (e.g., to receive more favorable terms for its account).
- the raw data in the repository 272 that can be used for fraud scoring can also be used in combination with other data to calculate merchant attrition scores.
- the other data could include any data (raw or derived) that would indicate merchant attrition.
- a system can be configured with a selector 1610 that selects what type of analysis should be performed.
- FIG. 32 depicts the storage of raw data (e.g., 1702 , 1712 , 1722 ) at or from different institutions ( 1700 , 1710 , 1720 ), which provides for an expanded, more comprehensive, and quicker type of analysis. Because each institution is not storing its information as a set of derived or calculated values that are typically application-specific, the raw data from each of these repositories can be retrieved and used together in order to provide a more robust predictive capability. As shown in FIG. 32 , data is collected from the repositories ( 1702 , 1712 , 1722 ) at 1730 . A view selector 1740 could be used as described above for selecting a particular entity type or analysis type for processing.
- Predictive model(s) at 1750 generate the predictions, such as an entity score 1760 .
- the score can be at multiple different levels (e.g., fraud scoring at the card holder level, fraud scoring at the merchant level, merchant attrition score, bankruptcy prediction scoring, etc.)
- the system can still utilize this data from the different institutions ( 1700 , 1710 , 1720 ) because raw data is being stored. This can be helpful if the raw data repositories are in a distributed environment, such as at different sites.
- the repositories may be at different sites for a number of reasons. An institution may want to have its repository at one of its own locations, or different third-party analysis companies may host the repositories on behalf of their respective institutions. For example, one third-party analysis company may receive data from a first institution (e.g., a Visa institution) and apply its storage rules when storing information for that institution, while a different third-party analysis company receives data from a different institution (e.g., a MasterCard institution) and applies its own storage rules when storing information for the other institution. Although different third-party analysis companies with their unique storage rules have stored the data from different institutions, the raw data from the different repositories can still be collected and used together in order to perform predictions.
- the raw data in the repository associated with that merchant can be retrieved from the repository in order to determine whether fraud can be detected for other credit card accounts that have purchased goods or services at that merchant's location.
- the fraud rate at the merchant's location might initially appear to be 0.1%, but after evaluating other credit cards from different institutions that have been utilized at the merchant's location, the fraud rate is found to be 10%.
- a merchant's fraud score can be used to determine whether a credit card has been compromised.
- processing can entail analyzing the raw data associated with the credit cards utilized at a merchant's location to generate a score for the merchant and then using that score to analyze an account whose credit card had recently been used at the merchant's location.
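That two-step flow (score the merchant from raw card transactions, then use the merchant score when analyzing an account) can be sketched as follows. The record layout, threshold, and names are hypothetical.

```python
def merchant_fraud_rate(transactions, merchant_id):
    """Fraud rate at a merchant: fraction of card transactions at that
    merchant's location tagged as fraudulent."""
    at_merchant = [t for t in transactions if t["merchant"] == merchant_id]
    if not at_merchant:
        return 0.0
    return sum(t["fraud"] for t in at_merchant) / len(at_merchant)

def account_risk(card_history, merchant_scores, alert_threshold=0.05):
    """Flag an account if any merchant where the card was recently used
    has a fraud score above the threshold."""
    return any(merchant_scores.get(m, 0.0) > alert_threshold for m in card_history)

# Raw transactions pooled across institutions; fraud tags are 0/1.
txns = [
    {"merchant": "M1", "fraud": 1}, {"merchant": "M1", "fraud": 1},
    {"merchant": "M1", "fraud": 0}, {"merchant": "M2", "fraud": 0},
]
scores = {m: merchant_fraud_rate(txns, m) for m in ("M1", "M2")}
flagged = account_risk(card_history=["M2", "M1"], merchant_scores=scores)
```

An account whose card was recently used at the high-scoring merchant is flagged for review even if none of its own transactions look anomalous in isolation.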
- FIGS. 33 and 34 show at 1800 a system that integrates different aspects disclosed herein.
- a predictive model is built using development data 1802 (e.g., cycle cut data, authorizations data, payment data, non-monetary data, etc.).
- the development data 1802 is stored in the raw data repository 1810 which has a manager 1812 that helps manage the raw data repository 1810 , such as handling the updating of the raw data repository 1810 with new development data.
- the raw data from the repository 1810 could also be utilized to create at 1820 static behavior tables.
- the data in the static behavior tables provides a global picture which is the same for a period of time (e.g., static or does not change dramatically over a period of time).
- These types of variables are useful in identifying the occurrence of fraud.
- Examples of such variables include risk with respect to a geographical area.
- the information created for these tables does not have to be changed in production, whereas the transaction information in the repository does change once in production, to reflect the transactions occurring while the system is in production.
- Signature records are retrieved from the repository 1810 and features are derived from the raw data.
- a signature is an account-level compilation of historic data of all transaction types. Signatures help a model to recognize behavior change (e.g., to detect a trend and deviation from a trend). There is one record stored for each account. Length of history of each type of data may vary. Signature data is updated with every new transaction. The features are also derived based upon the behavior tables.
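The signature mechanics above can be sketched as a per-account record updated with every new transaction. The record layout, history length, and running-average summary are illustrative assumptions, not the patent's exact signature format.

```python
def update_signature(signature, transaction, max_history=10):
    """Update an account-level signature (one record per account) with
    a new transaction. History length per transaction type may vary;
    here a single cap is used for brevity."""
    txn_type = transaction["type"]            # e.g. 'authorization', 'payment'
    history = signature.setdefault(txn_type, [])
    history.append(transaction["amount"])
    del history[:-max_history]                # keep only the newest entries
    # A running summary helps a model detect deviation from trend.
    signature["avg_" + txn_type] = sum(history) / len(history)
    return signature

sig = {}  # one such record would exist per account
update_signature(sig, {"type": "authorization", "amount": 100.0})
update_signature(sig, {"type": "authorization", "amount": 300.0})
```

A model comparing the latest transaction against `sig["avg_authorization"]` can recognize a behavior change, such as a sudden jump in authorization amounts.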
- the system analyzes the retrieved data in order to distill it down to a more manageable size by deriving features on-the-fly (in RAM) associated with the retrieved data.
- the optimal missing value imputation process 1850 fills in values that are missing from the retrieved data set. Missing values can occur because in practice the entity (e.g., a financial institution) supplying the raw data may not be able to provide all of the information regarding the transaction. As an illustration, one or more data items associated with a point-of-sale transaction might be missing.
- the missing value imputation process block determines the optimal missing value.
- the automated feature reduction process 1860 eliminates unstable features as well as other items, such as features with similar content and features with minimal information content. As an illustration, this process could eliminate unstable features that, while they may be informative, change too dramatically between data sets. Features with similar content may also be eliminated because, while informative when viewed in isolation, they provide duplicate information (e.g., are highly collinear), and thus their removal from the input data set does not significantly diminish the amount of information contained in it. Conversely, the process preserves the features that provide the greatest amount of information. Accordingly, this process reduces the number of variables such that the more significant variables are used for training. The generated reduced feature data set is provided as input to the model generation process.
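Two of the reduction criteria described above (minimal information content and high collinearity) can be sketched as follows; the thresholds, names, and greedy keep-first rule are illustrative assumptions, and a stability check across data sets is omitted for brevity.

```python
import numpy as np

def reduce_features(X, names, corr_threshold=0.95, min_std=1e-6):
    """Automated feature reduction sketch: drop near-constant features
    (minimum information content), then among highly collinear pairs
    keep only the first feature encountered."""
    X = np.asarray(X, dtype=float)
    keep = [j for j in range(X.shape[1]) if X[:, j].std() > min_std]
    corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
    kept = []
    for idx in range(len(keep)):
        if all(corr[idx, k] < corr_threshold for k in kept):
            kept.append(idx)
    return [names[keep[i]] for i in kept]

rng = np.random.default_rng(2)
a = rng.normal(size=300)
X = np.column_stack([
    a,                                         # informative
    a * 2.0 + 0.001 * rng.normal(size=300),    # nearly collinear duplicate
    np.ones(300),                              # constant: no information
    rng.normal(size=300),                      # independent, informative
])
kept_names = reduce_features(X, ["amt", "amt_scaled", "const_flag", "velocity"])
```

The constant column and the collinear duplicate are removed, while the two independent informative features survive into the reduced feature data set.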
- a predictive model is trained with training data which in this example is the data provided by the automated feature reduction process 1860 .
- the predictive models are trained using error/cost measures.
- all accounts are scored using all networks during model building.
- Resulting errors are used by the generator process 1870 to intelligently rearrange the segments and retrain the models.
- the generator process 1870 determines whether the training data should be split and modeled by separate and distinct predictive models. This allows an automatic determination as to whether the entire training data set should be used or whether it is better to use subsets of the data for the training.
- the trained one or more predictive models are scored at process 1880 .
- the scores are then evaluated at process 1890 .
- a test for convergence is performed at 1990 . If there is convergence, then the model is considered defined (e.g., the parameters of the one or more predictive models are fixed). However if convergence has not occurred, then processing returns to the generator process block 1870 in order to determine how to split one or more data sets in order to train one or more new predictive models. For example the generator process 1870 determines which data subset(s) have been most problematic in being trained. A new predictive model is trained to focus only upon that problematic data set.
- the result of the training process is that a complete predictive model has been defined (e.g., the parameters are fixed).
- the scoring operation performed at 1910 after the model has been defined is done for the purposes of the reason code generator 1920 .
- the reason code generator 1920 uses the scores generated by the scoring process 1910 .
- the reason code generator process 1920 examines the account scores and is configured to provide one or more reasons for why an account received a particular score.
- an evaluation 1930 is performed again for the account scores. At this point, processing could loop back to process 1830 to derive features from the raw data and the behavior tables; alternatively, once the evaluation process has evaluated the reason code generation process, the development phase of the predictive model can be deemed completed.
- the generated model and reason codes can be used to score accounts and provide reasons for those scores.
- the scoring process can be triggered by receipt of a new transaction or upon demand, such as based upon a random trigger.
- the trigger would signal that relevant records from the raw data repository 1810 should be retrieved and processed (e.g., missing value imputation processing, etc.).
- the resultant data would be the input to the trained model in order to generate scores and reason codes.
- systems and methods may be implemented on various types of computer architectures, such as for example on a single general purpose computer or workstation, or on a networked system, or in a client-server configuration, or in an application service provider configuration.
- systems and methods may include data signals conveyed via networks (e.g., local area network, wide area network, internet, etc.), fiber optic medium, carrier waves, wireless networks, etc. for communication with one or more data processing devices.
- the data signals can carry any or all of the data disclosed herein that is provided to or from a device.
- the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem.
- the software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform methods described herein.
- Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.
- the systems' and methods' data may be stored and implemented in one or more different types of computer-implemented ways, such as different types of storage devices and programming constructs (e.g., data stores, RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.).
- data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.
- the systems and methods may be provided on many different types of computer-readable media including computer storage mechanisms (e.g., CD-ROM, diskette, RAM, flash memory, computer's hard drive, etc.) that contain instructions for use in execution by a processor to perform the methods' operations and implement the systems described herein.
- a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code.
- the software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- General Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Economics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Marketing (AREA)
- Entrepreneurship & Innovation (AREA)
- Technology Law (AREA)
- Game Theory and Decision Science (AREA)
- Human Resources & Organizations (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Computer Security & Cryptography (AREA)
- Data Mining & Analysis (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
Systems and methods for performing fraud detection. As an example, a system and method can be configured to contain a raw data repository for storing raw data related to financial transactions. A data store contains rules indicating how many generations of data items, or the time period within which data items, are to be stored in the raw data repository. Data items stored in the raw data repository are then accessed by a predictive model in order to perform fraud detection.
Description
- This application claims priority to and the benefit of U.S. Provisional Application Ser. No. 60/786,038 (entitled "Computer-Implemented Data Storage For Predictive Model Systems" and filed on Mar. 24, 2006), of which the entire disclosure (including any and all figures) is incorporated herein by reference.
- This application is a divisional of U.S. patent application Ser. No. 11/691,277 (entitled “Computer-Implemented Data Storage Systems and Methods for Use with Predictive Model Systems”), of which the entire disclosure (including any and all figures) is incorporated herein by reference.
- This application contains subject matter that may be considered related to subject matter disclosed in U.S. Provisional Application Ser. No. 60/786,039 (entitled "Computer-Implemented Predictive Model Generation Systems And Methods" and filed on Mar. 24, 2006) and in U.S. Provisional Application Ser. No. 60/786,040 (entitled "Computer-Implemented Predictive Model Scoring Systems And Methods" and filed on Mar. 24, 2006), the entire disclosures of which (including any and all figures) are incorporated herein by reference.
- This document relates generally to computer predictive models and more particularly to constructing and using computer predictive models.
- Computer predictive models have been used for many years in a diverse number of areas, such as in the financial industry. However, current methods have difficulty in providing an automated or semi-automated mechanism for determining whether a suspicious activity, such as credit card fraud, may have occurred. As an illustration, previous systems experience problems in generating fraud-indicative scores because such systems generally store aggregated/derived data rather than raw data, thereby losing the relevant history associated with an entity that is needed to perform scoring. Moreover, aggregated/derived data is specifically suited to a particular application and purpose (e.g., a fraud scoring purpose), but lacks the flexibility to be readily used by other types of scoring applications.
- In accordance with the teachings provided herein, systems and methods for operation upon data processing devices are provided for performing fraud detection. As an example, a system and method can be configured to contain a raw data repository for storing raw data related to financial transactions. A data store contains rules indicating how many generations of data items, or the time period within which data items, are to be stored in the raw data repository. Data items stored in the raw data repository are then accessed by a predictive model in order to perform fraud detection.
- FIG. 1 is a block diagram depicting a computer-implemented system for generating and using predictive models to assess whether fraudulent activity may have occurred.
- FIG. 2 is a block diagram depicting examples of input data.
- FIG. 3 is a graph showing an account compromise period.
- FIG. 4 is a block diagram depicting use of non-monetary information.
- FIG. 5 is a block diagram depicting a system being configured to produce a score even in the absence of a current or new transaction on the account.
- FIGS. 6 and 7 are time line graphs showing a transaction time line and a scoring trigger time line.
- FIG. 8 is a block diagram depicting examples of client-defined events.
- FIG. 9 is a block diagram depicting a system for storing information for use in fraud detection.
- FIG. 10 is a block diagram depicting storage of fields within a data storage system.
- FIG. 11 is a block diagram depicting a determination of the number of generations to store for a field.
- FIG. 12 is a block diagram depicting an approach to determine storage rules for a system.
- FIG. 13 is a block diagram depicting another approach to determine storage rules for a system.
- FIG. 14 is a block diagram depicting storage of information in its raw/unprocessed form.
- FIGS. 15-18 are block diagrams depicting systems configured with missing value imputation processing capability.
- FIG. 19 is a flowchart depicting a training approach to address fraud in an account-level fashion.
- FIG. 20 illustrates a data partitioning example.
- FIG. 21 is a block diagram depicting an iterative training approach.
- FIG. 22 illustrates example scoring results.
- FIGS. 23-25 provide another example for training a model.
- FIG. 26 is a block diagram depicting a reason code determination process that can be used to create reason codes for a scoring system/predictive model.
- FIG. 27 is a block diagram depicting construction of reason codes.
- FIG. 28 is a flowchart depicting construction of reason codes.
- FIG. 29 is a flowchart depicting the importance of a reason factor to a score.
- FIGS. 30-32 are block diagrams depicting a view selector module that allows a user or computer program to select an entity or type of entity for analysis.
- FIGS. 33 and 34 are block diagrams depicting an integrated system for fraud analysis.
-
FIG. 1 depicts at 30 a computer-implemented system for generating and using predictive models 34 to assess whether fraudulent activity may have occurred. Accurate detection of fraud is important in that it results in action being taken earlier and in a more effective manner for addressing the fraudulent activity. An action could include, for example, a credit card company deciding whether to investigate whether fraud may have occurred with respect to a particular credit card holder. It should also be understood that the system 30 can be configured to process one entity or many entities. - As shown in
FIG. 1, input data 32 is used during a development phase to create/train one or more predictive models 34. The predictive models 34 are then used during a production phase to receive input 32, such as from a financial institution, to generate fraud analysis results 36. - Whether in the development phase or in the production phase, the
input data 32 can be of many different types. Examples of such input data 32 are shown in FIG. 2. With reference to FIG. 2, a fraud detection model 100 can receive fraud information 102 and other input information 104, such as posting/transaction information, authorization information, cycle cut information, etc. - An example of fraud data could be the date of the first fraud as reported by a customer. For example, a customer may call a financial institution to indicate that one or more transactions that appeared on their credit card statement represent a fraudulent use. An example of fraud is when a person steals a credit card number and uses it to purchase items.
- The input fraud data can include several dates, such as the date on which fraud was first suspected to have occurred and a block date, which is the date on which no further transactions should be authorized. A
predictive model 100 can be trained to detect fraud (e.g., whether an entity has been compromised as shown at 110) within this account compromise period as early as possible. - The fraud data can be one record per account or multiple records per account. For example and as illustrated at 150 in
FIG. 3, the data could have one record for each compromised account that identifies the beginning of the compromised period and the end of the compromised period. The compromised period may include both fraudulent as well as non-fraudulent transactions. This mixture is acceptable because the predictive model is trained not to detect whether a particular transaction is fraudulent, but whether an account should be deemed as having been compromised. Account-level fraud detection is preferred over a transaction-based approach because most financial institutions are more interested in whether an account has been compromised, in order to stop the "bleeding" (e.g., reduce the amount of fraud), than in whether a particular transaction is fraudulent. - With reference back to
FIG. 2, the system can also utilize payment information 106 and/or non-monetary information 108 to detect whether the account is in a fraud state. An example of payment information is the credit card payment information received on a monthly or other periodic basis. Examples of non-monetary data are an address change, phone change, mother's name change, or credit line change. Still further, another data feed could be postings, which are the process for recording debits and credits to individual cardholder account balances. - As illustrated at 200 in
FIG. 4, non-monetary information 108 is provided regarding an entity 202 that has a relationship with a financial institution such as a bank 204. The entity 202 itself can be at different levels. These levels could be but are not limited to the card level 210, customer level 212, or account level 214. Using such information, the fraud detection process determines whether the entity 202 has been compromised (e.g., whether fraud has been detected). -
FIG. 5 depicts at 250 that a system can be configured to produce a score 276 even in the absence of a current or new transaction on the account (e.g., independent of whether a transaction-type event has occurred), which is an aid in the efficient use of resources to manage fraud cases. This is in contrast to most of today's fraud detection systems, which only produce a score when a transaction (typically an authorization) comes through the system. That approach is not particularly useful for managing case queues efficiently, because the fact that no additional transaction occurred during a certain time period can itself be information that is very useful in actively managing fraud. - However, the system can also be configured such that at any time the system can generate via process 254 a
fraud score 252. This includes generating a score 252 based upon receiving an incremental transaction 262. This account-level score indicates whether an account is in a compromised state or not. FIG. 6 illustrates at 300 that with scoring on demand, a different score (e.g., at "S2") might be produced even though only the passage of time had occurred with no new transactional information being received (e.g., "S2" was generated despite a new transaction "T2" not occurring until later). - With reference back to
FIG. 5, the trigger 260 is asynchronous with respect to an incremental transaction 262 (e.g., an authorization transaction 290, non-monetary transaction 292, payment transaction 294, etc.). Generated in response to a non-incremental type event 264, a trigger 260 provides an indicator that records should be retrieved in process 270 from a repository 272. The records are then provided to scoring process 274 for determining a score 276 as to whether an entity (e.g., an account) has been compromised. The records can be "raw" data (e.g., the actual transaction data received over time) from which features can be derived on-the-fly for use by the predictive model. However, it should be understood that the retrieved records could also include derived data. - The
repository 272 is updated via process 280 with every transaction, but a score-on-demand trigger 260 for deriving features is independent of receipt of an incremental transaction 262 and instead is dependent upon receipt of non-incremental information, such as date and time information. Account information is provided in order to specify which account should be scored. Date and time information is provided because just the passage of time may result in a different score. The date and time information can be provided as part of the input by the requestor or can be obtained by other means, such as from a computer system clock. It should be understood that, similar to the other processing flows described herein, the steps and the order of the processing steps described herein may be altered, modified and/or augmented and still achieve the desired outcome. -
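The score-on-demand flow described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the repository class, account identifiers, and the toy scoring heuristic (a score that decays as time passes with no new transactions) are hypothetical stand-ins for the trained predictive model.

```python
from datetime import datetime

class RawDataRepository:
    """Hypothetical repository: stores raw transactions per account."""
    def __init__(self):
        self.transactions = {}  # account_id -> list of (timestamp, amount)

    def update(self, account_id, timestamp, amount):
        # Called on every incremental transaction, independently of scoring.
        self.transactions.setdefault(account_id, []).append((timestamp, amount))

def score_on_demand(repo, account_id, as_of):
    """Score at an arbitrary time `as_of`; no new transaction is required.

    A toy heuristic only: a large recent amount raises suspicion, and mere
    passage of time without further activity lowers the score again.
    """
    history = repo.transactions.get(account_id, [])
    if not history:
        return 0
    last_ts, last_amount = history[-1]
    hours_idle = (as_of - last_ts).total_seconds() / 3600.0
    base = min(999, int(last_amount))   # stand-in for the trained model
    decay = int(10 * hours_idle)        # score decays with inactivity
    return max(0, base - decay)

repo = RawDataRepository()
repo.update("acct-1", datetime(2006, 3, 24, 9, 0), 900)
s1 = score_on_demand(repo, "acct-1", datetime(2006, 3, 24, 10, 0))
s2 = score_on_demand(repo, "acct-1", datetime(2006, 3, 24, 20, 0))
```

Note that `s2` is computed ten hours later from the same stored raw data, with no new transaction in between, and comes out lower: only the date/time input changed.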
- For example the score for an account may be 900 but after only a passage of time and with no new transactions, the system might generate a different score such as 723. Such a situation might arise if a legitimate but highly suspicious transaction occurred. Since no transaction occurred over a period of time this lowers the likelihood of the account being in a compromised state. Previous systems would have problems in generating scores that are asynchronous with respect to a transaction occurrence because they generally store aggregated/derived data and not raw data and thus they lose relevant history associated with an account to perform asynchronous scoring. In previous systems, the aggregated/derived data is specifically suited for the scoring application and thus lacks flexibility to readily perform other types of scoring beyond its immediate purpose.
- As another example the analysis process may have detected that three different credit cards were used at the same restaurant at about the same time and one of the credit cards has been determined as being in a compromised state. The scoring process can then be triggered for the other two credit card accounts and the scoring process will factor in the other two credit card accounts' information when generating a score for the third card. Accordingly whenever fraud is detected with respect to a first card, the scoring process can be triggered for any other card issued from the financial institution that was used at the same place or places as the first card.
- As shown at 350 in
FIG. 7 , a client can define an event that would trigger scoring of an account as shown at “S2.”FIG. 8 depicts examples at 400 of client-defined events which could be an event wherein an account's password is changed as shown at 402 or a customer's car is stolen as shown at 404. Amonitoring process 406 can determine when one of these triggers has occurred with respect to an account. These triggers indicate when data is to be extracted from arepository 272. For the different types of triggers, a financial institution can select whether all accounts are to be processed or only certain accounts. - The updating via
process 280 of the repository 272 with incremental transaction information 262 occurs asynchronously with respect to a trigger 260 for generating via process 274 a score 276 on demand. The scoring can also occur based upon the receipt of a new transaction for an account.
- This time-passage-only type of trigger is useful to an account that may be considered on the cusp or edge (e.g., is the entity going to fall fraudulent or non-fraudulent). For example a cardholder's automobile is reported as stolen. In such situations a credit card or debit card may also have been stolen and usually large dollar amount transactions are recorded within the first couple of hours after the vehicle is stolen. The system can generate a trigger every fifteen minutes for the next three hours to score the account irrespective of whether a purchase transaction has occurred. The first scoring may have a higher score because it is closer in time to when the car was reported as stolen, but each subsequent scoring within the three hour window wherein no incremental transactions has occurred can see lower scores.
- As another example a fraud analyst arrives at work in the morning with a queue of accounts to analyze. The question confronting the analyst is which account the analyst should consider first. The analyst sees that scoring on these accounts has not occurred since last night. The analyst then sends a request that these accounts should be scored again. For one or more of the accounts there may have been no additional transactions since the previous night but they may receive a different score just based upon the passage of time since the previous night. The new scoring may reorder the queue (which would alter the order of accounts the analyst is to process, such as by calling customers).
-
FIG. 9 depicts at 450 a system for storing information for use in fraud detection 480. The system of FIG. 9 stores the raw data 452 instead of the derived feature information used in a typical current system. The typical current system's storage approach creates problems because there may be a need to view recent transactions in the context of the account's or card's past history. Ideally, a significant portion of the raw historical transactions could be included for each score. However, for real-time systems, this has proven to have an unacceptable impact on throughput and response time. Alternative schemes involve saving only summarized information. While this does reduce the throughput impact, it also limits the types of variables and the level of detail available to the model. - In contrast, the system of
FIG. 9 contains a repository of historical data. This is not aggregate or derived data but raw data 452. For example, no summaries or averages of the raw transactional data are stored in the repository 470. Raw data 452 is processed and stored via process 460 and then retrieved (e.g., by fraud detection process 480) in order to determine whether an entity has been compromised. In other embodiments, a combination of raw data and derived data can be stored. - In the system,
storage rules 454 specify how many generations of raw data 452 should be stored in the repository 470. This determination could include how many raw payment amounts should be stored. The determination of how many generations should be stored is based upon the type of transaction as well as the transaction fields. This may result in varying lengths of the fields being stored in the repository 470, as illustrated at 500 in FIG. 10. For example, the payment amounts for the last seven transactions may be stored in the repository, whereas for another type of information only the previous five values need to be stored. Thus the length for one field might be seven generations, whereas for another field only five generations might be stored in the repository. An advantage of storing the raw data (in comparison with storing aggregate or derived data) is that the information that underlies the transactions is not lost through processing that preserves only a top-level view of what has occurred. As an example, a storage rule can specify how many authorization amounts should be stored for an entity in the raw state (e.g., without any aggregation or other type of transformation into a derived variable).
- It should be noted that the system can still operate even if not all of the generations for a particular data field has been stored. For example a relatively new card may have only enough raw data to store three generations of payment authorization amounts although the storage rules for this data field may allow storage of up to fifteen generations. A predictive model can still operate even though a particular data field does not have all of the generations specified by the storage rules.
- The storage of raw data in the repository reflects a compromise between an ideal situation where all historical information that can be obtained for an entity is stored (that is used to make a prediction) versus the physical constraints of storage capacity and/or performance. In reaching that compromise it should be noted that a less than optimal situation might exist in determining what timeframe/number of generations should be stored for one or more data fields. It should also be noted that storage rules can use the number of generations (e.g., the previous four generations) and/or a particular timeframe (e.g., only the previous three weeks) in determining how much raw data for a particular data field should be stored. For situations where more generations, longer time frames are needed for a particular data field, a multi-resolution scheme can be used. In other words, the storage can store only every k events/transactions where k varies based on the recency of the transactions/events.
- Storage rules dictate how far back in history should data be stored. The history can be at different levels, such as at the transaction level or at another level such as at an individual field level. As an illustration for an authorization the system may receive an authorization amount, a merchant identifier, and a date-time stamp. The system might decide that it does not need the same history for all these different pieces of data, so the system based upon the storage rules stores the past ten transaction amounts but only the previous five merchant identifiers. Thus the buffered lengths for the different data types could vary. Even the same field (e.g., the date-time stamp field) for two different transaction types may have different storage rules. For example for one type of transaction five generations of date-time stamps may be needed but for another type of transaction eight generations may need to be stored.
- The system stores information about different entities and uses the information from multiple entities to determine whether a particular account has been compromised. An entity could be a card and another entity could be an account comprising multiple cards. Another entity could comprise ZIP code. A scoring process could be performed for each entity or combinations of entities. For example scoring could be performed for the card and a separate scoring process performed for the account comprising multiple cards. Still further a scoring process could be done for a ZIP code location (e.g., generating a fraud score for a ZIP location for all of the credit card transactions that have occurred within a ZIP location).
- The multi-entity repository may or may not have a hierarchical structure. A hierarchy could be multiple cards being associated with an account and another example could be multiple terminals with a single merchant. The system could look at all those hierarchies at once. In this manner by examining different entities within a hierarchy, fraud at different levels can be examined at the same time. For example a bank can determine whether fraud is localized only for a particular card or is more pervasive and extends to the merchant or to the customer's other financial instruments such as the customer's checking account.
- Signatures can be used within the system in order to help store detailed, unaltered history of the account/entity. The signatures provide a complete picture of the account, allowing on-demand scoring, and not just transaction-triggered scoring. The signature allows real-time use of variables which depend upon detailed information for a number of previous transactions, for example, distances (e.g., Mahalanobis distances) between recent and past transactions.
- Signatures may look different for one person versus another. For example, for a particular type of information, fifteen generations might be stored for a first person whereas only six generations of the same type of information might be stored for a second person. This could occur, for example, if the first person uses their card many more times per month than the second person.
- Signature records can be retrieved for one or more entities depending upon which entities need to be scored as well as which signature records are needed for scoring a particular entity. For example, a scoring process may be configured to score a credit card holder's account by utilizing only the one or more signature records associated with that credit card holder. However, another scoring process could be configured to score a credit card holder's account based not only upon that entity's signature records but also upon one or more other entities' signature records (e.g., a merchant or terminal ID signature record). -
-
FIG. 11 shows at 550 that the determination of the number of generations (e.g., the relevant time periods) to store for a particular field for a type of transaction can be based upon statistical analysis 560. Statistical analysis 560 can analyze test raw data 562 and determine how much history (e.g., an optimal amount or an approximation thereto) of raw data is needed for the application to perform well. For example, a history of three months can be selected for a particular field for a particular transaction type. Analysis can be performed on the historical data to determine whether a significant change had occurred in the data over the previous week versus over the previous three months. For a particular field, the previous three months of raw data might be needed to help capture and explain the variability of that field, whereas for another field only the past week might need to be captured. Statistical analysis techniques that help analyze the variability can include means, standard deviations, skewness, statistical distances, correlations between fields, etc. The analysis techniques can also be more sophisticated, such as creating models that examine variability. -
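An analysis of this sort — checking whether a shorter window of raw data captures essentially the same statistical behavior as a longer one — can be sketched as follows. The summary statistics, tolerance test, and candidate window sizes are simplified stand-ins for the fuller analysis described in the text.

```python
import statistics

def summarize(values):
    """Simple summary statistics used to compare data sets."""
    return (statistics.mean(values), statistics.pstdev(values))

def choose_history_length(full_history, candidates, tolerance):
    """Pick the shortest recent window whose statistics stay within
    `tolerance` of the full history's statistics.

    `full_history` is ordered oldest-first; `candidates` are window
    sizes tried from shortest to longest.
    """
    full_mean, full_sd = summarize(full_history)
    for n in sorted(candidates):
        subset = full_history[-n:]            # the most recent n values
        mean, sd = summarize(subset)
        if abs(mean - full_mean) <= tolerance and abs(sd - full_sd) <= tolerance:
            return n                          # acceptable: store only n generations
    return len(full_history)                  # fall back to the full history

# Stationary data: a 10-value window already matches the full set.
window = choose_history_length([5.0, 15.0] * 50, [10, 20], 0.1)
```

For trending data, no short window matches the full history's statistics, so the function falls back to the full length — mirroring the case where a field genuinely needs its longer history.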
FIG. 12 depicts at 600 an approach to determine storage rules for a system. Statistical analysis 610 is performed upon the entire test raw data set 612, and analysis results 614 are generated thereby. Statistical analysis 620 is performed upon a candidate subset 622 of the test raw data (e.g., only the previous two weeks of raw data instead of the entire six months of data). Analysis results 624 from the candidate subset 622 are compared via process 630 with the results generated from the full set. If the difference between the two sets of results is acceptable as determined at 640, then the storage rule is generated and stored at 650 with the time period information associated with the candidate subset. If it is not acceptable, then another candidate subset can be examined at 660. - As shown in
FIG. 13, the analysis techniques can be supplemented by any experience that a person has with respect to one or more data fields. As an illustration, a person can recognize from experience that storing more than six months of a particular data field does not provide any greater predictive capability. As another illustration, and as shown at 670, a domain expert can provide an initial estimate of the longest period of time for the data or a data field, and could also indicate an initial estimate of the number of generations that should be stored in the raw data repository. For example, an expert may believe, based upon his or her experience, that only three months of information is needed for a particular data field. The expert in this situation can indicate that the statistical analysis technique or techniques should evaluate six months of data for that data field and should evaluate whether a good or optimal point might be the storage of three months of data. -
FIG. 14 illustrates that the storage of information in its raw form makes the system much less application-specific. FIG. 14 shows that in addition to fraud detection 480, information in the raw data repository 470 can also be used by other applications, such as a loan risk analysis application 700 or an application 710 that examines the revenue expected to be generated from an account holder over a prolonged period of time. In this way, a financial institution only has to provide the information to the analysis system once instead of having to provide the same information multiple times for each of the different applications.
FIG. 15 , asystem 750 can be configured that has missing valueimputation processing capability 760.Process 760 can fill in values that are missing from a retrieveddata set 762. - Missing values can occur because in practice an entity (e.g., a financial institution) supplying the raw data may not have all of the information regarding the transaction. As an illustration, one or more data items associated with a point-of-sale transaction might be missing.
- The missing
value imputation process 760 attempts to determine the missing value(s) at 764. Current approaches typically use a most common value approach (e.g., mean, mode, median) in place of a missing value. In contrast, the system ofFIG. 15 uses aclosed form equation 770 to determine what value (e.g., optimal value) with respect to a target should be used for a missing value. The optimal value provides more information with respect to whether fraud has occurred or not. It should be noted that this approach can be utilized for many different applications other than fraud detection, such as determining credit worthiness for a loan applicant. If the system is configured with a raw data repository, the optimal values can be determined for different applications because the raw data is stored in the repository. - In a production mode, missing values can also occur and thus a closed form equation or lookup table (whose values are based upon the closed form equation) can be used to supply missing values.
- The system can use an approach wherein, irrespective (e.g., independently) of the feature, an equation is used to calculate the missing value. For example, and as illustrated in
FIG. 16, if the transaction amount is missing, then a closed form equation is used in the model building phase 780 to determine the missing transaction amount value for use in building model 782. In the production phase, a lookup table 792 is created via process 790 and used to supply the missing transaction amount value. It should be understood that any value type can be supplied, such as continuous values (e.g., a numeric transaction amount). - The missing value determination process uses the tag information that accompanies the missing value in order to determine the missing value. In a fraud detection application, the tag information would indicate whether there was fraud or no fraud associated with the input data.
- In the model
construction backend phase 780, the values that are supplied for the missing values are used to help train or test one or more models 782 that are under construction. In the production phase, the values supplied for the missing values are used as part of the input layer to the predictive model for the application at hand. - With reference to
FIG. 17, a closed form equation 770 is generated via process 800 for a data feature based upon historical data by using an optimality criterion involving the tag information. The correlation can be examined via a linear or nonlinear relationship. FIG. 18 provides at 850 an illustration wherein there are six values, G is assigned a value of one, and B is assigned a value of zero; if one or more values are missing from the input data set, the optimality criterion could be used to determine what value of “x” would maximize the correlation with respect to the tag information. - Traditional methods of creating a payment-card fraud detection system involve training a neural network model. In general, one or more distinct models are trained independently using an error function whose result is evaluated on each transaction. Occasionally, there are simple hierarchies of models where a second model can be trained on transactions deemed risky by an initial model. By treating all transactions independently, this methodology does not address the fact that fraud occurs at an account level. For instance, early transactions in a fraud episode are much more important than later transactions, since identifying fraud early on prevents more substantial fraud losses. These training methods also lack a means of tying together the concomitant training of a number of different networks.
-
FIG. 19 depicts a training approach which addresses fraud in an account-level fashion, thereby allowing fraud-level scores to be generated independently of the existence (or absence) of transactions. Stated differently, the approach provides a holistic view of the account/customer and identifies when an account is in a compromised state, as opposed to merely detecting some fraudulent transactions. This can provide value to card issuers because account-level fraud (e.g., account takeover, identity theft) has ramped up in the recent past, as compared to traditional lost/stolen card fraud. Moreover, a family of models can be generated simultaneously that optimally balances the additional benefit due to added complexity against the added computational/operational cost. - With reference to
FIG. 19, training data is received at process 900 for training a predictive model. For the first iteration, a model can be trained with the entire data set and thus not partitioned by process 900. The trained model then is scored at process 902 and evaluated at process 904. If the evaluated trained model has performed satisfactorily as determined at 906, then the model as defined through the training is made available at 908. - However, if through the
evaluation process 904, the model has not performed satisfactorily, then the training data set is partitioned by the generator process block 900. The generator process block 900 determines how the training data should be split and modeled by separate and distinct predictive models. FIG. 20 illustrates at 950 a partitioning that could occur via the generator process block. - With reference to
FIG. 20, a data set 952 (e.g., an initial data set) is partitioned into multiple data subsets 954 (e.g., data subset A and data subset B). The partitioning can be performed such that the combination of data subset A and data subset B would be the initial data set. If another iteration is required, then further partitioning can be performed, such as generating data subsets C and D and, if needed, E and F (as shown at 956). It should be understood that, if needed, these generated subsets can be further partitioned, such as partitioning F into data subsets G, H, and I. - The training could be performed in many different ways, such as the approach depicted at 1000 in
FIG. 21. In this approach, a mathematical model is constructed iteratively by training and combining multiple, potentially heterogeneous, learning machines. The individual “learners” are trained with emphasis on different and overlapping regions of interest. These regions, which can be constantly evolving, are determined by a partition generator. - First, an initial model is trained at 1030 using exemplars that are partitioned from the entire training space. A
data set 1010 is partitioned at 1020 in accordance with partitioning criteria 1012. The partitioning criteria 1012 are based upon minimization of a ranking violations metric. An objective function based upon minimization of a ranking violations metric is manipulated so as to maximize the area under the ROC (receiver operating characteristic) curve. An ROC curve is a graph of the true positive rate versus the false positive rate (e.g., it examines the percentage of bad correctly flagged versus the percentage of good incorrectly flagged). The area under the curve is what is to be maximized. - A learning machine that yields distinctive rankings for each exemplar, such as a neural network, a support vector machine, a classification tree, or a Bayesian network, can be used. A partition generator then uses a selection function to be applied on the training exemplars, in order to direct attention to a subset of accounts and transactions that could improve the decision performance in the operational region of interest. The selection function can be shaped so as to further emphasize the region of interest. One or more learning machines are then trained and combined to form a final model as explained below.
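The link between the ranking violations metric and the area under the ROC curve can be sketched as follows. This is an illustrative formulation, not the patent's exact one; in particular, ties are counted fully as violations here, whereas other conventions count them as half.

```python
def ranking_violations(scores, labels):
    """Count (fraud, non-fraud) pairs in which the non-fraud item is
    scored at least as high as the fraud item."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum(1 for p in pos for n in neg if n >= p)

def auc_estimate(scores, labels):
    """Fraction of (fraud, non-fraud) pairs ranked correctly; this pairwise
    statistic estimates the area under the ROC curve, so minimizing
    ranking violations maximizes the AUC."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 1.0 - ranking_violations(scores, labels) / (n_pos * n_neg)

# One misranked pair: the fraud item scored 0.3 sits below the
# non-fraud item scored 0.8.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
```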
- At each iteration, one or multiple learning machines of various types, as well as one or multiple instances of each type of learning machine with different initial conditions, are trained with emphasis on potentially different subsets of exemplars. Then the generator searches through possible transformations on the resulting machines and selects at 1040 the one that performs best for account-level fraud detection when combined with the previously selected models, and the selected models are weighted at 1050.
- If the newly selected learning machine did not provide additional benefit, it is discarded and the training process is restarted. After a new learner is selected, the generator directs its attention to the selection of a different subset of exemplars. The degree of attention/emphasis to each exemplar is determined by how the exemplar assists in or hurts the detection of the account being in a fraudulent state as compared to the new learning machine being disabled (i.e., not included in the overall system). As an illustration, a weight of zero or one can be assigned to each exemplar and thus form a “hard” region. As another illustration, a continuous range of weights can be used (e.g., all real values between 0 and 1) to create a “soft” region which could avoid training instabilities. The entire system including the individual “learners” as well as their corresponding regions of interest evolves and changes with each iteration.
- With respect to the
partitioning process 1020, a ranking violations metric can take into account that a model should produce scores that result in non-fraudulent accounts or transactions being ranked lower than fraudulent accounts or transactions. For example, FIG. 22 shows at 1100 example scoring results. The model(s) that generated the scores on the left side of the table produced multiple ranking violations. The model(s) that generated the scores on the right side of the table signify an improvement in predictive capability because the non-fraudulent accounts or transactions are ranked lower than fraudulent accounts or transactions. -
FIGS. 23-25 provide another example for training a model. With reference to FIG. 23, the entire set of training data is retrieved at 1200. If needed, one from a set of candidate predictive models 1212 (e.g., a neural network model, decision tree model, linear algorithm model, etc.) is selected for training. At 1220, a predictive model is trained with the training data. The trained one or more predictive models are scored and evaluated at 1230. - The evaluation can be based on a cost/error function. Some examples are the area under the ROC curve, $ savings, etc. The evaluation can be done at the account level. For example, an evaluation can examine the extent to which non-fraudulent accounts were ranked higher than fraudulent accounts and to what degree. However, it should be understood that other levels could be used, such as having the ranking at the observation/transaction level, at the business problem level, etc.
- A test for convergence is performed at 1240. If there is convergence, then the model is considered defined (e.g., the parameters of the one or more predictive models are fixed) at 1242. However if convergence has not occurred, then processing continues on
FIG. 24 so that the generator process can determine at 1250 how to split one or more data sets in order to train one or more new predictive models. - The generator process determines whether the
training data set 1252 should be split and modeled by separate and distinct predictive models. This allows for an automatic determination as to whether the entire training data set should be used or whether it is better to use subsets of the data for the training. - For example the generator process can determine which data subset(s) (e.g., 1254, 1256) have been most problematic in being trained. A new predictive model will be trained to focus upon that problematic data set. One way to perform this is for the generator to assign greater importance to the problematic data and lesser importance to the data that can be adequately explained with the other predictive models that have already been trained. The weighting of importance for data subsets is done at the account level.
- A second predictive model is selected at 1260 from candidate
predictive models 1212 and trained at 1270. The second model and the first model are combined at 1280, and evaluation of the combined models' results 1292 occurs at 1290. If the combined models do converge as determined at 1300, then the combined models are provided as output at 1302. If the combined models do not converge as determined at 1300, then the data is further split as shown in FIG. 25 so that another model can be trained. - With reference to
FIG. 25, the generator process determines at 1306 how to split one or more data sets in order to train one or more new predictive models. For example, the generator process can determine whether the training data set 1256 should be split and modeled by separate and distinct predictive models. More specifically, the generator process can determine which data subset(s) (e.g., 1308, 1310) have been most problematic in being trained, and a new predictive model will be trained to focus upon that problematic data set. One way to perform this is for the generator to assign greater importance to the problematic data and lesser importance to the data that can be adequately explained with the other predictive models that have already been trained. The weighting of importance for data subsets is done at the account level. - A third predictive model is selected at 1312 from candidate
predictive models 1212 and trained at 1320. The third model and the other models are combined at 1330, and evaluation of the combined models' results 1342 occurs at 1340. If the combined models do converge as determined at 1350, then the combined models are provided as output at 1360. If the combined models do not converge as determined at 1350, then the data is further split. - For performing evaluations in this training approach, the system can examine how many ranking violations have occurred. The evaluation also examines whether there is any improvement (e.g., decrease) in the number of ranking violations from the previous iteration. The convergence decision step determines whether the number of ranking violations is at an acceptable level. If it is, then the defined models are made available for any subsequent further model development or are made available for the production phase. However, if the number of ranking violations is not at an acceptable level, then further partitioning occurs at the partition process block.
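The train/evaluate/partition loop of FIGS. 23-25 might be sketched as below. This is only an illustration of the loop's shape, not the patented method: a one-feature decision stump stands in for the candidate predictive models purely for brevity, and the scheme for up-weighting poorly explained examples is an assumed one.

```python
def count_violations(scores, ys):
    """Pairs where a non-fraud example scores at least as high as a fraud one."""
    return sum(1 for i, yi in enumerate(ys) if yi == 1
               for j, yj in enumerate(ys) if yj == 0 and scores[j] >= scores[i])

def train_stump(xs, ys, weights):
    """Pick the threshold/polarity with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (1 if pol * x > pol * t else 0) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    _, t, pol = best
    return lambda x: 1.0 if pol * x > pol * t else 0.0

def generator_loop(xs, ys, max_rounds=5):
    weights = [1.0] * len(xs)
    ensemble = []
    for _ in range(max_rounds):
        ensemble.append(train_stump(xs, ys, weights))        # train
        scores = [sum(s(x) for s in ensemble) for x in xs]   # score
        if count_violations(scores, ys) == 0:                # convergence test
            break
        # emphasize the examples the current ensemble explains worst
        weights = [1.0 + abs(y - sc / len(ensemble))
                   for y, sc in zip(ys, scores)]
    return ensemble

# Toy example: small amounts are legitimate, large amounts are fraud.
xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
ensemble = generator_loop(xs, ys)
```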
- It should be noted that many different types of predictive models may be utilized as candidate predictive models, such as decision trees, neural networks, linear predictive models, etc. Accordingly, the resultant model may be a single predictive model and/or multiple predictive models that have been combined. Moreover, combined predictive models resulting from the training sessions can be homogeneous or heterogeneous. As an illustration of a homogeneous combined set of predictive models, two or more neural networks can be combined from the training sessions. As an illustration of a heterogeneous combined set of predictive models, a neural network model can be combined with a decision tree model, or, as another illustration, multiple genetic algorithm models can be combined with one or more decision tree models as well as with one or more linear regression models. During training, the evaluation block assesses the performance of the combined models. The models are placed in parallel and their outputs are combined.
- A predictive system can be configured to generate reason codes which aid in understanding the scores. Current methodology for producing reason codes to explain the scores is not very useful, as it typically identifies variables that are similar in nature as the top three reason codes. This does not provide valuable insight into why a particular item scored high. This becomes important as the industry moves away from transaction-level scores for fraud detection toward account-level scores identifying the account's compromise. Due to the more complex underlying phenomena, more refined and meaningful reason codes become paramount.
-
FIG. 26 depicts a reason code determination process 1430 that is used to create reason codes 1440 for a scoring system/predictive model 1410. It should be understood that reason codes can be used for applications other than fraud detection. The reason code determination process 1430 can be configured with reason code technology that is based on risk factors/groups rather than on the individual variables used in the models. The reason codes 1440 for a scoring system are to provide insight to end users with respect to the score generated by the predictive model 1410 (e.g., the fraud analysis results/scores 1420). They provide guidance in reviewing the scored entity and taking appropriate actions and decisions. In the case of fraud detection models, the reason codes provide direction for the initial investigation/review of suspect cases. - The
process 1430 can provide statistically sound explanations of the results 1420 (e.g., a score) from a scoring system/predictive model 1410 that is analyzing certain input data 1400. Also, the explanations are factor based and can be used within a scoring system, such as to satisfy requirements for credit scoring models under Regulation B. -
FIG. 27 illustrates how reason codes can be built. Instead of using individual input variables as reasons, reason factors are first generated via process 1450 by grouping variables that share a similar concept. Analytic techniques are used to generate these reason factors. The reason factors can then be reviewed and refined by domain experts. Once the reason factors are formed, the importance of each reason factor to the score can be constructed. The reason codes can be generated by rank ordering the “importance” of each reason factor. Finally, the performance of the reason codes is evaluated. Based on the results, one can also revise the reason generator by iterating the process. The generated reasons are provided as reason configuration data 1460 to a reason determination process 1430 for use in the scoring/predicting process. -
FIG. 28 illustrates creation of reason codes. For generation of reason factors via process 1500, individual variables are not used as reasons, in order to avoid the top reasons providing the same information. Instead, the system first groups variables that are correlated and share a similar concept into different reason factors. As an illustration, the variables that relate to the time when the transaction occurred can be grouped separately from variables that do not relate to that. Many statistical techniques can be used to group the variables. Each variable group (e.g., a reason factor) can represent a reason code. - Principal Component Analysis (PCA) techniques can be used in the reason factor generation step in order to generate factors or groups that are orthogonal with respect to each other. Such a technique is implemented in SAS PROC VARCLUS, which is available from SAS Institute located in Cary, N.C. There are many different configurations to generate the variable grouping, and they are available as options in PROC VARCLUS. For example, the number of reason factors can be controlled by specifying the number of variable clusters to be created. The reason factors generated by the PCA are then manually reviewed and refined at 1510.
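A rough stand-in for this variable-grouping step can be sketched as below. The patent uses PCA via SAS PROC VARCLUS; this sketch substitutes a simple greedy grouping of variables whose absolute pairwise correlation exceeds a threshold, and all variable names and the threshold are hypothetical.

```python
import numpy as np

def group_variables(X, names, threshold=0.7):
    """Greedily group columns of X (one column per variable) so that each
    group holds variables that are strongly correlated with its seed;
    each resulting group can serve as one reason factor."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    groups, assigned = [], set()
    for i, name in enumerate(names):
        if name in assigned:
            continue
        group = [name]
        assigned.add(name)
        for j in range(i + 1, len(names)):
            if names[j] not in assigned and corr[i, j] >= threshold:
                group.append(names[j])
                assigned.add(names[j])
        groups.append(group)
    return groups

# Two transaction-velocity variables move together; the two geographic
# variables are (inversely) correlated with each other but not with them.
X = np.array([[1.0,  2.0, 5.0, -5.0],
              [2.0,  4.0, 1.0, -1.0],
              [3.0,  6.0, 4.0, -4.0],
              [4.0,  8.0, 2.0, -2.0],
              [5.0, 10.0, 3.0, -3.0]])
names = ["amt_1h", "amt_24h", "geo_risk", "geo_safe"]
factors = group_variables(X, names)
```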
- Manual review and refinement of the reason factors can be performed in many ways. For example, this could include sanity checking the variable grouping as well as creating a user-friendly description for each variable factor. The groupings may be revised/refined based on domain expertise and previous results, and the configuration file is prepared for the reason code generator step.
- A reason code generator is created at 1520 by constructing and measuring the importance of each reason factor to the score. The importance can be defined in many ways. It can be (1) the strength of the reason factor to explain the current score; (2) the strength of the reason factor to make the score high; etc. The importance of these reason factors is then rank-ordered. The top most important factors are reported as the score reasons. The number of reasons will be based on the business needs.
- With reference to
FIG. 29, the importance of each reason factor to the score could be measured by a tree model at 1540. These tree models are constructed at 1550 to extract the correlations between the values of the variables in the given reason factor and the score, and they can be built at 1560 using a SAS Enterprise Miner tree procedure/SAS PROC ARBORETUM, available from SAS Institute located in Cary, N.C. By rank ordering the estimated scores for the tree models, the top corresponding reason factors are then selected as the reason codes. - With reference back to
FIG. 28, the performance of the reason code is then analyzed at 1530. The general performance of the reason code generator is reviewed. Other items used in analyzing the performance of the reason code could include:
- frequency of the reason code
- most common reasons/combination of reasons
- frequency of reason code by score range
- manual review of cases to check the validity of the reason codes
- case reports to be generated for review
- based on the result, one may revise the reason factor grouping and rerun the reason code generation step
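The factor-importance ranking of FIG. 29 might be sketched as follows, with a one-split regression stump standing in for the SAS PROC ARBORETUM tree models: each factor's stump measures how much of the score's variance that factor explains, and the factors are then rank-ordered. Factor names and data are hypothetical.

```python
import numpy as np

def stump_r2(x, score):
    """Best fraction of score variance explained by a single split on x."""
    sst = ((score - score.mean()) ** 2).sum()
    best = 0.0
    for t in np.unique(x)[:-1]:
        left, right = score[x <= t], score[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        best = max(best, 1.0 - sse / sst)
    return best

def top_reason_factors(factors, score, n=3):
    """Rank factors by how well each explains the score; report the top n."""
    return sorted(factors, key=lambda name: -stump_r2(factors[name], score))[:n]

# The score tracks the "amount_risk" factor closely and the
# "time_of_day" factor only weakly.
score = np.array([10.0, 10.0, 20.0, 20.0, 30.0, 30.0])
factors = {"time_of_day": np.array([3.0, 1.0, 2.0, 1.0, 3.0, 2.0]),
           "amount_risk": np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])}
top = top_reason_factors(factors, score, n=2)
```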
- While examples have been used to disclose the invention, including the best mode, and also to enable any person skilled in the art to make and use the invention, the patentable scope of the invention is defined by claims, and may include other examples that occur to those skilled in the art.
- For example, the systems and methods disclosed herein may be used for many different purposes. Because the repository stores raw data (as opposed to derived or imputed information), different types of analysis other than credit card fraud detection can be performed. These additional analyses could include using the same data that is stored in the repository for credit card analysis to also be utilized for detection of merchant fraud or for some other application. Derived or imputed information from raw data does not generalize well from one application (e.g., cardholder account fraud prediction) to a different application (e.g., merchant fraud prediction).
-
FIGS. 30 and 31 depict that a view selector module 1600 can be provided that allows a user or computer program to select which entity (e.g., a particular merchant) or type of entity (e.g., all merchants) upon which they would like fraud analysis to be performed. For example, a user or a computer application can shift the view from whether a cardholder account has been compromised to whether fraud has occurred at one or more merchants. The raw data that is used to predict whether fraud has occurred with respect to a cardholder's account can also be utilized to predict whether fraud has occurred at a merchant location. - Other types of analysis can also be performed because the data is stored in its raw form, such as merchant attrition analysis. Merchant attrition is when an institution loses merchants for one or more reasons. A merchant attrition score is an indicator of how likely it is that a relationship between a merchant and an institution will be severed, such as due to bankruptcy of the merchant or the merchant porting its account to another institution (e.g., to receive more favorable terms for its account). To determine a merchant attrition score, the raw data in the
repository 272 that can be used for fraud scoring can also be used in combination with other data to calculate merchant attrition scores. The other data could include any data (raw or derived) that would indicate merchant attrition. Such data could include the fee structure charged by the institution to handle the merchant's account, how timely payments are provided by the merchant on the account, etc. Accordingly, in addition to an entity view selector, a system can be configured with a selector 1610 that selects what type of analysis should be performed. -
FIG. 32 depicts the storage of raw data (e.g., 1702, 1712, 1722) at or from different institutions (1700, 1710, 1720), which provides for an expanded, more comprehensive, and quicker type of analysis to occur. Because each institution is not storing its information as a set of derived or calculated values that are typically application-specific, the raw data from each of these repositories can be retrieved and used together in order to provide a more robust predictive capability. As shown in FIG. 32, data is collected from the repositories (1702, 1712, 1722) at 1730. A view selector 1740 could be used as described above for selecting a particular entity type or analysis type for processing. Predictive model(s) at 1750 generate the predictions, such as an entity score 1760. Based upon the view selection, the score can be at multiple different levels (e.g., fraud scoring at the cardholder level, fraud scoring at the merchant level, merchant attrition scoring, bankruptcy prediction scoring, etc.). - Even if the institutions utilized different storage rules (1704, 1714, 1724) (e.g., different time periods for the same data fields), the system can still utilize this data from the different institutions (1700, 1710, 1720) because raw data is being stored. This can be helpful if the raw data repositories are in a distributed environment, such as at different sites.
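The entity-view selection described above is possible precisely because the repository holds raw transactions: the same records can be regrouped on demand by cardholder or by merchant and handed to whichever analysis is requested. A minimal sketch, with illustrative field names:

```python
def select_view(transactions, entity_field):
    """Group raw transaction records by the chosen entity type."""
    view = {}
    for txn in transactions:
        view.setdefault(txn[entity_field], []).append(txn)
    return view

# The same raw records serve both views.
raw = [
    {"card": "c1", "merchant": "m1", "amount": 10.0},
    {"card": "c2", "merchant": "m1", "amount": 20.0},
    {"card": "c1", "merchant": "m2", "amount": 5.0},
]
by_merchant = select_view(raw, "merchant")   # merchant-level fraud view
by_card = select_view(raw, "card")           # cardholder-level view
```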
- The repositories may be at different sites for a number of reasons. One reason may be that an institution would like to have its repository at one of its locations; or different third-party analysis companies may host the repositories on behalf of their respective institutions. For example, one third-party analysis company receives data from a first institution (e.g., a Visa institution) and applies its storage rules when storing information for the first institution, while a different third-party analysis company receives data from a different institution (e.g., a MasterCard institution) and applies its storage rules when storing information for the other institution. Although different third-party analysis companies with their unique storage rules have stored the data from different institutions, the raw data from the different repositories can still be collected and used together in order to perform predictions.
- As an illustration, if there is a significant increase in the amount of fraud detected at a merchant's location, then the raw data in the repository associated with that merchant can be retrieved from the repository in order to determine whether fraud can be detected for other credit card accounts that have purchased goods or services at that merchant's location. (Currently, it takes a prolonged period of time to detect whether a merchant is acting fraudulently with respect to credit cards that have been used at the merchant's location.) For example, the fraud rate at the merchant's location could have been 0.1%, but after evaluating other credit cards from different institutions that have been utilized at the merchant's location, the fraud rate is determined to be 10%. By analyzing, through the predictive model, account activities that occur at the merchant's location, a more realistic score can be generated for the merchant.
- Still further, a merchant's fraud score can be used to determine whether a credit card has been compromised. Such processing can entail analyzing the raw data associated with the credit cards utilized at a merchant's location to generate a score for the merchant, and then using that score to analyze an account whose credit card has recently been used at the merchant's location.
- As another example of the wide scope of the systems and methods disclosed herein,
FIGS. 33 and 34 show at 1800 a system that integrates different aspects disclosed herein. In the system of FIGS. 33 and 34, a predictive model is built using development data 1802. Development data 1802 (e.g., cycle cut data, authorizations data, payment data, non-monetary data, etc.) is used to help determine at 1804 an account compromise period, that is, the period from the point in time at which the account entered a compromised state up until the point in time when the account was actually blocked. After the account is blocked, the customer is issued a new card. The development data 1802 is stored in the raw data repository 1810, which has a manager 1812 that helps manage the raw data repository 1810, such as by handling the updating of the raw data repository 1810 with new development data. - The raw data from the repository 1810 could also be utilized to create at 1820 static behavior tables. The data in the static behavior tables provides a global picture which is the same for a period of time (e.g., static or not changing dramatically over a period of time). These types of variables are useful in identifying the occurrence of fraud. Examples of such variables include risk with respect to a geographical area. The information created for these tables does not have to be changed in production, whereas the transaction information in the repository does change once in production to reflect the transactions that occur while the system is in production.
- Signature records are retrieved from the repository 1810, and features from the raw data are derived at 1830. For example, a signature is an account-level compilation of historic data of all transaction types. Signatures help a model to recognize behavior change (e.g., to detect a trend and deviation from a trend). There is one record stored for each account. The length of history of each type of data may vary. Signature data is updated with every new transaction. The features are also derived based upon the behavior tables.
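One plausible shape for such a signature record is sketched below; the class layout, field names, and the bounded per-type history are assumptions for illustration, not the patent's definition.

```python
from collections import deque

class Signature:
    """One account-level compilation of recent transaction history,
    updated with every new transaction; history length per data type
    is bounded, as lengths may vary by type."""

    def __init__(self, max_history=5):
        self.history = {}          # transaction type -> recent amounts
        self.max_history = max_history

    def update(self, txn_type, amount):
        self.history.setdefault(
            txn_type, deque(maxlen=self.max_history)).append(amount)

    def features(self, txn_type, amount):
        """Derive on-the-fly features, e.g. deviation from the recent trend."""
        hist = self.history.get(txn_type)
        if not hist:
            return {"avg": 0.0, "deviation": 0.0}
        avg = sum(hist) / len(hist)
        return {"avg": avg, "deviation": amount - avg}

# Four authorizations arrive; only the most recent three are retained.
sig = Signature(max_history=3)
for amt in (10.0, 20.0, 30.0, 40.0):
    sig.update("auth", amt)
```

A new transaction far above the account's recent average then yields a large deviation feature, the kind of behavior-change signal the model consumes.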
- Because the retrieved data can comprise thousands of records, the system analyzes the retrieved data in order to distill it down to a more manageable size by deriving features on-the-fly (in RAM) associated with the retrieved data.
- For the optimal
feature transformations process 1840, a standard prediction model transformation process as currently available can be used to reduce the amount of data that will be used as input to the predictive model. The optimal missing value imputation process 1850 fills in values that are missing from the retrieved data set. Missing values can occur because in practice the entity (e.g., a financial institution) supplying the raw data may not be able to provide all of the information regarding the transaction. As an illustration, one or more data items associated with a point-of-sale transaction might be missing. The missing value imputation process block determines the optimal missing value. - The automated
feature reduction process 1860 eliminates unstable features as well as other items, such as features with similar content and features with minimal information content. As an illustration, this process could eliminate unstable features that, while they may be informative, change too dramatically between data sets. Features with similar content may also be eliminated because, while they may be informative when viewed in isolation, they provide duplicate information (e.g., are highly collinear), and thus their removal from the input data set does not significantly diminish the amount of information contained in the input data set. Conversely, the process preserves the features that provide the greatest amount of information. Accordingly, this process reduces the number of variables such that the more significant variables are used for training. The generated reduced feature data set is provided as input to the model generation process. - In the model generation process, a predictive model is trained with training data, which in this example is the data provided by the automated
feature reduction process 1860. In general, the predictive models are trained using error/cost measures. In this example, all accounts are scored using all networks during model building. Resulting errors are used by the generator process 1870 to intelligently rearrange the segments and retrain the models. In other words, the generator process 1870 determines whether the training data should be split and modeled by separate and distinct predictive models. This allows an automatic determination as to whether the entire training data set should be used or whether it is better to use subsets of the data for the training. - The trained one or more predictive models are scored at
process 1880. The scores are then evaluated at process 1890. A test for convergence is performed at 1990. If there is convergence, then the model is considered defined (e.g., the parameters of the one or more predictive models are fixed). However, if convergence has not occurred, then processing returns to the generator process block 1870 in order to determine how to split one or more data sets in order to train one or more new predictive models. For example, the generator process 1870 determines which data subset(s) have been most problematic in being trained. A new predictive model is trained to focus only upon that problematic data set. - The result of the training process is that a complete predictive model has been defined (e.g., the parameters are fixed). The scoring operation that is performed at 1910 after the model definition is done for the purposes of the
reason code generator 1920. The reason code generator 1920 uses the scores generated by the scoring process 1910. The reason code generator process 1920 examines the account scores and is configured to provide one or more reasons for why an account received a particular score. After reason codes have been generated, an evaluation 1930 is performed again for the account scores. At this point, processing could loop back to process 1830 to derive features from the raw data and the behavior tables; alternatively, after the evaluation process has evaluated the reason code generation process, the development phase of the predictive model can be deemed complete. - For the production phase, the generated model and reason codes can be used to score accounts and provide reasons for those scores. As shown at 1950 and 1952, the scoring process can be triggered by receipt of a new transaction or upon demand, such as based upon a random trigger. The trigger signals that relevant records from the raw data repository 1810 should be retrieved and processed (e.g., missing value imputation processing, etc.). The resultant data is the input to the trained model in order to generate scores and reason codes.
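The two-pass feature reduction described at process 1860 (eliminating features with minimum information content, then features whose content duplicates an already-kept feature) can be sketched briefly. The following is a hypothetical Python illustration, not the patented implementation; the function names, variance floor, and correlation cutoff are invented for the example.

```python
from statistics import fmean, pvariance

def _pearson(x, y):
    """Plain Pearson correlation between two equal-length value lists."""
    mx, my = fmean(x), fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def reduce_features(features, var_floor=1e-6, corr_ceiling=0.95):
    """features: dict mapping feature name -> list of numeric values.
    Returns the subset of features surviving both elimination passes."""
    # Pass 1: eliminate features with minimum information content
    # (near-zero variance carries almost no information).
    informative = {n: v for n, v in features.items() if pvariance(v) > var_floor}
    # Pass 2: eliminate features that are highly collinear with a feature
    # already kept, since they provide duplicate information.
    reduced = {}
    for name, values in informative.items():
        if all(abs(_pearson(values, kept)) < corr_ceiling for kept in reduced.values()):
            reduced[name] = values
    return reduced
```

Under these assumed thresholds, a constant feature is dropped in the first pass and an exact multiple of a kept feature is dropped in the second, while genuinely distinct features survive.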
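The reason code generation step (process 1920) can likewise be sketched. As a hedged illustration only, assume a simple additive scoring model in which each feature contributes weight times value to an account's score; the reason codes reported are then those associated with the largest contributions. The weights, feature names, and code table below are assumptions invented for the example, not part of the disclosed system.

```python
def reason_codes(weights, feature_values, code_table, top_n=2):
    """Rank each feature's contribution to an additive score and return
    the reason codes for the top contributors."""
    contributions = {
        name: weights[name] * value for name, value in feature_values.items()
    }
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return [code_table[name] for name in ranked[:top_n]]

# Illustrative account: late payments and a recent fee increase dominate
# the score, so their associated codes are reported.
weights = {"late_payments": 3.0, "fee_increase": 2.0, "tenure_years": -0.5}
values = {"late_payments": 2, "fee_increase": 1, "tenure_years": 6}
codes = {
    "late_payments": "R01: history of late payments",
    "fee_increase": "R02: recent change in fee structure",
    "tenure_years": "R03: short customer tenure",
}
print(reason_codes(weights, values, codes))
```

For this account the contributions are 6.0, 2.0, and -3.0, so codes R01 and R02 are returned as the reasons the account received its score.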
- It is noted that the systems and methods may be implemented on various types of computer architectures, such as, for example, a single general-purpose computer or workstation, a networked system, a client-server configuration, or an application service provider configuration.
- It is further noted that the systems and methods may include data signals conveyed via networks (e.g., local area network, wide area network, internet, etc.), fiber optic medium, carrier waves, wireless networks, etc. for communication with one or more data processing devices. The data signals can carry any or all of the data disclosed herein that is provided to or from a device.
- Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform methods described herein. Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.
- The systems' and methods' data (e.g., associations, mappings, etc.) may be stored and implemented in one or more different types of computer-implemented ways, such as different types of storage devices and programming constructs (e.g., data stores, RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.
- The systems and methods may be provided on many different types of computer-readable media including computer storage mechanisms (e.g., CD-ROM, diskette, RAM, flash memory, computer's hard drive, etc.) that contain instructions for use in execution by a processor to perform the methods' operations and implement the systems described herein.
- The computer components, software modules, functions, data stores and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code. The software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.
- It should be understood that as used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise. Finally, as used in the description herein and throughout the claims that follow, the meanings of “and” and “or” include both the conjunctive and disjunctive and may be used interchangeably unless the context expressly dictates otherwise; the phrase “exclusive or” may be used to indicate a situation where only the disjunctive meaning may apply.
Claims (27)
1. A computer-implemented method comprising:
presenting, on a graphical user interface, an entity type selector configured to receive a selection of one of multiple different entities for a type of financial analysis;
receiving, from the entity type selector, a selection of a first entity;
accessing, from at least one of a plurality of raw data repositories, stored raw data associated with the selected first entity, wherein the stored raw data includes financial transaction data records, wherein the raw data repositories are configured for the storage of raw data associated with multiple different entities;
presenting, on the graphical user interface, an analysis type selector configured to receive a selection of one of multiple different types of financial analyses, wherein the multiple different types of financial analyses include:
a fraud analysis type configured to analyze the stored raw data to perform fraud detection for a selected entity, and
a non-fraud analysis type configured to analyze the stored raw data to generate information that is not related to fraud detection for a selected entity;
receiving, from the analysis type selector, a selection of the non-fraud analysis type of financial analysis; and
performing, on a processing unit, the non-fraud analysis type of financial analysis to generate information that is not related to fraud detection for the first entity.
2. The method of claim 1 , wherein the non-fraud analysis type of analysis further facilitates predicting an attrition score with respect to the first entity, and wherein the information includes an attrition score for the first entity.
3. The method of claim 2 , wherein the first entity is a merchant engaged in a customer relationship with a financial service provider, and wherein the attrition score indicates a likelihood the customer relationship will cease.
4. The method of claim 3 , wherein the financial service provider applies a fee structure to the first entity, and wherein the accessed raw data includes information that represents the fee structure, and wherein the attrition score is based on the information that represents the fee structure.
5. The method of claim 3 , wherein the accessed raw data includes a data record that indicates a history of payments made by the first entity to the financial service provider, and wherein the attrition score is based on the accessed data record.
6. The method of claim 5 , wherein the data record further indicates timeliness of the payments.
7. The method of claim 6 , further comprising:
receiving a selection of a second entity, wherein the selection of the second entity is inputted using the entity type selector, and wherein the second entity is an individual account holder who maintains an account serviced by the financial service provider;
receiving a selection of the fraud detection analysis type of analysis, wherein the selection of the fraud detection analysis type of analysis is inputted using the analysis type selector;
accessing raw data associated with transactional activity involving the account; and
detecting fraud involving the account, wherein detecting fraud includes analyzing the raw data associated with the transactional activity by applying the fraud detection analysis type of analysis.
8. The method of claim 7 , wherein the raw data associated with the transactional activity is accessed from another of the plurality of raw data repositories, and wherein the plurality of raw data repositories are maintained in a distributed environment.
9. The method of claim 1 , further comprising:
receiving a selection of the fraud detection analysis type of financial analysis, wherein the selection of the fraud detection analysis type is inputted using the analysis type selector;
in response to receiving the selection of the fraud detection analysis type, accessing additional raw data associated with the first entity, wherein the additional raw data includes data representing transactional activities involving multiple types of credit cards; and
detecting fraud involving the first entity, wherein detecting fraud includes analyzing the additional raw data using the fraud detection analysis type of financial analysis, and wherein detecting fraud includes calculating an incidence of fraud amongst transactions involving the first entity.
10. A computer-program product comprising a non-transitory machine-readable storage medium having instructions stored therein, wherein the instructions are executable to cause a computing apparatus to perform operations including:
presenting, on a graphical user interface, an entity type selector configured to receive a selection of one of multiple different entities for a type of financial analysis;
receiving, from the entity type selector, a selection of a first entity;
accessing, from at least one of a plurality of raw data repositories, stored raw data associated with the selected first entity, wherein the stored raw data includes financial transaction data records, wherein the raw data repositories are configured for the storage of raw data associated with multiple different entities;
presenting, on the graphical user interface, an analysis type selector configured to receive a selection of one of multiple different types of financial analyses, wherein the multiple different types of financial analyses include:
a fraud analysis type configured to analyze the stored raw data to perform fraud detection for a selected entity, and
a non-fraud analysis type configured to analyze the stored raw data to generate information that is not related to fraud detection for a selected entity;
receiving, from the analysis type selector, a selection of the non-fraud analysis type of financial analysis; and
performing, on a processing unit, the non-fraud analysis type of financial analysis to generate information that is not related to fraud detection for the first entity.
11. The computer-program product of claim 10 , wherein the non-fraud analysis type of analysis further facilitates predicting an attrition score with respect to the first entity, and wherein the information includes an attrition score for the first entity.
12. The computer-program product of claim 11 , wherein the first entity is a merchant engaged in a customer relationship with a financial service provider, and wherein the attrition score indicates a likelihood the customer relationship will cease.
13. The computer-program product of claim 12 , wherein the financial service provider applies a fee structure to the first entity, and wherein the accessed raw data includes information that represents the fee structure, and wherein the attrition score is based on the information that represents the fee structure.
14. The computer-program product of claim 12 , wherein the accessed raw data includes a data record that indicates a history of payments made by the first entity to the financial service provider, and wherein the attrition score is based on the accessed data record.
15. The computer-program product of claim 14 , wherein the data record further indicates timeliness of the payments.
16. The computer-program product of claim 15 , further comprising:
receiving a selection of a second entity, wherein the selection of the second entity is inputted using the entity type selector, and wherein the second entity is an individual account holder who maintains an account serviced by the financial service provider;
receiving a selection of the fraud detection analysis type of analysis, wherein the selection of the fraud detection analysis type of analysis is inputted using the analysis type selector;
accessing raw data associated with transactional activity involving the account; and
detecting fraud involving the account, wherein detecting fraud includes analyzing the raw data associated with the transactional activity by applying the fraud detection analysis type of analysis.
17. The computer-program product of claim 16 , wherein the raw data associated with the transactional activity is accessed from another of the plurality of raw data repositories, and wherein the plurality of raw data repositories are maintained in a distributed environment.
18. The computer-program product of claim 10 , further comprising:
receiving a selection of the fraud detection analysis type of financial analysis, wherein the selection of the fraud detection analysis type is inputted using the analysis type selector;
in response to receiving the selection of the fraud detection analysis type, accessing additional raw data associated with the first entity, wherein the additional raw data includes data representing transactional activities involving multiple types of credit cards; and
detecting fraud involving the first entity, wherein detecting fraud includes analyzing the additional raw data using the fraud detection analysis type of financial analysis, and wherein detecting fraud includes calculating an incidence of fraud amongst transactions involving the first entity.
19. A system comprising:
a processor configured to perform operations including:
presenting, on a graphical user interface, an entity type selector configured to receive a selection of one of multiple different entities for a type of financial analysis;
receiving, from the entity type selector, a selection of a first entity;
accessing, from at least one of a plurality of raw data repositories, stored raw data associated with the selected first entity, wherein the stored raw data includes financial transaction data records, wherein the raw data repositories are configured for the storage of raw data associated with multiple different entities;
presenting, on the graphical user interface, an analysis type selector configured to receive a selection of one of multiple different types of financial analyses, wherein the multiple different types of financial analyses include:
a fraud analysis type configured to analyze the stored raw data to perform fraud detection for a selected entity, and
a non-fraud analysis type configured to analyze the stored raw data to generate information that is not related to fraud detection for a selected entity;
receiving, from the analysis type selector, a selection of the non-fraud analysis type of financial analysis; and
performing, on a processing unit, the non-fraud analysis type of financial analysis to generate information that is not related to fraud detection for the first entity.
20. The system of claim 19 , wherein the non-fraud analysis type of analysis further facilitates predicting an attrition score with respect to the first entity, and wherein the information includes an attrition score for the first entity.
21. The system of claim 20 , wherein the first entity is a merchant engaged in a customer relationship with a financial service provider, and wherein the attrition score indicates a likelihood the customer relationship will cease.
22. The system of claim 21 , wherein the financial service provider applies a fee structure to the first entity, and wherein the accessed raw data includes information that represents the fee structure, and wherein the attrition score is based on the information that represents the fee structure.
23. The system of claim 21 , wherein the accessed raw data includes a data record that indicates a history of payments made by the first entity to the financial service provider, and wherein the attrition score is based on the accessed data record.
24. The system of claim 23 , wherein the data record further indicates timeliness of the payments.
25. The system of claim 24 , further comprising:
receiving a selection of a second entity, wherein the selection of the second entity is inputted using the entity type selector, and wherein the second entity is an individual account holder who maintains an account serviced by the financial service provider;
receiving a selection of the fraud detection analysis type of analysis, wherein the selection of the fraud detection analysis type of analysis is inputted using the analysis type selector;
accessing raw data associated with transactional activity involving the account; and
detecting fraud involving the account, wherein detecting fraud includes analyzing the raw data associated with the transactional activity by applying the fraud detection analysis type of analysis.
26. The system of claim 25 , wherein the raw data associated with the transactional activity is accessed from another of the plurality of raw data repositories, and wherein the plurality of raw data repositories are maintained in a distributed environment.
27. The system of claim 19 , further comprising:
receiving a selection of the fraud detection analysis type of financial analysis, wherein the selection of the fraud detection analysis type is inputted using the analysis type selector;
in response to receiving the selection of the fraud detection analysis type, accessing additional raw data associated with the first entity, wherein the additional raw data includes data representing transactional activities involving multiple types of credit cards; and
detecting fraud involving the first entity, wherein detecting fraud includes analyzing the additional raw data using the fraud detection analysis type of financial analysis, and wherein detecting fraud includes calculating an incidence of fraud amongst transactions involving the first entity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/905,524 US20130339218A1 (en) | 2006-03-24 | 2013-05-30 | Computer-Implemented Data Storage Systems and Methods for Use with Predictive Model Systems |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US78603806P | 2006-03-24 | 2006-03-24 | |
US11/691,277 US7912773B1 (en) | 2006-03-24 | 2007-03-26 | Computer-implemented data storage systems and methods for use with predictive model systems |
US12/418,186 US20090192855A1 (en) | 2006-03-24 | 2009-04-03 | Computer-Implemented Data Storage Systems And Methods For Use With Predictive Model Systems |
US13/905,524 US20130339218A1 (en) | 2006-03-24 | 2013-05-30 | Computer-Implemented Data Storage Systems and Methods for Use with Predictive Model Systems |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/418,186 Continuation US20090192855A1 (en) | 2006-03-24 | 2009-04-03 | Computer-Implemented Data Storage Systems And Methods For Use With Predictive Model Systems |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130339218A1 true US20130339218A1 (en) | 2013-12-19 |
Family
ID=43741857
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/691,277 Active 2028-08-08 US7912773B1 (en) | 2006-03-24 | 2007-03-26 | Computer-implemented data storage systems and methods for use with predictive model systems |
US12/418,186 Abandoned US20090192855A1 (en) | 2006-03-24 | 2009-04-03 | Computer-Implemented Data Storage Systems And Methods For Use With Predictive Model Systems |
US12/418,174 Abandoned US20090192957A1 (en) | 2006-03-24 | 2009-04-03 | Computer-Implemented Data Storage Systems And Methods For Use With Predictive Model Systems |
US13/905,524 Abandoned US20130339218A1 (en) | 2006-03-24 | 2013-05-30 | Computer-Implemented Data Storage Systems and Methods for Use with Predictive Model Systems |
Family Applications Before (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/691,277 Active 2028-08-08 US7912773B1 (en) | 2006-03-24 | 2007-03-26 | Computer-implemented data storage systems and methods for use with predictive model systems |
US12/418,186 Abandoned US20090192855A1 (en) | 2006-03-24 | 2009-04-03 | Computer-Implemented Data Storage Systems And Methods For Use With Predictive Model Systems |
US12/418,174 Abandoned US20090192957A1 (en) | 2006-03-24 | 2009-04-03 | Computer-Implemented Data Storage Systems And Methods For Use With Predictive Model Systems |
Country Status (1)
Country | Link |
---|---|
US (4) | US7912773B1 (en) |
Cited By (45)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8788405B1 (en) * | 2013-03-15 | 2014-07-22 | Palantir Technologies, Inc. | Generating data clusters with customizable analysis strategies |
US8855999B1 (en) | 2013-03-15 | 2014-10-07 | Palantir Technologies Inc. | Method and system for generating a parser and parsing complex data |
US8930897B2 (en) | 2013-03-15 | 2015-01-06 | Palantir Technologies Inc. | Data integration tool |
US9009827B1 (en) | 2014-02-20 | 2015-04-14 | Palantir Technologies Inc. | Security sharing system |
US9021260B1 (en) | 2014-07-03 | 2015-04-28 | Palantir Technologies Inc. | Malware data item analysis |
US9043894B1 (en) | 2014-11-06 | 2015-05-26 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US9202249B1 (en) | 2014-07-03 | 2015-12-01 | Palantir Technologies Inc. | Data item clustering and analysis |
US9230280B1 (en) | 2013-03-15 | 2016-01-05 | Palantir Technologies Inc. | Clustering data based on indications of financial malfeasance |
US20160012544A1 (en) * | 2014-05-28 | 2016-01-14 | Sridevi Ramaswamy | Insurance claim validation and anomaly detection based on modus operandi analysis |
US9367872B1 (en) | 2014-12-22 | 2016-06-14 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US9454785B1 (en) | 2015-07-30 | 2016-09-27 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US9535974B1 (en) | 2014-06-30 | 2017-01-03 | Palantir Technologies Inc. | Systems and methods for identifying key phrase clusters within documents |
WO2017003499A1 (en) * | 2015-06-29 | 2017-01-05 | Wepay, Inc. | System and methods for generating reason codes for ensemble computer models |
US9552615B2 (en) | 2013-12-20 | 2017-01-24 | Palantir Technologies Inc. | Automated database analysis to detect malfeasance |
US9635046B2 (en) | 2015-08-06 | 2017-04-25 | Palantir Technologies Inc. | Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications |
US9785773B2 (en) | 2014-07-03 | 2017-10-10 | Palantir Technologies Inc. | Malware data item analysis |
US9817563B1 (en) | 2014-12-29 | 2017-11-14 | Palantir Technologies Inc. | System and method of generating data points from one or more data stores of data items for chart creation and manipulation |
US9875293B2 (en) | 2014-07-03 | 2018-01-23 | Palantir Technologies Inc. | System and method for news events detection and visualization |
US9898509B2 (en) | 2015-08-28 | 2018-02-20 | Palantir Technologies Inc. | Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces |
US9898528B2 (en) | 2014-12-22 | 2018-02-20 | Palantir Technologies Inc. | Concept indexing among database of documents using machine learning techniques |
US9965937B2 (en) | 2013-03-15 | 2018-05-08 | Palantir Technologies Inc. | External malware data item clustering and analysis |
US10103953B1 (en) | 2015-05-12 | 2018-10-16 | Palantir Technologies Inc. | Methods and systems for analyzing entity performance |
US10120857B2 (en) | 2013-03-15 | 2018-11-06 | Palantir Technologies Inc. | Method and system for generating a parser and parsing complex data |
US10162887B2 (en) | 2014-06-30 | 2018-12-25 | Palantir Technologies Inc. | Systems and methods for key phrase characterization of documents |
US10230746B2 (en) | 2014-01-03 | 2019-03-12 | Palantir Technologies Inc. | System and method for evaluating network threats and usage |
US10235461B2 (en) | 2017-05-02 | 2019-03-19 | Palantir Technologies Inc. | Automated assistance for generating relevant and valuable search results for an entity of interest |
US10275778B1 (en) | 2013-03-15 | 2019-04-30 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation based on automatic malfeasance clustering of related data in various data structures |
US10318630B1 (en) | 2016-11-21 | 2019-06-11 | Palantir Technologies Inc. | Analysis of large bodies of textual data |
US10325224B1 (en) | 2017-03-23 | 2019-06-18 | Palantir Technologies Inc. | Systems and methods for selecting machine learning training data |
US10356032B2 (en) | 2013-12-26 | 2019-07-16 | Palantir Technologies Inc. | System and method for detecting confidential information emails |
US10362133B1 (en) | 2014-12-22 | 2019-07-23 | Palantir Technologies Inc. | Communication data processing architecture |
US10482382B2 (en) | 2017-05-09 | 2019-11-19 | Palantir Technologies Inc. | Systems and methods for reducing manufacturing failure rates |
US10489391B1 (en) | 2015-08-17 | 2019-11-26 | Palantir Technologies Inc. | Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface |
US10552994B2 (en) | 2014-12-22 | 2020-02-04 | Palantir Technologies Inc. | Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items |
US10560313B2 (en) | 2018-06-26 | 2020-02-11 | Sas Institute Inc. | Pipeline system for time-series data forecasting |
US10572487B1 (en) | 2015-10-30 | 2020-02-25 | Palantir Technologies Inc. | Periodic database search manager for multiple data sources |
US10572496B1 (en) | 2014-07-03 | 2020-02-25 | Palantir Technologies Inc. | Distributed workflow system and database with access controls for city resiliency |
US10579647B1 (en) | 2013-12-16 | 2020-03-03 | Palantir Technologies Inc. | Methods and systems for analyzing entity performance |
US10606866B1 (en) | 2017-03-30 | 2020-03-31 | Palantir Technologies Inc. | Framework for exposing network activities |
US10620618B2 (en) | 2016-12-20 | 2020-04-14 | Palantir Technologies Inc. | Systems and methods for determining relationships between defects |
US10685283B2 (en) | 2018-06-26 | 2020-06-16 | Sas Institute Inc. | Demand classification based pipeline system for time-series data forecasting |
US10719527B2 (en) | 2013-10-18 | 2020-07-21 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores |
US10838987B1 (en) | 2017-12-20 | 2020-11-17 | Palantir Technologies Inc. | Adaptive and transparent entity screening |
US11119630B1 (en) | 2018-06-19 | 2021-09-14 | Palantir Technologies Inc. | Artificial intelligence assisted evaluations and user interface for same |
US20210350376A1 (en) * | 2020-05-05 | 2021-11-11 | Capital One Services, Llc | Computer-based systems configured for automated activity verification based on optical character recognition models and methods of use thereof |
Families Citing this family (69)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8280805B1 (en) | 2006-01-10 | 2012-10-02 | Sas Institute Inc. | Computer-implemented risk evaluation systems and methods |
US7912773B1 (en) * | 2006-03-24 | 2011-03-22 | Sas Institute Inc. | Computer-implemented data storage systems and methods for use with predictive model systems |
US7657497B2 (en) | 2006-11-07 | 2010-02-02 | Ebay Inc. | Online fraud prevention using genetic algorithm solution |
US8015133B1 (en) * | 2007-02-20 | 2011-09-06 | Sas Institute Inc. | Computer-implemented modeling systems and methods for analyzing and predicting computer network intrusions |
US8364614B2 (en) * | 2008-01-08 | 2013-01-29 | General Electric Company | Method for building predictive models with incomplete data |
US8515862B2 (en) | 2008-05-29 | 2013-08-20 | Sas Institute Inc. | Computer-implemented systems and methods for integrated model validation for compliance and credit risk |
US10115153B2 (en) * | 2008-12-31 | 2018-10-30 | Fair Isaac Corporation | Detection of compromise of merchants, ATMS, and networks |
US20100293109A1 (en) * | 2009-05-15 | 2010-11-18 | Itg Software Solutions, Inc. | Systems, Methods and Computer Program Products For Routing Electronic Trade Orders For Execution |
US10579957B1 (en) * | 2009-07-31 | 2020-03-03 | Inmar Supply Chain Solutions, LLC | System and method for storing and displaying returned goods information |
US8620798B2 (en) * | 2009-09-11 | 2013-12-31 | Visa International Service Association | System and method using predicted consumer behavior to reduce use of transaction risk analysis and transaction denials |
US8645232B1 (en) | 2009-12-31 | 2014-02-04 | Inmar, Inc. | System and method for threshold billing for returned goods |
US8438122B1 (en) * | 2010-05-14 | 2013-05-07 | Google Inc. | Predictive analytic modeling platform |
US8473431B1 (en) | 2010-05-14 | 2013-06-25 | Google Inc. | Predictive analytic modeling platform |
US8515863B1 (en) | 2010-09-01 | 2013-08-20 | Federal Home Loan Mortgage Corporation | Systems and methods for measuring data quality over time |
US20120158566A1 (en) * | 2010-12-21 | 2012-06-21 | Corinne Fok | Transaction rate processing apparatuses, methods and systems |
US8533222B2 (en) | 2011-01-26 | 2013-09-10 | Google Inc. | Updateable predictive analytical modeling |
US8595154B2 (en) | 2011-01-26 | 2013-11-26 | Google Inc. | Dynamic predictive modeling platform |
CA2830797A1 (en) * | 2011-03-23 | 2012-09-27 | Detica Patent Limited | An automated fraud detection method and system |
US8533224B2 (en) | 2011-05-04 | 2013-09-10 | Google Inc. | Assessing accuracy of trained predictive models |
US8478688B1 (en) * | 2011-12-19 | 2013-07-02 | Emc Corporation | Rapid transaction processing |
US8595200B2 (en) * | 2012-01-03 | 2013-11-26 | Wizsoft Ltd. | Finding suspicious association rules in data records |
US20130253965A1 (en) * | 2012-03-21 | 2013-09-26 | Roshin Joseph | Time dependent transaction queue |
US9336494B1 (en) * | 2012-08-20 | 2016-05-10 | Context Relevant, Inc. | Re-training a machine learning model |
CA2895773A1 (en) * | 2012-12-22 | 2014-06-26 | Mmodal Ip Llc | User interface for predictive model generation |
US20140249934A1 (en) * | 2013-03-01 | 2014-09-04 | Sas Institute Inc. | Common point of purchase (cpp) detection |
US8966659B2 (en) * | 2013-03-14 | 2015-02-24 | Microsoft Technology Licensing, Llc | Automatic fraudulent digital certificate detection |
US9231979B2 (en) | 2013-03-14 | 2016-01-05 | Sas Institute Inc. | Rule optimization for classification and detection |
US9594907B2 (en) | 2013-03-14 | 2017-03-14 | Sas Institute Inc. | Unauthorized activity detection and classification |
US8917274B2 (en) | 2013-03-15 | 2014-12-23 | Palantir Technologies Inc. | Event matrix based on integrated data |
US8937619B2 (en) | 2013-03-15 | 2015-01-20 | Palantir Technologies Inc. | Generating an object time series from data objects |
GB2512340A (en) * | 2013-03-27 | 2014-10-01 | Riskpointer Oy | Electronic arrangement and related method for automated fraud prevention in connection with digital transactions |
US20150161611A1 (en) * | 2013-12-10 | 2015-06-11 | Sas Institute Inc. | Systems and Methods for Self-Similarity Measure |
US9508075B2 (en) * | 2013-12-13 | 2016-11-29 | Cellco Partnership | Automated transaction cancellation |
US9483162B2 (en) * | 2014-02-20 | 2016-11-01 | Palantir Technologies Inc. | Relationship visualizations |
US20150262184A1 (en) * | 2014-03-12 | 2015-09-17 | Microsoft Corporation | Two stage risk model building and evaluation |
US9857958B2 (en) | 2014-04-28 | 2018-01-02 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive access of, investigation of, and analysis of data objects stored in one or more databases |
US9009171B1 (en) | 2014-05-02 | 2015-04-14 | Palantir Technologies Inc. | Systems and methods for active column filtering |
US9785328B2 (en) | 2014-10-06 | 2017-10-10 | Palantir Technologies Inc. | Presentation of multivariate data on a graphical user interface of a computing system |
WO2016160539A1 (en) | 2015-03-27 | 2016-10-06 | Equifax, Inc. | Optimizing neural networks for risk assessment |
US9727869B1 (en) | 2015-06-05 | 2017-08-08 | Square, Inc. | Expedited point-of-sale merchant payments |
US20170053291A1 (en) * | 2015-08-17 | 2017-02-23 | International Business Machines Corporation | Optimal time scale and data volume for real-time fraud analytics |
US9424669B1 (en) | 2015-10-21 | 2016-08-23 | Palantir Technologies Inc. | Generating graphical representations of event participation flow |
US10846434B1 (en) * | 2015-11-25 | 2020-11-24 | Massachusetts Mutual Life Insurance Company | Computer-implemented fraud detection |
US9823818B1 (en) | 2015-12-29 | 2017-11-21 | Palantir Technologies Inc. | Systems and interactive user interfaces for automatic generation of temporal representation of data objects |
US10268735B1 (en) | 2015-12-29 | 2019-04-23 | Palantir Technologies Inc. | Graph based resolution of matching items in data sources |
US20210264429A1 (en) | 2016-03-25 | 2021-08-26 | State Farm Mutual Automobile Insurance Company | Reducing false positive fraud alerts for card-present financial transactions |
US12073408B2 (en) | 2016-03-25 | 2024-08-27 | State Farm Mutual Automobile Insurance Company | Detecting unauthorized online applications using machine learning |
US10769722B1 (en) * | 2016-05-12 | 2020-09-08 | State Farm Mutual Automobile Insurance Company | Heuristic credit risk assessment engine |
US10068235B1 (en) * | 2016-06-14 | 2018-09-04 | Square, Inc. | Regulating fraud probability models |
US10062078B1 (en) * | 2016-06-14 | 2018-08-28 | Square, Inc. | Fraud detection and transaction review |
US11430070B1 (en) | 2017-07-31 | 2022-08-30 | Block, Inc. | Intelligent application of reserves to transactions |
US9881066B1 (en) | 2016-08-31 | 2018-01-30 | Palantir Technologies, Inc. | Systems, methods, user interfaces and algorithms for performing database analysis and search of information involving structured and/or semi-structured data |
US10375078B2 (en) | 2016-10-10 | 2019-08-06 | Visa International Service Association | Rule management user interface |
CA3039182C (en) | 2016-11-07 | 2021-05-18 | Equifax Inc. | Optimizing automated modeling algorithms for risk assessment and generation of explanatory data |
US10552436B2 (en) | 2016-12-28 | 2020-02-04 | Palantir Technologies Inc. | Systems and methods for retrieving and processing data for display |
US10475219B1 (en) | 2017-03-30 | 2019-11-12 | Palantir Technologies Inc. | Multidimensional arc chart for visual comparison |
CN106991199B (en) * | 2017-06-07 | 2020-07-14 | 上海理工大学 | User behavior tendency probability-based recommendation system score prediction and recommendation method |
US10915900B1 (en) | 2017-06-26 | 2021-02-09 | Square, Inc. | Interchange action delay based on refund prediction |
US10929476B2 (en) | 2017-12-14 | 2021-02-23 | Palantir Technologies Inc. | Systems and methods for visualizing and analyzing multi-dimensional data |
CN110929840A (en) * | 2018-09-20 | 2020-03-27 | 维萨国际服务协会 | Continuous learning neural network system using rolling window |
US11468315B2 (en) | 2018-10-24 | 2022-10-11 | Equifax Inc. | Machine-learning techniques for monotonic neural networks |
CN109635029B (en) * | 2018-12-07 | 2023-10-13 | 深圳前海微众银行股份有限公司 | Data processing method, device, equipment and medium based on label index system |
US11182808B2 (en) * | 2019-02-05 | 2021-11-23 | Target Brands, Inc. | Method and system for attributes based forecasting |
US11636418B2 (en) * | 2019-07-23 | 2023-04-25 | PredictiveHR, Inc. | Currency reduction for predictive human resources synchronization rectification |
US11631082B2 (en) * | 2019-09-20 | 2023-04-18 | Walmart Apollo, Llc | Methods and apparatus for payment transfer fraud detection |
US20210097543A1 (en) * | 2019-09-30 | 2021-04-01 | Microsoft Technology Licensing, Llc | Determining fraud risk indicators using different fraud risk models for different data phases |
US20210295379A1 (en) * | 2020-03-17 | 2021-09-23 | Com Olho It Private Limited | System and method for detecting fraudulent advertisement traffic |
CN111461784B (en) * | 2020-03-31 | 2022-04-22 | 华南理工大学 | Multi-model fusion-based fraud detection method |
WO2024043795A1 (en) * | 2022-08-23 | 2024-02-29 | Xero Limited | Methods, systems and computer-readable media for training document type prediction models, and use thereof for creating accounting records |
Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5523942A (en) * | 1994-03-31 | 1996-06-04 | New England Mutual Life Insurance Company | Design grid for inputting insurance and investment product information in a computer system |
US5655085A (en) * | 1992-08-17 | 1997-08-05 | The Ryan Evalulife Systems, Inc. | Computer system for automated comparing of universal life insurance policies based on selectable criteria |
US20020103678A1 (en) * | 2001-02-01 | 2002-08-01 | Burkhalter Swinton B. | Multi-risk insurance system and method |
US20020156683A1 (en) * | 1999-08-09 | 2002-10-24 | First Data Corporation | Systems and methods for utilizing a point-of-sale system |
US20030187780A1 (en) * | 2002-03-27 | 2003-10-02 | First Data Corporation | Systems and methods for managing collections relating to merchant accounts |
US20030200172A1 (en) * | 2000-05-25 | 2003-10-23 | Randle William M. | Dialect independent multi-dimensional integrator using a normalized language platform and secure controlled access |
US20030212629A1 (en) * | 2002-05-07 | 2003-11-13 | King Philip Joseph Benton | Authent-eCard is an implementation of business rules and a rules engine on a portable data device, point-of-sale interface and internet portal to apply predefined rules to the automated approval of financial transactions |
US20040267647A1 (en) * | 2003-06-30 | 2004-12-30 | Brisbois Dorion P. | Capital market products including securitized life settlement bonds and methods of issuing, servicing and redeeming same |
US20050273430A1 (en) * | 2004-06-02 | 2005-12-08 | Pliha Robert K | Systems and methods for scoring bank customers direct deposit account transaction activity to match financial behavior to specific acqusition, performance and risk events defined by the bank using a decision tree and stochastic process |
US20060041455A1 (en) * | 2004-08-13 | 2006-02-23 | Dehais Robert E | Systems and methods for providing an enhanced option rider to an insurance policy |
US20060106717A1 (en) * | 2000-05-25 | 2006-05-18 | Randle William M | End to end check processing from capture to settlement with security and quality assurance |
US7086584B2 (en) * | 1999-08-09 | 2006-08-08 | First Data Corporation | Systems and methods for configuring a point-of-sale system |
US20070011224A1 (en) * | 1999-10-22 | 2007-01-11 | Jesus Mena | Real-time Internet data mining system and method for aggregating, routing, enhancing, preparing, and analyzing web databases |
US20070050217A1 (en) * | 2005-08-26 | 2007-03-01 | Holden Ellsworth J Jr | Method for forming a multi-peril insurance policy |
US20070061238A1 (en) * | 2005-09-15 | 2007-03-15 | Robert Merton | Method and apparatus for retirement income planning |
US20070244775A1 (en) * | 2006-04-18 | 2007-10-18 | Macro Val Llc | Interactive, customizable display and analysis of electronically tagged financial information |
US7353208B1 (en) * | 2000-02-02 | 2008-04-01 | Transaction Network Services, Inc. | Transaction processing using intermediate server architecture |
US20080183516A1 (en) * | 2007-01-30 | 2008-07-31 | Jeffrey Brandt | Methods and apparatus to determine when to deflect callers to websites |
US7698158B1 (en) * | 2000-10-24 | 2010-04-13 | Theinsuranceadvisor Technologies, Inc. | Life insurance policy evaluation method |
US7818228B1 (en) * | 2004-12-16 | 2010-10-19 | Coulter David B | System and method for managing consumer information |
US8073785B1 (en) * | 1999-11-09 | 2011-12-06 | Candella George J | Method and system for detecting fraud in non-personal transactions |
Family Cites Families (125)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US99635A (en) * | 1870-02-08 | Improvement in fountain-pens | ||
US192167A (en) * | 1877-06-19 | Improvement in methods of ornamenting metallic surfaces for jewelry | ||
JP3231810B2 (en) * | 1990-08-28 | 2001-11-26 | アーチ・デベロップメント・コーポレーション | Differential diagnosis support method using neural network |
US5335291A (en) | 1991-09-20 | 1994-08-02 | Massachusetts Institute Of Technology | Method and apparatus for pattern mapping system with self-reliability check |
US5650722A (en) | 1991-11-20 | 1997-07-22 | Auburn International, Inc. | Using resin age factor to obtain measurements of improved accuracy of one or more polymer properties with an on-line NMR system |
US5519319A (en) | 1991-11-20 | 1996-05-21 | Auburn International, Inc. | Obtaining measurements of improved accuracy of one or more polymer properties with an on-line NMR system |
US5675253A (en) | 1991-11-20 | 1997-10-07 | Auburn International, Inc. | Partial least square regression techniques in obtaining measurements of one or more polymer properties with an on-line nmr system |
US5819226A (en) * | 1992-09-08 | 1998-10-06 | Hnc Software Inc. | Fraud detection using predictive modeling |
US5638492A (en) * | 1992-09-08 | 1997-06-10 | Hitachi, Ltd. | Information processing apparatus and monitoring apparatus |
US5345595A (en) * | 1992-11-12 | 1994-09-06 | Coral Systems, Inc. | Apparatus and method for detecting fraudulent telecommunication activity |
US5448684A (en) * | 1993-11-12 | 1995-09-05 | Motorola, Inc. | Neural network, neuron, and method for recognizing a missing input valve |
US5748780A (en) * | 1994-04-07 | 1998-05-05 | Stolfo; Salvatore J. | Method and apparatus for imaging, image processing and data compression |
US5500513A (en) * | 1994-05-11 | 1996-03-19 | Visa International | Automated purchasing control system |
US5832068A (en) * | 1994-06-01 | 1998-11-03 | Davox Corporation | Data processing system with real time priority updating of data records and dynamic record exclusion |
US5761442A (en) | 1994-08-31 | 1998-06-02 | Advanced Investment Technology, Inc. | Predictive neural network means and method for selecting a portfolio of securities wherein each network has been trained using data relating to a corresponding security |
US5727161A (en) * | 1994-09-16 | 1998-03-10 | Planscan, Llc | Method and apparatus for graphic analysis of variation of economic plans |
US5627886A (en) * | 1994-09-22 | 1997-05-06 | Electronic Data Systems Corporation | System and method for detecting fraudulent network usage patterns using real-time network monitoring |
US5835902A (en) * | 1994-11-02 | 1998-11-10 | Jannarone; Robert J. | Concurrent learning and performance information processing system |
US7155401B1 (en) * | 1994-12-23 | 2006-12-26 | International Business Machines Corporation | Automatic sales promotion selection system and method |
US5677955A (en) | 1995-04-07 | 1997-10-14 | Financial Services Technology Consortium | Electronic funds transfer instruments |
US6601048B1 (en) | 1997-09-12 | 2003-07-29 | Mci Communications Corporation | System and method for detecting and managing fraud |
US5884289A (en) * | 1995-06-16 | 1999-03-16 | Card Alert Services, Inc. | Debit card fraud detection and control system |
DE19530647C1 (en) * | 1995-08-21 | 1997-01-23 | Siemens Ag | Input parameter preparation for neural network |
US6601049B1 (en) | 1996-05-02 | 2003-07-29 | David L. Cooper | Self-adjusting multi-layer neural network architectures and methods therefor |
US5878337A (en) * | 1996-08-08 | 1999-03-02 | Joao; Raymond Anthony | Transaction security apparatus and method |
US6021943A (en) * | 1996-10-09 | 2000-02-08 | Chastain; Robert H. | Process for executing payment transactions |
GB9624298D0 (en) | 1996-11-22 | 1997-01-08 | Univ Strathclyde | Improved neural network |
US6029154A (en) | 1997-07-28 | 2000-02-22 | Internet Commerce Services Corporation | Method and system for detecting fraud in a credit card transaction over the internet |
US7403922B1 (en) | 1997-07-28 | 2008-07-22 | Cybersource Corporation | Method and apparatus for evaluating fraud risk in an electronic commerce transaction |
US5940812A (en) * | 1997-08-19 | 1999-08-17 | Loanmarket Resources, L.L.C. | Apparatus and method for automatically matching a best available loan to a potential borrower via global telecommunications network |
US6112190A (en) * | 1997-08-19 | 2000-08-29 | Citibank, N.A. | Method and system for commercial credit analysis |
US6125349A (en) | 1997-10-01 | 2000-09-26 | At&T Corp. | Method and apparatus using digital credentials and other electronic certificates for electronic transactions |
US6128602A (en) * | 1997-10-27 | 2000-10-03 | Bank Of America Corporation | Open-architecture system for real-time consolidation of information from multiple financial systems |
US6047268A (en) | 1997-11-04 | 2000-04-04 | A.T.&T. Corporation | Method and apparatus for billing for transactions conducted over the internet |
US6016480A (en) * | 1997-11-07 | 2000-01-18 | Image Data, Llc | Merchandise return fraud prevention system and method |
US6202053B1 (en) * | 1998-01-23 | 2001-03-13 | First Usa Bank, Na | Method and apparatus for generating segmentation scorecards for evaluating credit risk of bank card applicants |
US5999596A (en) * | 1998-03-06 | 1999-12-07 | Walker Asset Management Limited | Method and system for controlling authorization of credit card transactions |
WO1999048036A1 (en) * | 1998-03-20 | 1999-09-23 | Iq Financial Systems, Inc. | System, method, and computer program product for assessing risk within a predefined market |
US6422462B1 (en) * | 1998-03-30 | 2002-07-23 | Morris E. Cohen | Apparatus and methods for improved credit cards and credit card transactions |
US6064990A (en) | 1998-03-31 | 2000-05-16 | International Business Machines Corporation | System for electronic notification of account activity |
US6047287A (en) * | 1998-05-05 | 2000-04-04 | Justsystem Pittsburgh Research Center | Iterated K-nearest neighbor method and article of manufacture for filling in missing values |
US6122624A (en) | 1998-05-28 | 2000-09-19 | Automated Transaction Corp. | System and method for enhanced fraud detection in automated electronic purchases |
US6678640B2 (en) | 1998-06-10 | 2004-01-13 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for parameter estimation, parameter estimation control and learning control |
US6360326B1 (en) * | 1998-09-09 | 2002-03-19 | Compaq Information Technologies Group, L.P. | Password delay |
US6170744B1 (en) | 1998-09-24 | 2001-01-09 | Payformance Corporation | Self-authenticating negotiable documents |
US6401198B1 (en) * | 1999-03-09 | 2002-06-04 | Texas Instruments Incorporated | Storing system-level mass storage configuration data in non-volatile memory on each mass storage device to allow for reboot/power-on reconfiguration of all installed mass storage devices to the same configuration as last use |
US6650779B2 (en) | 1999-03-26 | 2003-11-18 | Georgia Tech Research Corp. | Method and apparatus for analyzing an image to detect and identify patterns |
US6631212B1 (en) | 1999-09-13 | 2003-10-07 | Eastman Kodak Company | Twostage scheme for texture segmentation based on clustering using a first set of features and refinement using a second set of features |
JP3485508B2 (en) | 1999-10-26 | 2004-01-13 | 株式会社国際電気通信基礎技術研究所 | Facial image transmitting method and system, and facial image transmitting device and facial image reproducing device used in the system |
US7177836B1 (en) * | 1999-12-30 | 2007-02-13 | First Data Corporation | Method and system for facilitating financial transactions between consumers over the internet |
US6516056B1 (en) | 2000-01-07 | 2003-02-04 | Vesta Corporation | Fraud prevention system and method |
US7191150B1 (en) | 2000-02-01 | 2007-03-13 | Fair Isaac Corporation | Enhancing delinquent debt collection using statistical models of debt historical information and account events |
US7310618B2 (en) * | 2000-02-22 | 2007-12-18 | Lehman Brothers Inc. | Automated loan evaluation system |
JP3348067B2 (en) | 2000-02-29 | 2002-11-20 | 株式会社電通 | Method and apparatus for controlling advertisement playback |
CN1439139A (en) | 2000-03-24 | 2003-08-27 | 通达商业集团国际公司 | System and method for detecting fraudulent transactions |
US20010056379A1 (en) * | 2000-04-10 | 2001-12-27 | Kazuya Fujinaga | Electronic commerce broking system |
US6613519B1 (en) * | 2000-04-20 | 2003-09-02 | Rappaport Family Institute For Reseach In The Medical Sciences | Method of determining a risk of hyperglycemic patients of developing a cardiovascular disease |
US6251608B1 (en) * | 2000-04-20 | 2001-06-26 | Technion Research & Development Foundation, Ltd. | Method of determining a potential of a hyperglycemic patients of developing vascular complications |
US6599702B1 (en) * | 2000-04-20 | 2003-07-29 | Rappaport Family Institute For Research In The Medical Sciences | Method of evaluating a risk of a subject of developing vascular complications |
US7640489B2 (en) * | 2000-08-01 | 2009-12-29 | Sun Microsystems, Inc. | Methods and systems for inputting data into spreadsheet documents |
US6549861B1 (en) | 2000-08-10 | 2003-04-15 | Euro-Celtique, S.A. | Automated system and method for spectroscopic analysis |
US20070192863A1 (en) * | 2005-07-01 | 2007-08-16 | Harsh Kapoor | Systems and methods for processing data flows |
US7392216B1 (en) * | 2000-09-27 | 2008-06-24 | Ge Capital Mortgage Corporation | Methods and apparatus for utilizing a proportional hazards model to evaluate loan risk |
AU2002211405A1 (en) * | 2000-10-02 | 2002-04-15 | International Projects Consultancy Services, Inc. | Object-based workflow system and method |
KR20010008034A (en) * | 2000-11-03 | 2001-02-05 | 이은우 | Direct Appraisal Analysis System Based On Internet Using Web-Site |
US6388592B1 (en) * | 2001-01-18 | 2002-05-14 | International Business Machines Corporation | Using simulated pseudo data to speed up statistical predictive modeling from massive data sets |
US20020099635A1 (en) * | 2001-01-24 | 2002-07-25 | Jack Guiragosian | Control of account utilization |
US6901398B1 (en) * | 2001-02-12 | 2005-05-31 | Microsoft Corporation | System and method for constructing and personalizing a universal information classifier |
US20020138417A1 (en) | 2001-03-20 | 2002-09-26 | David Lawrence | Risk management clearinghouse |
US20050060207A1 (en) * | 2001-05-08 | 2005-03-17 | Weidner James L. | Claims paid insurance |
US7269516B2 (en) | 2001-05-15 | 2007-09-11 | Psychogenics, Inc. | Systems and methods for monitoring behavior informatics |
US7865427B2 (en) | 2001-05-30 | 2011-01-04 | Cybersource Corporation | Method and apparatus for evaluating fraud risk in an electronic commerce transaction |
US8332291B2 (en) | 2001-10-05 | 2012-12-11 | Argus Information and Advisory Services, Inc. | System and method for monitoring managing and valuing credit accounts |
US8458082B2 (en) | 2001-11-13 | 2013-06-04 | Interthinx, Inc. | Automated loan risk assessment system and method |
US20030191709A1 (en) * | 2002-04-03 | 2003-10-09 | Stephen Elston | Distributed payment and loyalty processing for retail and vending |
US6845336B2 (en) * | 2002-06-25 | 2005-01-18 | Prasad S. Kodukula | Water treatment monitoring system |
EP1388812A1 (en) * | 2002-07-04 | 2004-02-11 | Ronald E. Dr. Kates | Method for training a learning-capable system |
US20040186815A1 (en) * | 2002-12-20 | 2004-09-23 | Stockfisch Thomas P. | Method for accommodating missing descriptor and property data while training neural network models |
WO2004061564A2 (en) * | 2002-12-30 | 2004-07-22 | Fannie Mae | System and method for pricing loans in the secondary mortgage market |
US20040267660A1 (en) * | 2003-02-21 | 2004-12-30 | Automated Financial Systems, Inc. | Risk management system |
US7971237B2 (en) * | 2003-05-15 | 2011-06-28 | Verizon Business Global Llc | Method and system for providing fraud detection for remote access services |
JP2004348536A (en) * | 2003-05-23 | 2004-12-09 | Intelligent Wave Inc | History information addition program, fraudulent determination program using history information, and fraudulent determination system using history information |
US7461048B2 (en) | 2003-07-21 | 2008-12-02 | Aureon Laboratories, Inc. | Systems and methods for treating, diagnosing and predicting the occurrence of a medical condition |
US7467119B2 (en) | 2003-07-21 | 2008-12-16 | Aureon Laboratories, Inc. | Systems and methods for treating, diagnosing and predicting the occurrence of a medical condition |
US20050055373A1 (en) | 2003-09-04 | 2005-03-10 | Forman George H. | Determining point-of-compromise |
EP1664687A4 (en) * | 2003-09-12 | 2009-01-14 | Rsa Security Inc | System and method for risk based authentication |
US7676408B2 (en) * | 2003-09-12 | 2010-03-09 | Moebs Services, Inc. | Risk identification system and methods |
US20050065871A1 (en) * | 2003-09-23 | 2005-03-24 | Nucenz Technologies, Inc. | Collateralized loan market systems and methods |
US20050076230A1 (en) * | 2003-10-02 | 2005-04-07 | George Redenbaugh | Fraud tracking cookie |
US8301584B2 (en) * | 2003-12-16 | 2012-10-30 | International Business Machines Corporation | System and method for adaptive pruning |
US7480640B1 (en) | 2003-12-16 | 2009-01-20 | Quantum Leap Research, Inc. | Automated method and system for generating models from data |
US8065227B1 (en) * | 2003-12-31 | 2011-11-22 | Bank Of America Corporation | Method and system for producing custom behavior scores for use in credit decisioning |
US7327258B2 (en) | 2004-02-04 | 2008-02-05 | Guardian Mobile Monitoring Systems | System for, and method of, monitoring the movements of mobile items |
US7853533B2 (en) * | 2004-03-02 | 2010-12-14 | The 41St Parameter, Inc. | Method and system for identifying users and detecting fraud by use of the internet |
US20050222928A1 (en) * | 2004-04-06 | 2005-10-06 | Pricewaterhousecoopers Llp | Systems and methods for investigation of financial reporting information |
US7562058B2 (en) | 2004-04-16 | 2009-07-14 | Fortelligent, Inc. | Predictive model management using a re-entrant process |
US7490356B2 (en) | 2004-07-20 | 2009-02-10 | Reflectent Software, Inc. | End user risk management |
US20060106700A1 (en) * | 2004-11-12 | 2006-05-18 | Boren Michael K | Investment analysis and reporting system and method |
AU2005325726B2 (en) * | 2005-01-25 | 2011-10-27 | I4 Commerce Inc. | Computer-implemented method and system for dynamic consumer rating in a transaction |
US20060195391A1 (en) * | 2005-02-28 | 2006-08-31 | Stanelle Evan J | Modeling loss in a term structured financial portfolio |
US20060212386A1 (en) * | 2005-03-15 | 2006-09-21 | Willey Dawn M | Credit scoring method and system |
US7328218B2 (en) * | 2005-03-22 | 2008-02-05 | Salford Systems | Constrained tree structure method and system |
US7455226B1 (en) | 2005-04-18 | 2008-11-25 | The Return Exchange, Inc. | Systems and methods for data collection at a point of return |
US8271364B2 (en) * | 2005-06-09 | 2012-09-18 | Bank Of America Corporation | Method and apparatus for obtaining, organizing, and analyzing multi-source data |
US7761379B2 (en) * | 2005-06-24 | 2010-07-20 | Fair Isaac Corporation | Mass compromise/point of compromise analytic detection and compromised card portfolio management system |
US7925973B2 (en) * | 2005-08-12 | 2011-04-12 | Brightcove, Inc. | Distribution of content |
US8065214B2 (en) * | 2005-09-06 | 2011-11-22 | Ge Corporate Financial Services, Inc. | Methods and system for assessing loss severity for commercial loans |
WO2007041709A1 (en) * | 2005-10-04 | 2007-04-12 | Basepoint Analytics Llc | System and method of detecting fraud |
US20070192167A1 (en) | 2005-10-24 | 2007-08-16 | Ying Lei | Methods and systems for managing transaction card customer accounts |
US7610257B1 (en) * | 2006-01-10 | 2009-10-27 | Sas Institute Inc. | Computer-implemented risk evaluation systems and methods |
US8280805B1 (en) * | 2006-01-10 | 2012-10-02 | Sas Institute Inc. | Computer-implemented risk evaluation systems and methods |
US20070198401A1 (en) * | 2006-01-18 | 2007-08-23 | Reto Kunz | System and method for automatic evaluation of credit requests |
EP1816595A1 (en) * | 2006-02-06 | 2007-08-08 | MediaKey Ltd. | A method and a system for identifying potentially fraudulent customers in relation to network based commerce activities, in particular involving payment, and a computer program for performing said method |
US7797217B2 (en) * | 2006-03-15 | 2010-09-14 | Entaire Global Intellectual Property, Inc. | System for managing the total risk exposure for a portfolio of loans |
US20070219817A1 (en) * | 2006-03-16 | 2007-09-20 | Jianqing Wu | Universal Negotiation Forum |
US7912773B1 (en) * | 2006-03-24 | 2011-03-22 | Sas Institute Inc. | Computer-implemented data storage systems and methods for use with predictive model systems |
US7587348B2 (en) * | 2006-03-24 | 2009-09-08 | Basepoint Analytics Llc | System and method of detecting mortgage related fraud |
US20080114783A1 (en) * | 2006-11-15 | 2008-05-15 | Nguyen Tien M | Method, system, and program product for managing a process and it interlock |
US20080243569A1 (en) * | 2007-04-02 | 2008-10-02 | Michael Shane Hadden | Automated loan system and method |
WO2008151259A2 (en) * | 2007-06-04 | 2008-12-11 | Risk Allocation Systems | System and method for sharing and allocating financial risk associated with a loan |
US20090018955A1 (en) * | 2007-07-13 | 2009-01-15 | Yen-Fu Chen | Method and apparatus for providing user access to payment methods |
US7962404B1 (en) * | 2007-11-07 | 2011-06-14 | Experian Information Solutions, Inc. | Systems and methods for determining loan opportunities |
US8122510B2 (en) * | 2007-11-14 | 2012-02-21 | Bank Of America Corporation | Method for analyzing and managing unstructured data |
US8463698B2 (en) * | 2007-12-27 | 2013-06-11 | Mastercard International Incorporated | Systems and methods to select a credit migration path for a consumer |
US8515862B2 (en) * | 2008-05-29 | 2013-08-20 | Sas Institute Inc. | Computer-implemented systems and methods for integrated model validation for compliance and credit risk |
- 2007-03-26 — US application Ser. No. 11/691,277, granted as US7912773B1 (Active)
- 2009-04-03 — US application Ser. No. 12/418,186, published as US20090192855A1 (Abandoned)
- 2009-04-03 — US application Ser. No. 12/418,174, published as US20090192957A1 (Abandoned)
- 2013-05-30 — US application Ser. No. 13/905,524, published as US20130339218A1 (Abandoned)
Cited By (87)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9965937B2 (en) | 2013-03-15 | 2018-05-08 | Palantir Technologies Inc. | External malware data item clustering and analysis |
US8788407B1 (en) | 2013-03-15 | 2014-07-22 | Palantir Technologies Inc. | Malware data clustering |
US8818892B1 (en) * | 2013-03-15 | 2014-08-26 | Palantir Technologies, Inc. | Prioritizing data clusters with customizable scoring strategies |
US8855999B1 (en) | 2013-03-15 | 2014-10-07 | Palantir Technologies Inc. | Method and system for generating a parser and parsing complex data |
US8930897B2 (en) | 2013-03-15 | 2015-01-06 | Palantir Technologies Inc. | Data integration tool |
US10834123B2 (en) | 2013-03-15 | 2020-11-10 | Palantir Technologies Inc. | Generating data clusters |
US8788405B1 (en) * | 2013-03-15 | 2014-07-22 | Palantir Technologies, Inc. | Generating data clusters with customizable analysis strategies |
US10721268B2 (en) | 2013-03-15 | 2020-07-21 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation based on automatic clustering of related data in various data structures |
US9135658B2 (en) | 2013-03-15 | 2015-09-15 | Palantir Technologies Inc. | Generating data clusters |
US9165299B1 (en) | 2013-03-15 | 2015-10-20 | Palantir Technologies Inc. | User-agent data clustering |
US9171334B1 (en) | 2013-03-15 | 2015-10-27 | Palantir Technologies Inc. | Tax data clustering |
US9177344B1 (en) | 2013-03-15 | 2015-11-03 | Palantir Technologies Inc. | Trend data clustering |
US10275778B1 (en) | 2013-03-15 | 2019-04-30 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation based on automatic malfeasance clustering of related data in various data structures |
US9230280B1 (en) | 2013-03-15 | 2016-01-05 | Palantir Technologies Inc. | Clustering data based on indications of financial malfeasance |
US10264014B2 (en) | 2013-03-15 | 2019-04-16 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation based on automatic clustering of related data in various data structures |
US10216801B2 (en) | 2013-03-15 | 2019-02-26 | Palantir Technologies Inc. | Generating data clusters |
US10120857B2 (en) | 2013-03-15 | 2018-11-06 | Palantir Technologies Inc. | Method and system for generating a parser and parsing complex data |
US10719527B2 (en) | 2013-10-18 | 2020-07-21 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores |
US10579647B1 (en) | 2013-12-16 | 2020-03-03 | Palantir Technologies Inc. | Methods and systems for analyzing entity performance |
US9552615B2 (en) | 2013-12-20 | 2017-01-24 | Palantir Technologies Inc. | Automated database analysis to detect malfeasance |
US10356032B2 (en) | 2013-12-26 | 2019-07-16 | Palantir Technologies Inc. | System and method for detecting confidential information emails |
US10230746B2 (en) | 2014-01-03 | 2019-03-12 | Palantir Technologies Inc. | System and method for evaluating network threats and usage |
US10805321B2 (en) | 2014-01-03 | 2020-10-13 | Palantir Technologies Inc. | System and method for evaluating network threats and usage |
US9923925B2 (en) | 2014-02-20 | 2018-03-20 | Palantir Technologies Inc. | Cyber security sharing and identification system |
US10873603B2 (en) | 2014-02-20 | 2020-12-22 | Palantir Technologies Inc. | Cyber security sharing and identification system |
US9009827B1 (en) | 2014-02-20 | 2015-04-14 | Palantir Technologies Inc. | Security sharing system |
US20160012544A1 (en) * | 2014-05-28 | 2016-01-14 | Sridevi Ramaswamy | Insurance claim validation and anomaly detection based on modus operandi analysis |
US10180929B1 (en) | 2014-06-30 | 2019-01-15 | Palantir Technologies, Inc. | Systems and methods for identifying key phrase clusters within documents |
US9535974B1 (en) | 2014-06-30 | 2017-01-03 | Palantir Technologies Inc. | Systems and methods for identifying key phrase clusters within documents |
US10162887B2 (en) | 2014-06-30 | 2018-12-25 | Palantir Technologies Inc. | Systems and methods for key phrase characterization of documents |
US11341178B2 (en) | 2014-06-30 | 2022-05-24 | Palantir Technologies Inc. | Systems and methods for key phrase characterization of documents |
US10798116B2 (en) | 2014-07-03 | 2020-10-06 | Palantir Technologies Inc. | External malware data item clustering and analysis |
US9998485B2 (en) | 2014-07-03 | 2018-06-12 | Palantir Technologies, Inc. | Network intrusion data item clustering and analysis |
US10572496B1 (en) | 2014-07-03 | 2020-02-25 | Palantir Technologies Inc. | Distributed workflow system and database with access controls for city resiliency |
US9021260B1 (en) | 2014-07-03 | 2015-04-28 | Palantir Technologies Inc. | Malware data item analysis |
US9881074B2 (en) | 2014-07-03 | 2018-01-30 | Palantir Technologies Inc. | System and method for news events detection and visualization |
US9875293B2 (en) | 2014-07-03 | 2018-01-23 | Palantir Technologies Inc. | System and method for news events detection and visualization |
US9785773B2 (en) | 2014-07-03 | 2017-10-10 | Palantir Technologies Inc. | Malware data item analysis |
US9344447B2 (en) | 2014-07-03 | 2016-05-17 | Palantir Technologies Inc. | Internal malware data item clustering and analysis |
US9202249B1 (en) | 2014-07-03 | 2015-12-01 | Palantir Technologies Inc. | Data item clustering and analysis |
US10929436B2 (en) | 2014-07-03 | 2021-02-23 | Palantir Technologies Inc. | System and method for news events detection and visualization |
US10135863B2 (en) | 2014-11-06 | 2018-11-20 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US10728277B2 (en) | 2014-11-06 | 2020-07-28 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US9043894B1 (en) | 2014-11-06 | 2015-05-26 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US9558352B1 (en) | 2014-11-06 | 2017-01-31 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US9589299B2 (en) | 2014-12-22 | 2017-03-07 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US9898528B2 (en) | 2014-12-22 | 2018-02-20 | Palantir Technologies Inc. | Concept indexing among database of documents using machine learning techniques |
US11252248B2 (en) | 2014-12-22 | 2022-02-15 | Palantir Technologies Inc. | Communication data processing architecture |
US9367872B1 (en) | 2014-12-22 | 2016-06-14 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US10362133B1 (en) | 2014-12-22 | 2019-07-23 | Palantir Technologies Inc. | Communication data processing architecture |
US10552994B2 (en) | 2014-12-22 | 2020-02-04 | Palantir Technologies Inc. | Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items |
US10447712B2 (en) | 2014-12-22 | 2019-10-15 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US9817563B1 (en) | 2014-12-29 | 2017-11-14 | Palantir Technologies Inc. | System and method of generating data points from one or more data stores of data items for chart creation and manipulation |
US10552998B2 (en) | 2014-12-29 | 2020-02-04 | Palantir Technologies Inc. | System and method of generating data points from one or more data stores of data items for chart creation and manipulation |
US10103953B1 (en) | 2015-05-12 | 2018-10-16 | Palantir Technologies Inc. | Methods and systems for analyzing entity performance |
US10387800B2 (en) | 2015-06-29 | 2019-08-20 | Wepay, Inc. | System and methods for generating reason codes for ensemble computer models |
WO2017003499A1 (en) * | 2015-06-29 | 2017-01-05 | Wepay, Inc. | System and methods for generating reason codes for ensemble computer models |
US9454785B1 (en) | 2015-07-30 | 2016-09-27 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US11501369B2 (en) | 2015-07-30 | 2022-11-15 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US10223748B2 (en) | 2015-07-30 | 2019-03-05 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US10484407B2 (en) | 2015-08-06 | 2019-11-19 | Palantir Technologies Inc. | Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications |
US9635046B2 (en) | 2015-08-06 | 2017-04-25 | Palantir Technologies Inc. | Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications |
US10489391B1 (en) | 2015-08-17 | 2019-11-26 | Palantir Technologies Inc. | Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface |
US10346410B2 (en) | 2015-08-28 | 2019-07-09 | Palantir Technologies Inc. | Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces |
US9898509B2 (en) | 2015-08-28 | 2018-02-20 | Palantir Technologies Inc. | Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces |
US12105719B2 (en) | 2015-08-28 | 2024-10-01 | Palantir Technologies Inc. | Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces |
US11048706B2 (en) | 2015-08-28 | 2021-06-29 | Palantir Technologies Inc. | Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces |
US10572487B1 (en) | 2015-10-30 | 2020-02-25 | Palantir Technologies Inc. | Periodic database search manager for multiple data sources |
US10318630B1 (en) | 2016-11-21 | 2019-06-11 | Palantir Technologies Inc. | Analysis of large bodies of textual data |
US10620618B2 (en) | 2016-12-20 | 2020-04-14 | Palantir Technologies Inc. | Systems and methods for determining relationships between defects |
US11681282B2 (en) | 2016-12-20 | 2023-06-20 | Palantir Technologies Inc. | Systems and methods for determining relationships between defects |
US10325224B1 (en) | 2017-03-23 | 2019-06-18 | Palantir Technologies Inc. | Systems and methods for selecting machine learning training data |
US10606866B1 (en) | 2017-03-30 | 2020-03-31 | Palantir Technologies Inc. | Framework for exposing network activities |
US11947569B1 (en) | 2017-03-30 | 2024-04-02 | Palantir Technologies Inc. | Framework for exposing network activities |
US11481410B1 (en) | 2017-03-30 | 2022-10-25 | Palantir Technologies Inc. | Framework for exposing network activities |
US11210350B2 (en) | 2017-05-02 | 2021-12-28 | Palantir Technologies Inc. | Automated assistance for generating relevant and valuable search results for an entity of interest |
US10235461B2 (en) | 2017-05-02 | 2019-03-19 | Palantir Technologies Inc. | Automated assistance for generating relevant and valuable search results for an entity of interest |
US11714869B2 (en) | 2017-05-02 | 2023-08-01 | Palantir Technologies Inc. | Automated assistance for generating relevant and valuable search results for an entity of interest |
US10482382B2 (en) | 2017-05-09 | 2019-11-19 | Palantir Technologies Inc. | Systems and methods for reducing manufacturing failure rates |
US11537903B2 (en) | 2017-05-09 | 2022-12-27 | Palantir Technologies Inc. | Systems and methods for reducing manufacturing failure rates |
US11954607B2 (en) | 2017-05-09 | 2024-04-09 | Palantir Technologies Inc. | Systems and methods for reducing manufacturing failure rates |
US10838987B1 (en) | 2017-12-20 | 2020-11-17 | Palantir Technologies Inc. | Adaptive and transparent entity screening |
US11119630B1 (en) | 2018-06-19 | 2021-09-14 | Palantir Technologies Inc. | Artificial intelligence assisted evaluations and user interface for same |
US10685283B2 (en) | 2018-06-26 | 2020-06-16 | Sas Institute Inc. | Demand classification based pipeline system for time-series data forecasting |
US10560313B2 (en) | 2018-06-26 | 2020-02-11 | Sas Institute Inc. | Pipeline system for time-series data forecasting |
US20210350376A1 (en) * | 2020-05-05 | 2021-11-11 | Capital One Services, Llc | Computer-based systems configured for automated activity verification based on optical character recognition models and methods of use thereof |
US11900336B2 (en) * | 2020-05-05 | 2024-02-13 | Capital One Services, Llc | Computer-based systems configured for automated activity verification based on optical character recognition models and methods of use thereof |
Also Published As
Publication number | Publication date |
---|---|
US20090192855A1 (en) | 2009-07-30 |
US20090192957A1 (en) | 2009-07-30 |
US7912773B1 (en) | 2011-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7788195B1 (en) | Computer-implemented predictive model generation systems and methods | |
US7912773B1 (en) | Computer-implemented data storage systems and methods for use with predictive model systems | |
US8600854B2 (en) | Method and system for evaluating customers of a financial institution using customer relationship value tags | |
US10552837B2 (en) | Hierarchical profiling inputs and self-adaptive fraud detection system | |
US7853469B2 (en) | Methods and systems for predicting business behavior from profiling consumer card transactions | |
US10019757B2 (en) | Total structural risk model | |
US7668769B2 (en) | System and method of detecting fraud | |
US8775301B2 (en) | Reducing risks related to check verification | |
US7853520B2 (en) | Total structural risk model | |
US7814008B2 (en) | Total structural risk model | |
US8577791B2 (en) | System and computer program for modeling and pricing loan products | |
US20150332414A1 (en) | System and method for predicting items purchased based on transaction data | |
US20090222376A1 (en) | Total structural risk model | |
US20090222378A1 (en) | Total structural risk model | |
US20090222380A1 (en) | Total structural risk model | |
US10445838B2 (en) | Automatic determination of periodic payments based on transaction information | |
Dimitras et al. | Evaluation of empirical attributes for credit risk forecasting from numerical data | |
US20060136415A1 (en) | Method, system, and program product for executing a scalar function on a varying number of records within a RDBMS using SQL | |
AU745861B2 (en) | A method and system for evaluating customers of a financial institution using customer relationship value tags | |
Ertuğrul | Customer Transaction Predictive Modeling via Machine Learning Algorithms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |