WO2018187122A1 - Identifying reason codes from gradient boosting machines - Google Patents

Identifying reason codes from gradient boosting machines

Info

Publication number
WO2018187122A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
values
classification
entity
value
Prior art date
Application number
PCT/US2018/024896
Other languages
French (fr)
Inventor
Omar ODIBAT
Claudia BARCENAS
Original Assignee
Visa International Service Association
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Visa International Service Association filed Critical Visa International Service Association
Priority to SG11201908634P priority Critical patent/SG11201908634PA/en
Priority to EP18781480.1A priority patent/EP3607475A4/en
Priority to CN201880021609.3A priority patent/CN110462607B/en
Publication of WO2018187122A1 publication Critical patent/WO2018187122A1/en

Classifications

    • G06N20/20 Machine learning; Ensemble learning
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G06F16/285 Relational databases; Clustering or classification
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F18/24323 Classification techniques; Tree-organised classifiers
    • G06N20/00 Machine learning
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G06V10/764 Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/95 Hardware or software architectures for image or video understanding structured as a network, e.g. client-server architectures

Definitions

  • Gradient boosting machines can be used to build models for classification of entities using a set of previously classified entities. To classify a new entity, the values of the entity's features can be determined and those feature values can be used to traverse the classification model. In contrast to certain other techniques for building classification models, gradient boosting machines can build a classification model that is an ensemble of smaller models, such as decision trees. Each of the smaller models can output a response score that depends on one or more different features of the new entity. While each of the smaller models may not be accurate in classifying new entities by itself, the classification model can provide accuracy by aggregating and weighting hundreds or thousands of smaller models.
  • a classification server can perform a method for classifying an entity and identifying reason codes for the classification.
  • the classification server can use a gradient boosting machine to build a classification model using training data.
  • the classification model can be an ensemble of decision trees where each terminal node in the decision tree is associated with a response.
  • the responses from each decision tree can be aggregated by the classification server in order to determine a classification for a new entity.
  • the classification server can determine feature contribution values based on expected feature values. These feature contribution values can be associated with each of the responses in the classification model and used to determine reason codes for the classification of the entity.
  • the classification server can perform a single traversal of the classification model to both classify the entity and to identify reason codes.
  • FIG. 1 is a block diagram of a classification system for classifying entities, according to some embodiments.
  • FIG. 2 is a process flow diagram for building a classification model and classifying an entity, according to some embodiments.
  • FIG. 3 is a diagram of a response scoring and classification process, in accordance with some embodiments.
  • FIG. 4 is a diagram of a decision tree having terminal nodes associated with feature contribution values, in accordance with some embodiments.
  • FIG. 5 is a flow chart 500 of a method for classifying and determining reason codes, in accordance with some embodiments.
  • a "computer” or “computer server” may refer to a single computer or a cluster of computers communicating in a system.
  • the computer can be a large mainframe, a minicomputer cluster, or a group of servers functioning as a unit.
  • the computer may be a database server.
  • the computer may include any hardware, software, other logic, or combination of the preceding for processing requests from a user interface or from one or more client computers.
  • the computer may comprise one or more computational apparatuses and may use any of a variety of computing structures.
  • Machine learning generally refers to a variety of different computer-implemented processes that build models based on a population of input data by determining features of the entities within the population and the relationships between the entities. To build the model, the machine learning process can measure a variety of features of each entity within the population, and the features of different entities can be compared to determine relationships between them.
  • a machine learning process can be used to cluster entities together according to their features and the relationships between the entities.
  • "Supervised machine learning” generally refers to machine learning processes that receive training data having predetermined solutions (e.g., the data is labeled or classified).
  • a supervised machine learning process can use a set of population data and associated labels for each object in the training data to generate a set of logic for determining labels for unlabeled data.
  • a supervised machine learning process can build a character recognition model using images of letters and numbers that are labeled accordingly.
  • a "classifier" generally refers to a description of an entity.
  • the classifier may be determined by a human. For example, a person may report that a particular transaction is "fraudulent” or “not-fraudulent.”
  • images may be labeled with the following labels based on what objects are shown in the image: "building,” “people,” “car,” “truck,” “dog,” etc.
  • One or more labels may be applied to each entity. Entities having the same label may have one or more features having similar values.
  • "features" generally refers to the set of measurements for different characteristics or attributes of an entity as determined by a machine learning process.
  • the features of an entity are characteristic of that entity, such that similar entities will have similar features depending on the accuracy of the machine learning process.
  • the "features" of a transaction can include the time of the transaction, the parties involved in the transaction, and the amount of the transaction.
  • the features of a transaction can be more complex, including a feature indicating the patterns of transactions conducted by a first party, or the patterns of other people involved in transactions with the first party.
  • “features" of an image can be determined based on color and luminance across its pixels and the distribution of different colors across the image.
  • the features determined by complex machine learning algorithms may not be understandable by humans. That is, the individual feature values may represent a certain characteristic, but this is a result of a complex algorithm and not a simple measurement that can be easily performed by a human.
  • the features can be stored as an array of numerical values. For example, the features for two different entities may be represented by the following arrays: [0.2, 0.3, 0.1, ...] for the first entity and [0.3, 0.4, 0.1, ...] for the second entity.
  • the term "reason code" refers to a code, phrase, or narrative that identifies which features of an entity were the cause of the classification of that entity.
  • a classification system may assign a "fraudulent” classifier to a particular transaction and the reason code for that classification may identify the "transaction amount” and "address verification” features as being the reason for that classification.
  • the reason code may also include more detailed information, such as the conditions for each respective feature that caused the classification. For example, the reason code may indicate that the transaction was classified as "fraudulent” due to the transaction amount being larger than a specified threshold and the address not being verified.
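As a purely hypothetical sketch of this idea, a reason code of the kind described above could be represented as a small record and rendered as a narrative. The field names and helper function below are illustrative only and are not part of the described system:

```python
# Hypothetical reason code for the "fraudulent" classification example
# above; the structure and field names are illustrative only.
reason_code = {
    "classifier": "fraudulent",
    "features": ["transaction amount", "address verification"],
    "conditions": [
        "transaction amount larger than the specified threshold",
        "address not verified",
    ],
}

def format_reason(rc):
    """Render a reason code as a short human-readable narrative."""
    return "Classified as '{}' because: {}".format(
        rc["classifier"], "; ".join(rc["conditions"]))

print(format_reason(reason_code))
```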
  • Gradient boosting machines can be used to build models for classification of entities using a training set of previously classified entities.
  • Classification models built by gradient boosting machines can be an ensemble of hundreds or thousands of smaller sub-models, such as decision trees.
  • the classification model is complex since each of the smaller sub-models in the ensemble can depend on one or more different entity features and more than one of the smaller models can depend on the same feature. Because of this, it can be difficult or impractical to identify which features of a newly classified entity had the greatest effect on the classification outcome.
  • While classification models built by gradient boosting machines are more accurate than simpler classification models, their increased complexity makes it impractical to determine reason codes that identify which features were the cause of the classification.
  • One solution is to build a separate, simpler model that is not based on a gradient boosting machine, in order to generate reason codes.
  • When new entity data is received, it can be run through both models: the complex classification model (built using a gradient boosting machine) and the simpler reason code model.
  • Another solution is to iteratively adjust the features of the input entity data and re-run the classification model in order to determine how the adjustments changed the response.
  • this solution requires a large amount of computer resources and time in order to process the classification model several times.
  • An improved solution, described in further detail below, can accurately identify reason codes for the classification while only processing the classification model once, by combining both classification and reason code determination in a single model.
  • the contribution of each feature is determined for each response of each sub-model within the combined model.
  • These feature contributions can be ranked and used to identify one or more reason codes.
  • the combined model solution provides accurate reason codes, since the feature contributions are tied to the classification model, while providing real time classification, since the model only needs to be run once.
  • the combined model can be built in two steps during an "offline" phase (e.g., before the model is in operational use for entity classification).
  • a gradient boosting machine learning process can build a classification model using a set of training data.
  • the feature contributions of the classification model are determined based on the average features of the entities within the set of training data. Since the feature contributions are estimated using the averaged features, this process only needs to be performed once and it can be performed offline. That is, the feature contributions do not need to be re-determined for each new entity that is classified by the model. As such, when the classification model is used in an "online" phase (e.g., when the model is in operational use for entity classification), it can identify both classifiers and reason codes in a single traversal of the model.
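The expected feature values described above can be computed in a single offline pass over the training data. A minimal sketch, with made-up numbers, might look like:

```python
def expected_feature_values(training_data):
    """Average each feature across all entities in the training data.
    This runs once, during the offline phase, so the expected values
    never need to be recomputed when new entities are classified."""
    n = len(training_data)
    num_features = len(training_data[0])
    return [sum(entity[i] for entity in training_data) / n
            for i in range(num_features)]

# Made-up feature vectors [IP Score, Amount] for three training entities.
training = [[70, 40], [50, 90], [60, 50]]
print(expected_feature_values(training))  # [60.0, 60.0]
```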
  • FIG. 1 is a block diagram of a classification system 100 for classifying entities, according to some embodiments.
  • the classification system 100 includes a classification server 110 that can classify an entity using a classification model.
  • the classification server 110 can classify an entity by traversing the classification model using entity data for that entity.
  • the classification server 110 can build the classification model in an offline phase using a set of training data including entity data for a plurality of entities and classification data indicating one or more classifiers that are associated with each of the plurality of entities.
  • the training data can be received from a database server 120 or it can be stored by the classification server 110.
  • the classification model built by the classification server 110 can be a combined model that can be used to determine both classifiers and reason codes.
  • the classification server 110 can use the model to determine a classification score for a new entity that is not included in the training data during an online phase.
  • the classification server 110 may classify a new entity upon request from a request computer 130.
  • the request computer 130 can send a classification request message including entity data to the classification server 110.
  • the classification server 110 can receive the entity data from the request computer 130.
  • the entity data received from the request computer 130 may indicate values for each of the features of the entity or the classification server 110 may determine feature values for the entity based on the entity data.
  • the classification server 110 can traverse the classification model using the feature values to determine one or more classifiers of the entity.
  • the classifiers can be numerical scores that indicate certain classifications or they can be labels of classifications.
  • the classification server 110 can also determine feature contribution values for each of the entity's features.
  • the classification server 110 can rank the feature contribution values and then identify one or more reason codes corresponding to each of the one or more classifiers.
  • the reason codes can indicate which features were the greatest cause for the entity being classified by a particular classifier.
  • the classification server 110 can send the classifiers and reason codes to the request computer 130 in a classification response message.
  • the request computer 130 may perform different operations based on the classification of the entity. For example, the request computer 130 may deny access to a resource if a transaction entity is classified as fraudulent. In another example, the request computer 130 can use the classified image entities for object and character recognition.
  • a classification server can use a classification model to determine classifiers of an entity and reason codes for the classification using a classification model.
  • FIG. 2 is a process flow diagram 200 for building a classification model and classifying an entity, according to some embodiments.
  • a classification server can build a classification model 220 and determine feature contributions 230.
  • the classification server can operate in an online phase 212 in which it can classify a new entity.
  • the processes shown in FIG. 2 may be performed by a classification server, such as the classification server 110 of FIG. 1.
  • the classification server can obtain training data 210.
  • the classification server can receive the training data 210 from a database server.
  • the training data 210 can include entity data for a plurality of entities.
  • the entities included in the training data 210 may be a representative sample selected from a population of entities.
  • the entity data in the training data 210 can include information describing the features or characteristics of each entity.
  • Each entity may have one or more features and the training data 210 can include feature values for each feature of the entity.
  • the classification server may need to perform a feature extraction process in order to determine the feature values from the entity data.
  • the feature extraction process may be a machine learning algorithm that determines feature values for an entity such that similar entities have similar feature values.
  • the features of a transaction entity may be determined based on the relationships between the parties involved in the transaction or a comparison of the time and location of different transactions conducted by the same party.
  • the features of an image entity can be determined based on color and luminance across its pixels and the distribution of different colors across the image.
  • the training data 210 can also include classification information associating one or more classifiers with each entity .
  • the entities in the training data 210 may have been previously classified using one or more classification labels or classification scores.
  • the classification information may associate each entity with a classification label or score indicating whether the transaction is "fraudulent" or "non-fraudulent."
  • the classification information may associate each entity with one or more classification labels or scores indicating the objects that are depicted in the image, such as "building,” “person,” “vehicles,” “alphanumeric characters,” etc.
  • the classification information can associate one or more classifiers with each of the plurality of entities that is included in the training data 210.
  • the classification server can build a classification model 220 based on the training data 210, which includes feature values for each entity, and the classification information, which associates each entity with one or more classifiers.
  • the classification server can build the classification model 220 using a gradient boosting machine, which is a machine learning process that can be used to build classification models comprising an ensemble of sub-models.
  • each of the sub-models can be decision trees.
  • each of the smaller sub-models can output a response score that depends on one or more different features of the entity. Responses from each of the sub-models can be weighted and combined together in order to accurately classify an entity.
  • the classification model 220 is described in further detail below with respect to FIG. 3 and FIG. 4.
  • the classification server can determine feature contribution values 230 for each feature of the classification model 220.
  • the feature contribution values 230 indicate how great an effect the value of each feature had on the outcome of the classification. These feature contribution values 230 can be ranked in order to determine which features contributed the most to the classification, and reason codes can be identified based on the ranking.
  • the classification server can determine the feature contribution values 230 during the offline phase 211 by determining the expected feature values for each feature, averaging feature values across each of the entities in the training data 210. These average feature values may be used to determine the feature contribution values 230 associated with each response of the classification model 220.
  • the feature contribution values 230 can be pre-determined in the offline phase 211, thereby enabling the classification server to identify reason codes based on the feature contribution values 230 in real-time during the online phase 212 using only a single traversal of the classification model 220.
  • the determination of feature contribution values is described in further detail below with respect to FIG. 4.
  • the classification server can operate in an online phase 212 in which the classification server classifies new entities (not in the training data 210) and identifies one or more reason codes for each classifier of the new entity.
  • the classification server can receive new entity data 240 for a new entity from a request computer (e.g. , the request computer 130 of FIG. 1).
  • the classification server can traverse each sub-model of the classification model 220 using the feature values of the new entity data 240.
  • the classification server may determine the feature values for the new entity using a feature extraction process.
  • the feature values may have been previously determined and be included in the new entity data 240.
  • the traversal of the classification model results in a response value and one or more associated feature contribution values for each of the sub-models.
  • the response values can be aggregated in order to determine one or more classifiers 250 for the new entity.
  • the classification process is described in further detail below with respect to FIG. 3.
  • the feature contribution values associated with each of the response values can be aggregated and ranked in order to identify reason codes 260 for the classification.
  • the reason code identification process is described in further detail below with respect to FIG. 4.
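Putting the online phase together: a single traversal yields one (response, contributions) pair per sub-model. The sketch below, whose data structures are assumptions rather than taken from the text, sums the responses into a response score and ranks the aggregated feature contributions to identify reason codes:

```python
from collections import defaultdict

def classify_and_explain(terminal_results, top_k=2):
    """Aggregate the responses and the pre-computed feature contribution
    values collected during a single traversal of the ensemble."""
    contributions = defaultdict(float)
    response_score = 0.0
    for response, feature_contributions in terminal_results:
        response_score += response  # used for classification
        for feature, value in feature_contributions.items():
            contributions[feature] += value  # used for reason codes
    # Rank features by total contribution; the top ones become reason codes.
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return response_score, ranked[:top_k]

# One (response, contributions) pair per decision tree (values made up).
results = [(0.5, {"Amount": 0.4, "AVS Matched": 0.3}),
           (0.25, {"IP Score": 0.1})]
score, reasons = classify_and_explain(results)
print(score, reasons)  # 0.75 ['Amount', 'AVS Matched']
```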
  • the classification model 220 is advantageous because it combines the classification with the feature contributions such that reason codes can be identified at the time of classification, without traversing a separate reason code model or the same classification model multiple times.
  • the expected feature contribution values 230 are determined during the offline phase 211 such that they do not need to be re-calculated for each new entity being classified during the online phase 212. Since the feature contribution values 230 are pre-determined, only a single model needs to be traversed for each new entity during the online phase 212. Accordingly, the amount of time and computing resources spent for both classification and reason code identification is reduced compared to other classification systems that traverse more than one model to determine both classifiers and reason codes.
  • FIG. 3 is a diagram 300 of a response scoring and classification process, in accordance with some embodiments.
  • the response scoring process 301 determines a response score based on responses from each sub-model of the classification model.
  • the classification process 302 determines one or more classifiers based on the response score.
  • the classification model built and used by a classification server can be an ensemble of decision trees.
  • a classification model can include a first decision tree 311, a second decision tree 312, a third decision tree 313, a last decision tree 314, and a plurality of other decision trees (indicated by the ellipsis) that are not shown in FIG. 3 for simplicity.
  • Each decision tree of the plurality of decision trees in the classification model can contain a plurality of nodes, which are depicted as boxes in FIG. 3. The nodes can be associated with one or more features and a set of feature values for that particular feature.
  • the set of feature values for the condition may be determined using a threshold value, such that the decision at each node can branch based on whether the condition is met ("YES") or not met ("NO").
  • the traversal of the nodes within a decision tree is discussed in further detail below with respect to FIG. 4.
  • Each decision tree of the plurality of decision trees in the classification model can contain a plurality of branches, each containing one or more conditional nodes and a terminal node. The branches are depicted in FIG. 3 as edges connecting the nodes within the branch, and the terminal nodes are depicted as gray colored boxes.
  • each of the terminal nodes is associated with a response score. These response scores are weighted based on the accuracy of the decision tree in classifying the entities in the training data.
  • the feature values for that entity are used to traverse the trees, going down a certain branch in the decision tree to a particular terminal node depending on whether the feature conditions of that branch are met.
  • the response values for each tree can be aggregated into a score.
  • the first decision tree 311 outputs a value of "Response 1”
  • the second decision tree 312 outputs a response value of "Response 2”
  • the third decision tree 313 outputs a response value of "Response 3”
  • the last decision tree 314 outputs a response value of "Response M "
  • each of the plurality of decision trees not shown in FIG. 3 also outputs a response value (indicated by the ellipsis).
  • a response value can be identified for each decision tree based on each terminal node hit during the traversal.
  • the classifier for an entity can be determined using a sigmoidal function based on the response score.
  • the classification function (1) below may be used to compute a classification score based on the response score.
  • This classification function (1) can be graphed as shown in FIG. 3.
  • the classification score may be less than 0.5 when the response score is a negative value, 0.5 when the response score is 0, and greater than 0.5 when the response score is a positive value.
  • the classifier for a particular entity can be determined using a threshold value 321 (e.g., 0.5). For example, if the classification function (1) results in a classification score that is less than 0.5, the entity can be associated with the classifier "Class 1." And if the classification function (1) results in a classification score greater than or equal to 0.5, then the entity can be associated with the classifier "Class 2." In some embodiments, more than one threshold value can be used to select between more than two classifiers.
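The classification function (1) itself is not reproduced in this text, but the standard logistic sigmoid has exactly the behaviour described (below 0.5 for negative response scores, 0.5 at zero, above 0.5 for positive ones), so a sketch under that assumption is:

```python
import math

def classification_score(response_score):
    """Logistic sigmoid. The text does not reproduce function (1), so
    the sigmoid here is an assumption that matches the described
    behaviour: <0.5 for negative scores, 0.5 at zero, >0.5 for positive."""
    return 1.0 / (1.0 + math.exp(-response_score))

def classify(response_score, threshold=0.5):
    """Apply the threshold value 321 (e.g., 0.5) to pick a classifier."""
    if classification_score(response_score) < threshold:
        return "Class 1"
    return "Class 2"

print(classification_score(0.0))      # 0.5
print(classify(-1.2), classify(0.9))  # Class 1 Class 2
```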
  • the terminal nodes of the decision trees in a classification model can each be associated with a response value.
  • each of the terminal nodes can be associated with one or more feature contribution values that can be used to identify reason codes.
  • the reason codes can be identified in real time, using a single model.
  • FIG. 4 is a diagram 400 of a decision tree 410 having terminal nodes associated with feature contribution values 420, in accordance with some embodiments.
  • a gradient boosting machine process can build a classification model that is an ensemble of hundreds or thousands of decision trees.
  • the decision tree 410 is an example of a single decision tree within the classification model.
  • This decision tree may be traversed by the classification server, in addition to other decision trees of the classification model, when classifying a new entity during an online phase.
  • the decision tree 410 may have been built using training data for transaction entities that have been pre-classified as "fraudulent” or "non- fraudulent.”
  • the features of the transaction entities can include an Internet Protocol (IP) reputation score ("IP Score") that has been pre-determined by a third party.
  • Greater IP Score feature values (e.g., greater than 30) may indicate that the transaction is more likely to be classified as "non-fraudulent" while lower IP Score feature values (e.g., not greater than 30) may indicate that the transaction is more likely to be classified as "fraudulent."
  • the features of the transaction can also include an "Amount” feature value indicating the amount of the transaction.
  • Lower Amount feature values (e.g., less than 95) may indicate that the transaction is more likely to be classified as "non-fraudulent" while greater Amount feature values (e.g., not less than 95) may indicate that the transaction is more likely to be classified as "fraudulent."
  • the features of the transaction can also include an "Address Verification Service Match" feature indicating whether a verification server has matched the address used to conduct the transaction with a registered address.
  • the Address Verification Service (AVS) match (“yes") may indicate that the transaction is more likely to be classified as "non -fraudulent” while the AVS not matching (“no") may indicate that the transaction is more likely to be classified as "fraudulent.”
  • each terminal node of the decision tree 410 is associated with a response score, indicated by the value within the terminal nodes.
  • the response scores are based on the pre-determined classifications of entities having the features of the nodes within the branch of that terminal node.
  • an entity having an IP Score feature value that is greater than 30 and an Amount feature value that is less than 95 will cause the decision tree 410 to output the response of 0.2 while an entity having an IP Score feature value that is not greater (less) than 30 and an Amount feature value that is less than 95 will cause the decision tree to output a response of 0.5.
  • an entity having an AVS Matched feature value of "Yes” and an Amount feature value that is not less (greater) than 95 will cause the decision tree to output the response value of 0.4 while an entity having an AVS Matched feature value of "No” and an Amount feature value that is not less (greater) than 95 will cause the decision tree to output the response value of 0.7.
  • the classification model may assign positive response values to terminal nodes that have a set of feature values that are more likely to be classified as "fraudulent” and negative values to terminal nodes that have a set of feature values that are more likely to be cl assified as "non-fraudulent” based on the number of entities classified as such in the training data.
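The example branches above can be collected into a small function. The exact layout of the nodes is an assumption (the text describes the branches but not the tree structure), and the response values are the ones quoted above:

```python
def decision_tree_410(entity):
    """Sketch of the example decision tree of FIG. 4. The node layout is
    an assumption; each leaf reproduces a branch described in the text."""
    if entity["Amount"] < 95:
        # Lower amounts: branch on the IP reputation score.
        return 0.2 if entity["IP Score"] > 30 else 0.5
    # Greater amounts: branch on the Address Verification Service match.
    return 0.4 if entity["AVS Matched"] else 0.7

print(decision_tree_410({"Amount": 40, "IP Score": 70, "AVS Matched": True}))   # 0.2
print(decision_tree_410({"Amount": 120, "IP Score": 70, "AVS Matched": False})) # 0.7
```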
  • the classification server can determine one or more feature contribution values 420 for each of the response values (e.g., for each terminal node).
  • the classification server can determine feature contribution values for each of the features that a particular branch is based on. For example, the far-right branch having a response value of 0.7 is based on the Amount feature and the AVS Matched feature. Accordingly, the classification server can determine feature contribution values for the Amount feature and the AVS Matched feature.
  • the classification server can determine the feature contribution values based on an expected feature value.
  • the feature contribution values 420 can also be based on the particular feature's position within the tree and the percentage of entities within the training data that meet the conditions of the particular branch.
  • the classification server can determine the average value of the feature across all of the entities in the training data. In this example, the classification server can determine that the average IP Score feature value is 60, the average Amount feature value is 60, and the majority of the entities have the AVS Matched feature value of "Yes.” These expected feature values are shown in nodes of the decision tree 410.
  • the classification server can use the expected feature values to determine the feature contribution values 420 to be associated with each terminal node.
  • the feature contribution values are also based on the percentage of entities that are expected to meet the conditions of that branch using the expected feature values.
  • the classification server can identify the node in the decision tree that corresponds to that feature. Then, the classification server can select one of the branches of that node would be followed using the expected feature value for that feature. Then, the classification server can identify each terminal node that is within the selected branch.
  • the classification server can then adjust the response values for each terminal node within the selected branch based on the percentage of entities within the training set that both meet the condition of the node in the decision tree that corresponds to the feature (e.g., the entities that would follow the branch of the node selected using the expected feature value) and that would hit that particular terminal node. For example, if 20% of entities that would follow the selected branch would end at a particular terminal node, then the response value for that terminal node can be multiplied by 20%.
  • the adjusted response values for each of the terminal nodes within the selected branch can be summed, and the summation of the terminal nodes within the selected branch (as adjusted) can be subtracted from the response value of the first terminal node itself.
  • the difference between the response value of the first terminal node and the summation of the adjusted response values of the terminal nodes within the branch selected by the expected feature value is the expected feature contribution value for that particular feature.
  • the expected feature contribution value for a feature indicates the amount of deviation between the feature value of the first terminal node from the expected feature value, thereby indicating the amount that the value for that feature contributed to the response value.
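The computation described in the preceding bullets can be sketched as follows (an illustrative helper, not the claimed implementation; `branch_responses` pairs each terminal node in the branch selected by the expected feature value with the fraction of training entities expected to hit it):

```python
def feature_contribution(response, branch_responses):
    """Expected feature contribution for one feature at one terminal node.

    response         -- response value of the terminal node in question
    branch_responses -- list of (expected_hit_fraction, response_value)
                        pairs for the terminal nodes within the branch
                        that the expected feature value would select
    """
    # Adjust each response by the fraction of entities expected to hit it,
    # sum the adjusted responses, and subtract the sum from the response.
    expected_response = sum(frac * r for frac, r in branch_responses)
    return response - expected_response
```

With the AVS example (100% of expected entities hit the 0.4 terminal node, 0% hit 0.7), this yields 0.7 - 0.4 = 0.3 for the 0.7 terminal node.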
  • the classification server may determine the difference between the current response and the expected response.
  • the expected AVS Matched feature value is "Yes."
  • the terminal node having a response of 0.7 is hit when the AVS Matched feature value is "No." Since the AVS Matched feature value is different than the expected value, the AVS Matched feature is a cause of the response score being high (e.g., being 0.7). Accordingly, the feature contribution value for the AVS Matched feature for this terminal node will be greater than 0.
  • the classification server can use the response value of the terminal node (e.g., 0.7), the expected feature value for AVS Matched (e.g., "Yes"), and the percentage of entities in the training data that would hit that terminal node (e.g., meet the conditions of the branch) based on the expected feature value for AVS Matched (e.g., 0%). Since the expected feature value for AVS Matched is "Yes," an expected entity would not hit the terminal node having the response of 0.7. Thus, 0% of expected entities would hit the terminal node having the response of 0.7 and 100% of expected entities would hit the terminal node having the response of 0.4, where the AVS Matched feature value is "Yes."
  • the feature contribution value of AVS Matched for the terminal node having a 0.7 response value can be determined by multiplying each opposing response score by the percentage of entities expected to hit that response score and subtracting these two values from the response score for that terminal node.
  • the classification server can use the expected (e.g., average) feature values (IP Score is 60, Amount is 60, and AVS Matched is "Yes") to determine the percentage of entities expected to hit a response score.
  • the classification server can identify the percentage of entities within the training data that have feature values that meet the feature conditions of the branch.
  • the percentage of entities expected to hit the response score of 0.4 (AVS Matched is YES) is 100% and the percentage of entities expected to hit the response score of 0.7 (AVS Matched is NO) is 0% are shown by the dashed arrows in FIG. 4.
  • the AVS Matched feature contribution value for the terminal node having a 0.7 response value can be computed using formula (2) below.
  • AVS Matched Feature Contribution = 0.7 - (100% * 0.4 + 0% * 0.7) = 0.3 (2)
  • the feature contribution value of AVS Matched for the terminal node having a 0.4 response value (AVS Matched is YES) can be computed using formula (3) below.
  • AVS Matched Feature Contribution = 0.4 - (100% * 0.4 + 0% * 0.7) = 0.0 (3)
  • the feature contribution value for AVS Matched is 0.0 for the node having a response of 0.4 because the AVS Matched feature value is expected to be "Yes" and the percentage of entities hitting that terminal node (response 0.4) is 100%. As such, the AVS being matched is expected. Therefore, the AVS Matched value being YES does not contribute to the response score being 0.4 since the AVS is expected to match.
  • the classification server may determine that, of the entities having an Amount feature value that is less than 95, 80% have an IP Score that is greater than 30 (hitting the 0.2 terminal node) and 20% have an IP Score that is not greater than 30 (hitting the 0.5 terminal node). Accordingly, the classification server can determine the Amount feature contribution values for each of the terminal nodes based on these percentages. For example, the classification server can determine the Amount feature contribution value for the terminal node having the response value of 0.7 using formula (4) below.
  • Amount Feature Contribution = 0.7 - (80% * 0.2 + 20% * 0.5) = 0.44 (4)
  • the classification server can determine the Amount feature contribution value for the terminal node having the response value of 0.4 using formula (5) below.
  • Amount Feature Contribution = 0.4 - (80% * 0.2 + 20% * 0.5) = 0.14 (5)
  • the Amount feature contribution is a positive value, indicating that the Amount feature value contributed to the response value being 0.4.
  • the feature contribution values for the two other terminal nodes can be computed similarly.
  • the IP Score feature contribution value for the terminal node having the response value of 0.2 can be determined using formula (6) below.
  • IP Score Feature Contribution = 0.2 - (100% * 0.2 + 0% * 0.5) = 0 (6)
  • the Amount feature contribution value for the terminal node having the response value of 0.2 can be determined using formula (7) below.
  • Amount Feature Contribution = 0.2 - (80% * 0.2 + 20% * 0.5) = -0.06 (7)
  • the Amount feature contribution value being negative for the response of 0.2 indicates that the Amount value negatively contributed to the response value, reducing the response value comparatively.
  • the IP Score feature contribution value for the terminal node having the response value of 0.5 can be determined using formula (8) below.
  • IP Score Feature Contribution = 0.5 - (100% * 0.2 + 0% * 0.5) = 0.3 (8)
  • the Amount feature contribution value for the terminal node having the response value of 0.5 can be determined using formula (9) below.
  • Amount Feature Contribution = 0.5 - (80% * 0.2 + 20% * 0.5) = 0.24 (9)
  • the Amount feature contribution value being positive for the response of 0.5 indicates that the Amount value positively contributed to the response value, increasing the response value comparatively.
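Putting the Amount computations together, the contributions for all four terminal nodes can be reproduced numerically (an illustrative sketch assuming, as in the example, that expected entities follow the Amount < 95 branch and split 80%/20% across its two terminal nodes):

```python
# Expected entities (Amount = 60, which is < 95) follow the left branch,
# hitting the 0.2 terminal node 80% of the time and the 0.5 node 20%.
expected_amount_response = 0.80 * 0.2 + 0.20 * 0.5  # 0.26

# Contribution = terminal response minus the expected (adjusted) response.
amount_contributions = {
    response: round(response - expected_amount_response, 2)
    for response in (0.2, 0.5, 0.4, 0.7)
}
```

This reproduces the signs discussed above: the 0.2 terminal node gets a negative Amount contribution (-0.06) while the 0.7 terminal node gets the largest positive one (0.44).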
  • the classification server can identify the feature contribution values that are associated with each of the terminal nodes hit during the traversal of the classification model using the feature values of a particular entity.
  • the feature contribution values for each feature can be summed across all of the decision trees in the classification model and a certain number of the top-ranking feature contribution values can be selected to use for reason codes (see, for example, the decision tree 410 in FIG. 4).
  • the Amount feature contribution values for each of the terminal nodes that are hit in the classification model can be summed together before the feature contribution values are ranked.
  • the classification server can determine both the response of each decision tree within the classification model and the feature contribution values for each response, which can be used to identify reason codes.
  • the classification model is a combined classification and reason code identification model.
  • the combined classification and reason code identification model can determine classifiers and reason codes for the classification using only a single traversal of the tree since the feature contribution values are based on the expected feature values. This combined model provides accurate reason codes while reducing computation time since the reason codes are determined from a single traversal of the model.
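How the per-tree feature contributions might be summed and ranked into reason codes can be sketched as follows (the function name and data layout are illustrative assumptions, not the claimed implementation):

```python
from collections import defaultdict

def top_reason_features(per_tree_contributions, k=2):
    """Sum each feature's contribution across all trees hit during a
    traversal and return the k features contributing most to the score.

    per_tree_contributions -- one {feature_name: contribution} dict per
                              decision tree in the ensemble
    """
    totals = defaultdict(float)
    for tree in per_tree_contributions:
        for feature, value in tree.items():
            totals[feature] += value
    # Rank features by total contribution, highest first.
    return sorted(totals, key=totals.get, reverse=True)[:k]
```

The selected feature names would then be mapped to reason-code labels for the requesting computer.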
  • FIG. 5 is a flow chart 500 of a method for classifying and determining reason codes, in accordance with some embodiments.
  • This method can be performed by a classification server.
  • the classification server can obtain training data for a plurality of entities.
  • each entity of the plurality of entities is characterized by a plurality of features, and the entity data for a particular entity can indicate feature values for each feature of that entity.
  • This step can be performed during an offline phase.
  • the classification server can obtain classification data associated with each entity of the plurality of entities in the training data.
  • the classification data may be included with the training data in some instances.
  • the classification data can associate a plurality of different classifiers with the plurality of entities such that each entity of the plurality of entities is associated with one or more of the classifiers. This step can be performed during an offline phase.
  • the classification server can build a classification model using the training data and the classification data.
  • the classification model can be built using a gradient boosting machine.
  • the classification model can include a plurality of decision trees for selecting the one or more classifiers.
  • the classification model can be an ensemble of more than a thousand decision trees.
  • Each of the decision trees can contain a plurality of branches where each branch contains one or more conditional nodes and a terminal node.
  • the conditional nodes can be associated with a particular feature (e.g., "Amount") and a set of feature values (e.g., "Amount < 95") for that particular feature.
  • Each of the terminal nodes can be associated with a response value. This step can be performed during an offline phase.
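At production scale this model would be built with an off-the-shelf gradient boosting library, but the core idea (each new tree fits the residual errors of the ensemble built so far) can be sketched in miniature with one-dimensional data and regression stumps; this sketch is entirely illustrative:

```python
def fit_stump(xs, targets):
    """Best single-split regression stump on 1-D data (least squares)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, targets) if x < t]
        right = [y for x, y in zip(xs, targets) if x >= t]
        if not left or not right:
            continue  # split must leave data on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (lm if x < t else rm)) ** 2
                  for x, y in zip(xs, targets))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x < t else rm

def gradient_boost(xs, ys, rounds=5, lr=0.5):
    """Each round fits a stump to the current residuals (squared loss)."""
    preds, stumps = [0.0] * len(xs), []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    # The ensemble's response is the weighted sum of all stump responses.
    return lambda x: sum(lr * s(x) for s in stumps)
```

A real classification model would use full decision trees, a logistic loss, and many more rounds, but the residual-fitting loop is the same in spirit.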
  • the classification server can determine a response value for each terminal node of the decision trees in the classification model.
  • the response values may be determined as part of the creation of the classification model using the gradient boosting machine process. This step can be performed during an offline phase.
  • the classification server can determine expected feature values for each feature.
  • the expected feature value for a particular feature can be based on the feature values of that feature for each entity of the plurality of entities.
  • the expected feature value for a particular feature can be the average value across all of the entities in the training data. This step can be performed during an offline phase.
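For numeric features, the expected-value step can be sketched as a simple average over the training entities (the data layout and field names are hypothetical):

```python
def expected_feature_values(training_data):
    """Average each numeric feature across all training entities.

    training_data -- list of {feature_name: numeric_value} dicts,
                     one per entity
    """
    totals = {}
    for entity in training_data:
        for feature, value in entity.items():
            totals[feature] = totals.get(feature, 0.0) + value
    return {feature: total / len(training_data)
            for feature, total in totals.items()}
```

Categorical features such as AVS Matched would instead use the majority value, as in the example above.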
  • the classification server can determine feature contribution values for each terminal node in the decision trees of the classification model.
  • the feature contribution values can be based on the expected feature value for that feature, the response value for that terminal node, and the positioning of the feature within the decision tree.
  • the feature contribution value for a particular feature can be based on the difference between a first response value of a first terminal node included in a first branch having a first condition based on that particular feature and a second response value of a second terminal node included in a second branch having a second condition based on that particular feature.
  • the determining expected feature values used in calculating the expected contribution values can be based on the feature values of that feature for each entity of the plurality of entities. For example, the average feature value across all of the entities can be used as the expected feature value for a particular feature. This step can be performed during an offline phase.
  • the classification server can determine the feature contribution values based on the response values associated with a particular terminal node and the expected values for each feature associated with conditional nodes of the branch within which the particular terminal node is included.
  • the feature-contribution values that are associated with that particular terminal node can include a contribution value for each feature that is associated with the conditional nodes of the branch within which the particular terminal node is included.
  • the classification server can receive new entity data for a new entity.
  • the new entity data may be received from a request computer.
  • the new entity data may indicate feature values for each feature of the entity.
  • the classification server may determine the features of the entity using a feature extraction process. This step can be performed during an online phase.
  • the classification server can traverse the classification model using the feature values for the new entity.
  • the classification server can select a plurality of terminal nodes based on whether the entity's feature values meet the conditions of the branch that includes those terminal nodes.
  • the classification server can then determine a response value for each decision tree within the classification model and identify feature contribution values that are associated with the response values (e.g., associated with the terminal nodes that are associated with that response value). This step can be performed during an online phase.
  • the classification server can classify the new entity based on the response values. For example, the classification server can determine one or more classifiers based on the aggregated response scores using a sigmoidal function and one or more threshold values. This step can be performed during an online phase.
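The classification step can be sketched as follows (the logistic sigmoid and the 0.5 threshold are illustrative assumptions; the actual aggregation function and thresholds are model-specific):

```python
import math

def classify(response_values, threshold=0.5):
    """Aggregate per-tree response values and map them to a classifier."""
    # Squash the summed responses into a probability-like score in (0, 1).
    score = 1.0 / (1.0 + math.exp(-sum(response_values)))
    label = "fraudulent" if score >= threshold else "non-fraudulent"
    return label, score
```

Multiple thresholds could map the same score to graded classifiers (e.g., "suspicious" between two cutoffs).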
  • the classification server can identify a reason code for each of the classifiers. The reason code may be a label indicating the features of the entity that were the greatest cause for the classification of the entity. The reason codes can be sent to the requesting computer in some cases. This step can be performed during an online phase.

VI. EXAMPLE COMPUTER SYSTEMS
  • Subsystems may include a printer, a keyboard, a fixed disk (or other memory comprising computer readable media), a monitor coupled to a display adapter, and others.
  • Peripherals and input/output (I/O) devices, which couple to an I/O controller (which can be a processor or other suitable controller), can be connected to the computer system by any number of means known in the art, such as a serial port.
  • a serial port or an external interface can be used to connect the computer apparatus to a wide area network such as the Internet, a mouse input device, or a scanner.
  • the interconnection via the system bus allows the central processor to communicate with each subsystem and to control the execution of instructions from system memory or the fixed disk, as well as the exchange of information between subsystems.
  • the system memory and/or the fixed disk may embody a computer readable medium.
  • the embodiments may involve implementing one or more functions, processes, operations or method steps.
  • the functions, processes, operations or method steps may be implemented as a result of the execution of a set of instructions or software code by a suitably-programmed computing device, microprocessor, data processor, or the like.
  • the set of instructions or software code may be stored in a memory or other form of data storage element which is accessed by the computing device, microprocessor, etc.
  • the functions, processes, operations or method steps may be implemented by firmware or a dedicated processor, integrated circuit, etc.
  • the present invention as described above can be implemented in the form of control logic using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement the present systems and methods using hardware and a combination of hardware and software.
  • any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C++ or Perl using, for example, conventional or object-oriented techniques.
  • the software code may be stored as a series of instructions, or commands on a computer readable medium, such as a random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a CD-ROM.
  • the computer readable medium may be any combination of such storage or transmission devices.
  • Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet.
  • a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs.
  • Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via internet download). Any such computer readable medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network.
  • a computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
  • any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps.
  • embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps.
  • steps of methods herein can be performed at a same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, units, circuits, or other means for performing these steps.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Prostheses (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)

Abstract

A classification server can perform a method for classifying an entity and identifying reason codes for the classification. The classification server can use a gradient boosting machine to build a classification model using training data. The classification model can be an ensemble of decision trees where each terminal node in the decision tree is associated with a response. The responses from each decision tree can be aggregated by the classification server in order to determine a classification for a new entity. The classification server can determine feature contribution values based on expected feature values. These feature contribution values can be associated with each of the responses in the classification model. These feature contribution values can be used to determine reason codes for the classification of the entity. As such, the classification server can perform a single traversal of the classification model to classify the entity and identify reason codes.

Description

IDENTIFYING REASON CODES FROM GRADIENT BOOSTING
MACHINES
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is an international patent application which claims the benefit of the filing date of U.S. Patent Application No. 15/482,489 filed April 7, 2017, entitled
IDENTIFYING REASON CODES FROM GRADIENT BOOSTING MACHINES, which is herein incorporated by reference in its entirety for all purposes.
BACKGROUND
[0002] Gradient boosting machines can be used to build models for classification of entities using a set of previously classified entities. To classify a new entity, the values of the entity's features can be determined and those feature values can be used to traverse the classification model. In contrast to certain other techniques for building classification models, gradient boosting machines can build a classification model that is an ensemble of smaller models, such as decision trees. Each of the smaller models can output a response score that depends on one or more different features of the new entity. While each of the smaller models may not be accurate in classifying new entities by itself, the classification model can provide accuracy by aggregating and weighting hundreds or thousands of smaller models.
[0003] While gradient boosting machines can build accurate classification models, it can be difficult or impractical to identify which features had the greatest effect on the classification outcome. One cause of the difficulty in determining the classification reasons is the composition of the classification model, which can include hundreds or thousands of smaller models, where each of the smaller models can depend on more than one feature, and more than one of the smaller models can depend on the same feature. Accordingly, there is a need for improved processes for determining reason codes from gradient boosting machines.
SUMMARY
[0004] A classification server can perform a method for classifying an entity and identifying reason codes for the classification. The classification server can use a gradient boosting machine to build a classification model using training data. The classification model can be an ensemble of decision trees where each terminal node in the decision tree is associated with a response. The responses from each decision tree can be aggregated by the classification server in order to determine a classification for a new entity. In addition, the classification server can determine feature contribution values based on expected feature values. These feature contribution values can be associated with each of the responses in the classification model. These feature contribution values can be used to determine reason codes for the classification of the entity. As such, the classification server can perform a single traversal of the classification model to both classify the entity and identify reason codes.
[0005] Other embodiments are directed to systems, portable consumer devices, and computer readable media associated with methods described herein. [0006] A better understanding of the nature and advantages of embodiments of the present invention may be gained with reference to the following detailed description and the accompanying drawings.
DESCRIPTION OF THE DRAWINGS
[0007] FIG. 1 is a block diagram of a classification system for classifying entities, according to some embodiments.
[0008] FIG. 2 is a process flow diagram for building a classification model and classifying an entity, according to some embodiments.
[0009] FIG. 3 is a diagram of a response scoring and classification process, in accordance with some embodiments. [0010] FIG. 4 is a diagram of a decision tree having terminal nodes associated with feature contribution values, in accordance with some embodiments.
[0011] FIG. 5 is a flow chart 500 of a method for classifying and determining reason codes, in accordance with some embodiments.
TERMS
[0012] A "computer" or "computer server" may refer to a single computer or a cluster of computers communicating in a system. For example, the computer can be a large mainframe, a minicomputer cluster, or a group of servers functioning as a unit. In one example, the computer may be a database server. The computer may include any hardware, software, other logic, or combination of the preceding for processing the requests from a user interface or from one or more client computers. The computer may comprise one or more computational apparatuses and may use any of a variety of computing structures, arrangements, and compilations for servicing the requests from one or more client computers.
[0013] "Machine learning" generally refers to a variety of different computer-implemented processes that build models based on a population of input data by determining features of the entities within the population and the relationships between the entities. To build the model, the machine learning process can measure a variety of features of each entity within the population and the features of different entities can be compared to determine
segmentations. For example, a machine learning process can be used to cluster entities together according to their features and the relationships between the entities. "Supervised machine learning" generally refers to machine learning processes that receive training data having predetermined solutions (e.g., the data is labeled or classified). A supervised machine learning process can use a set of population data and associated labels for each object in the training data and generate a set of logic to determine labels for unlabeled data. For example, a supervised machine learning process can build a character recognition model using images of letters and numbers that are labeled accordingly.
[0014] The term "classifier" generally refers to a description of an entity. The classifier may be determined by a human. For example, a person may report that a particular transaction is "fraudulent" or "not-fraudulent." In another example, images may be labeled with the following labels based on what objects are shown in the image: "building," "people," "car," "truck," "dog," etc. One or more labels may be applied to each entity. Entities having the same label may have one or more features having similar values.
[0015] The term "features" generally refers to the set of measurements for different characteristics or attributes of an entity as determined by a machine learning process. As such, the features of an entity are characteristic of that entity, such that similar entities will have similar features depending on the accuracy of the machine learning process. For example, the "features" of a transaction can include the time of the transaction, the parties involved in the transaction, and the amount of the transaction. In addition, the features of a transaction can be more complex, including a feature indicating the patterns of transactions conducted by a first party, or patterns of the other people involved in transactions with the first party. In another example, "features" of an image can be determined based on color and luminance across its pixels and the distribution of different colors across the image. The features determined by complex machine learning algorithms may not be understandable by humans. That is, the individual feature values may represent a certain characteristic, but this is a result of a complex algorithm and not a simple measurement that can be easily performed by a human. The features can be stored as an array of numeric values. For example, the features for two different entities may be represented by the following arrays: [0.2, 0.3, 0.1, ...] for the first entity and [0.3, 0.4, 0.1, ...] for the second entity.
[0016] The term "reason code" refers to a code, phrase, or narrative that identifies which features of an entity were the cause of the classification of that entity. For example, a classification system may assign a "fraudulent" classifier to a particular transaction and the reason code for that classification may identify the "transaction amount" and "address verification" features as being the reason for that classification. The reason code may also include more detailed information, such as the conditions for each respective feature that caused the classification. For example, the reason code may indicate that the transaction was classified as "fraudulent" due to the transaction amount being larger than a specified threshold and the address not being verified.
DETAILED DESCRIPTION
[0017] Gradient boosting machines can be used to build models for classification of entities using a training set of previously classified entities. Classification models built by gradient boosting machines can be an ensemble of hundreds or thousands of smaller sub-models, such as decision trees. The classification model is complex since each of the smaller sub-models in the ensemble can depend on one or more different entity features and more than one of the smaller models can depend on the same feature. Because of this, it can be difficult or impractical to identify which features of a newly classified entity had the greatest effect on the classification outcome. [0018] While classification models built by gradient boosting machines are more accurate than simpler classification models, their increased complexity makes it impractical to determine reason codes that identify which features were the cause of the classification. One solution is to build a separate, simpler model that is not based on a gradient boosting machine in order to generate reason codes. When new entity data is received, it can be run through both models: the complex classification model (built using a gradient boosting machine) and the simpler reason code model. However, this solution can be inaccurate as a result of differences between the two separate models. Another solution is to iteratively adjust the features of the input entity data and re-run the classification model in order to determine how the adjustments changed the response. However, this solution requires a large amount of computer resources and time in order to process the classification model several times. [0019] An improved solution, described in further detail below, can accurately identify reason codes for the classification while only processing the classification model once by combining both classification and reason code determination in a single model.
To create the combined model, the contribution of each feature is determined for each response of each sub-model within the combined model. These feature contributions can be ranked and used to identify one or more reason codes. As such, the combined model solution provides accurate reason codes, since the feature contributions are tied to the classification model, while providing real time classification, since the model only needs to be run once.
[0020] The combined model can be built in two steps during an "offline" phase (e.g., before the model is in operational use for entity classification). In the first step, a gradient boosting machine learning process can build a classification model using a set of training data. In the second step, estimated feature contributions for each response of the
classification model are determined based on the average features of the entities within the set of training data. Since the feature contributions are estimated using the averaged features, this process only needs to be performed once and it can be performed offline. That is, the feature contributions do not need to be re-determined for each new entity that is classified by the model. As such, when the classification model is used in an "online" phase (e.g., when the model is in operational use for entity classification) it can identify both classifiers and reason codes in a single traversal of the model.
[0021] By estimating the feature contributions before the combined model is in operational use, some of the complex and computing-resource intensive calculations can be performed in the offline phase instead of the online phase. Furthermore, the feature contribution determination process only needs to be performed once for a particular classification model. As such, the amount of time and computing resources used to classify an entity and identify the corresponding reason codes is reduced using the combined model compared to other solutions. The improved combined model is described in further detail below with reference to FIGs. 1-5.

I. CLASSIFICATION SYSTEM DIAGRAM
[0022] FIG. 1 is a block diagram of a classification system 100 for classifying entities, according to some embodiments. The classification system 100 includes a classification server 110 that can classify an entity using a classification model. The classification server 110 can classify an entity by traversing the classification model using entity data for that entity. The classification server 110 can build the classification model in an offline phase using a set of training data including entity data for a plurality of entities and classification data indicating one or more classifiers that are associated with each of the plurality of entities. The training data can be received from a database server 120 or it can be stored by the classification server 110. The classification model built by the classification server 110 can be a combined model that can be used to determine both classifiers and reason codes.
[0023] After building the classification model offline, the classification server 110 can use the model during an online phase to determine a classification score for a new entity that is not included in the training data. For instance, the classification server 110 may classify a new entity upon request from a request computer 130. The request computer 130 can send a classification request message including entity data to the classification server 110. The classification server 110 can receive the entity data from the request computer 130. The entity data received from the request computer 130 may indicate values for each of the features of the entity, or the classification server 110 may determine feature values for the entity based on the entity data. The classification server 110 can traverse the classification model using the feature values to determine one or more classifiers of the entity. The classifiers can be numerical scores that indicate certain classifications or they can be labels of classifications.
[0024] The classification server 110 can also determine feature contribution values for each of the entity's features. The classification server 110 can rank the feature contribution values and then identify one or more reason codes corresponding to each of the one or more classifiers. The reason codes can indicate which features were the greatest cause for the entity being classified by a particular classifier. The classification server 110 can send the classifiers and reason codes to the request computer 130 in a classification response message.

[0025] The request computer 130 may perform different operations based on the classification of the entity. For example, the request computer 130 may deny access to a resource if a transaction entity is classified as fraudulent. In another example, the request computer 130 can use the classified image entities for object and character recognition.
II. CLASSIFICATION MODEL GENERATION AND OPERATION
[0026] A classification server can use a classification model to determine classifiers of an entity and reason codes for the classification. FIG. 2 is a process flow diagram 200 for building a classification model and classifying an entity, according to some embodiments. During an offline phase 211, a classification server can build a classification model 220 and determine feature contributions 230. After the offline phase 211, the classification server can operate in an online phase 212 in which it can classify a new entity. The processes shown in FIG. 2 may be performed by a classification server, such as the classification server 110 of FIG. 1.
[0027] During the offline phase 211, the classification server can obtain training data 210. For example, the classification server can receive the training data 210 from a database server. The training data 210 can include entity data for a plurality of entities. The entities included in the training data 210 may be a representative sample selected from a population of entities. The entity data in the training data 210 can include information describing the features or characteristics of each entity. Each entity may have one or more features and the training data 210 can include feature values for each feature of that entity.
[0028] In some situations, the classification server may need to perform a feature extraction process in order to determine the feature values from the entity data. The feature extraction process may be a machine learning algorithm that determines feature values for an entity such that similar entities have similar feature values. In one example, the features of a transaction entity may be determined based on the relationships between the parties involved in the transaction or a comparison of the time and location of different transactions conducted by the same party. In another example, the features of an image entity can be determined based on color and luminance across its pixels and the distribution of different colors across the image.
[0029] The training data 210 can also include classification information associating one or more classifiers with each entity. For instance, the entities in the training data 210 may have been previously classified using one or more classification labels or classification scores. For example, if the entities in the training data 210 are transactions, then the classification information may associate each entity with a classification label or score indicating whether the transaction is "fraudulent" or "non-fraudulent." In another example, if the entities in the training data 210 are images, then the classification information may associate each entity with one or more classification labels or scores indicating the objects that are depicted in the image, such as "building," "person," "vehicles," "alphanumeric characters," etc. As such, the classification information can associate one or more classifiers with each of the plurality of entities that is included in the training data 210.
[0030] At 201, the classification server can build a classification model 220 based on the training data 210, which includes feature values for each entity, and the classification information, which associates each entity with one or more classifiers. The classification server can build the classification model 220 using a gradient boosting machine, which is a machine learning process that can be used to build classification models including an ensemble of sub-models. For example, each of the sub-models can be a decision tree. In the classification model 220, each of the smaller sub-models can output a response score that depends on one or more different features of the entity. Responses from each of the sub-models can be weighted and combined together in order to accurately classify an entity. The classification model 220 is described in further detail below with respect to FIG. 3 and FIG. 4.
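As a rough illustration of how a gradient boosting machine builds such an additive ensemble (a minimal sketch under simplifying assumptions — one-feature decision "stumps", a fixed list of candidate splits, and squared-error residuals — not the patent's actual training process; all names are hypothetical):

```python
# Minimal gradient boosting sketch: each new sub-model (a one-feature
# decision stump) is fitted to the residuals left by the ensemble so far,
# and responses are combined additively with a learning rate.

def fit_stump(X, residuals, feature, threshold):
    """Return a stump holding the mean residual on each side of the split."""
    left = [r for x, r in zip(X, residuals) if x[feature] < threshold]
    right = [r for x, r in zip(X, residuals) if x[feature] >= threshold]
    return {"feature": feature, "threshold": threshold,
            "left": sum(left) / len(left) if left else 0.0,
            "right": sum(right) / len(right) if right else 0.0}

def stump_response(stump, x):
    side = "left" if x[stump["feature"]] < stump["threshold"] else "right"
    return stump[side]

def boost(X, y, splits, rounds=10, learning_rate=0.5):
    """Fit an additive ensemble of stumps to labels y in {0, 1}."""
    ensemble, preds = [], [0.0] * len(X)
    for i in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        feature, threshold = splits[i % len(splits)]  # cycle candidate splits
        stump = fit_stump(X, residuals, feature, threshold)
        ensemble.append(stump)
        preds = [p + learning_rate * stump_response(stump, x)
                 for p, x in zip(preds, X)]
    return ensemble

def score(ensemble, x, learning_rate=0.5):
    """Aggregate the weighted responses of every stump in the ensemble."""
    return sum(learning_rate * stump_response(s, x) for s in ensemble)
```

With toy transaction data labeled fraudulent above an amount threshold, the ensemble's aggregated score separates the two classes after a few boosting rounds.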
[0031] At 202, after the classification model 220 has been built, the classification server can determine feature contribution values 230 for each feature of the classification model 220. The feature contribution values 230 indicate how great of an effect the value of each feature had on the outcome of the classification. These feature contribution values 230 can be ranked in order to determine which features contributed the most to the classification, and reason codes can be identified based on the ranking.

[0032] The classification server can determine the feature contribution values 230 during the offline phase 211 by determining the expected feature value for each feature, obtained by averaging feature values across each of the entities in the training data 210. These average feature values may be used to determine the feature contribution values 230 associated with each response of the classification model 220. By determining the expected feature values, the feature contribution values 230 can be pre-determined in the offline phase 211, thereby enabling the classification server to identify reason codes based on the feature contribution values 230 in real time during the online phase 212 using only a single traversal of the classification model 220. The determination of feature contribution values is described in further detail below with respect to FIG. 4.
[0033] After the classification model 220 has been built and the feature contribution values have been determined, the classification server can operate in an online phase 212 in which the classification server classifies new entities (not in the training data 210) and identifies one or more reason codes for each classifier of the new entity. For example, the classification server can receive new entity data 240 for a new entity from a request computer (e.g., the request computer 130 of FIG. 1).
[0034] At 203, the classification server can traverse each sub-model of the classification model 220 using the feature values of the new entity data 240. In some situations, the classification server may determine the feature values for the new entity using a feature extraction process. In other situations, the feature values may have been previously determined and be included in the new entity data 240.
[0035] The traversal of the classification model results in a response value and one or more associated feature contribution values for each of the sub-models. The response values can be aggregated in order to determine one or more classifiers 250 for the new entity. The classification process is described in further detail below with respect to FIG. 3. The feature contribution values associated with each of the response values can be aggregated and ranked in order to identify reason codes 260 for the classification. The reason code identification process is described in further detail below with respect to FIG. 4.
[0036] The classification model 220 is advantageous because it combines the classification with the feature contributions such that reason codes can be identified at the time of classification, without traversing a separate reason code model or traversing the same classification model multiple times. As discussed above, the expected feature contribution values 230 are determined during the offline phase 211 such that they do not need to be re-calculated for each new entity being classified during the online phase 212. Since the feature contribution values 230 are pre-determined, only a single model needs to be traversed for each new entity during the online phase 212. Accordingly, the amount of time and computing resources spent for both classification and reason code identification is reduced compared to other classification systems that traverse more than one model to determine both classifiers and reason codes.

III. CLASSIFICATION MODEL RESPONSE SCORING
[0037] As discussed above, a classification server can build a classification model that can be traversed in order to determine classifiers for an entity. FIG. 3 is a diagram 300 of a response scoring and classification process, in accordance with some embodiments. The response scoring process 301 determines a response score based on responses from each sub-model of the classification model. The classification process 302 determines one or more classifiers based on the response score.
[0038] As discussed above, the classification model built and used by a classification server can be an ensemble of decision trees. As shown in FIG. 3, a classification model can include a first decision tree 311, a second decision tree 312, a third decision tree 313, a last decision tree 314, and a plurality of other decision trees (indicated by the ellipsis) that are not shown in FIG. 3 for simplicity. Each decision tree of the plurality of decision trees in the classification model can contain a plurality of nodes, which are depicted as boxes in FIG. 3. Each node can be associated with a particular feature and a set of feature values for that feature, forming a condition. The set of feature values for the condition may be determined using a threshold value, such that the decision at each node can branch based on whether the condition is met ("YES") or not met ("NO"). The traversal of the nodes within a decision tree is discussed in further detail below with respect to FIG. 4.
[0039] Each decision tree of the plurality of decision trees in the classification model can contain a plurality of branches, each branch containing one or more conditional nodes and a terminal node. The branches are depicted in FIG. 3 as edges connecting the nodes within the branch, and the terminal nodes are depicted as gray colored boxes. As a result of building the decision trees using a gradient boosting machine, each of the terminal nodes is associated with a response value. These response values are weighted based on the accuracy of the decision tree in classifying the entities in the training data.
[0040] When new entity data is received, the feature values for that entity are used to traverse the trees, going down a certain branch in each decision tree to a particular terminal node depending on whether the feature conditions of that branch are met. The response values for each tree can be aggregated into a score. In the example shown in FIG. 3, the first decision tree 311 outputs a response value of "Response 1," the second decision tree 312 outputs a response value of "Response 2," the third decision tree 313 outputs a response value of "Response 3," the last decision tree 314 outputs a response value of "Response M," and each of the plurality of decision trees not shown in FIG. 3 also outputs a response value (indicated by the ellipsis). As such, a response value can be identified for each decision tree based on the terminal node hit during its traversal.
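The traversal and aggregation just described can be sketched as follows. The dict encoding and function names are illustrative assumptions, and the example tree is one plausible encoding of the FIG. 4 example discussed in section IV (root split on Amount, then IP Score on one side and AVS Matched on the other):

```python
# Illustrative sketch: each decision tree is a nested dict; traversal
# follows the branch whose condition is met and returns the response
# value at the terminal node reached.

def traverse(node, features):
    """Walk one decision tree with the entity's feature values."""
    while "response" not in node:
        feature, op, value = node["condition"]
        if op == "<":
            met = features[feature] < value
        elif op == ">":
            met = features[feature] > value
        else:  # equality test, e.g. AVS Matched == "Yes"
            met = features[feature] == value
        node = node["yes"] if met else node["no"]
    return node["response"]

def response_score(trees, features):
    """Aggregate the response values over every tree in the ensemble."""
    return sum(traverse(tree, features) for tree in trees)

# One plausible encoding of the FIG. 4 example tree (an assumption):
FIG4_TREE = {
    "condition": ("Amount", "<", 95),
    "yes": {
        "condition": ("IP Score", ">", 30),
        "yes": {"response": 0.2},
        "no": {"response": 0.5},
    },
    "no": {
        "condition": ("AVS Matched", "==", "Yes"),
        "yes": {"response": 0.4},
        "no": {"response": 0.7},
    },
}
```

An ensemble is simply a list of such trees, and `response_score` sums their individual responses into the score that is then classified.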
[0041] The classifier for an entity can be determined using a sigmoidal function based on the response score. For example, the classification function (1) below may be used to compute a classification score based on the response score.
f(score) = e^score / (e^score + e^-score) (1)
This classification function (1) can be graphed as shown in FIG. 3. According to the classification function (1), the classification score may be less than 0.5 when the response score is a negative value, 0.5 when the response score is 0, and greater than 0.5 when the response score is a positive value. The classifier for a particular entity can be determined using a threshold value 321 (e.g., 0.5). For example, if the classification function (1) results in a classification score that is less than 0.5, the entity can be associated with the classifier "Class 1." And if the classification function (1) results in a classification score greater than or equal to 0.5, then the entity can be associated with the classifier "Class 2." In some embodiments, more than one threshold value can be used to select between more than two classifiers.
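Classification function (1) and the threshold test can be sketched as follows (the exact functional form is reconstructed from the behavior described above — 0.5 at a response score of 0, below 0.5 for negative scores, above 0.5 for positive scores):

```python
import math

def classification_score(response_score):
    """Classification function (1): e^s / (e^s + e^-s)."""
    return math.exp(response_score) / (
        math.exp(response_score) + math.exp(-response_score))

def classify(response_score, threshold=0.5):
    """Select a classifier by comparing the score to a threshold value."""
    if classification_score(response_score) < threshold:
        return "Class 1"
    return "Class 2"
```

Because the function is monotonic in the response score, comparing the classification score to 0.5 is equivalent to comparing the raw response score to 0.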
IV. REASON CODE IDENTIFICATION
[0042] The terminal nodes of the decision trees in a classification model can each be associated with a response value. In addition, each of the terminal nodes can be associated with one or more feature contribution values that can be used to identify reason codes. As such, the reason codes can be identified in real time, using a single model. FIG. 4 is a diagram 400 of a decision tree 410 having terminal nodes associated with feature contribution values 420, in accordance with some embodiments.

[0043] As discussed above, a gradient boosting machine process can build a classification model that is an ensemble of hundreds or thousands of decision trees. The decision tree 410 is an example of a single decision tree within the classification model. This decision tree may be traversed by the classification server, in addition to other decision trees of the classification model, when classifying a new entity during an online phase.

[0044] In this example, the decision tree 410 may have been built using training data for transaction entities that have been pre-classified as "fraudulent" or "non-fraudulent." The features of the transaction entities can include an Internet Protocol (IP) reputation score ("IP Score") that has been pre-determined by a third party. In this example, greater IP Score feature values (e.g., greater than 30) may indicate that the transaction is more likely to be classified as "non-fraudulent" and lower IP Score feature values (e.g., not greater than 30) may indicate that the transaction is more likely to be classified as "fraudulent."
[0045] The features of the transaction can also include an "Amount" feature value indicating the amount of the transaction. In this example, lower amount values (e.g., less than 95) may indicate that a transaction is more likely to be classified as "non-fraudulent" while greater amount values (e.g., not less than 95) may indicate that the transaction is more likely to be classified as "fraudulent."
[0046] The features of the transaction can also include an "Address Verification Service Match" feature indicating whether a verification server has matched the address used to conduct the transaction with a registered address. In this example, the Address Verification Service (AVS) match ("yes") may indicate that the transaction is more likely to be classified as "non-fraudulent" while the AVS not matching ("no") may indicate that the transaction is more likely to be classified as "fraudulent."
[0047] As shown in FIG. 4, each terminal node of the decision tree 410 is associated with a response score, indicated by the value within the terminal nodes. The response scores are based on the pre-determined classifications of entities having the features of the nodes within the branch of that terminal node. In this example, an entity having an IP Score feature value that is greater than 30 and an Amount feature value that is less than 95 will cause the decision tree 410 to output the response of 0.2 while an entity having an IP Score feature value that is not greater (less) than 30 and an Amount feature value that is less than 95 will cause the decision tree to output a response of 0.5.
[0048] In this example, an entity having an AVS Matched feature value of "Yes" and an Amount feature value that is not less (greater) than 95 will cause the decision tree to output the response value of 0.4 while an entity having an AVS Matched feature value of "No" and an Amount feature value that is not less (greater) than 95 will cause the decision tree to output the response value of 0.7.

[0049] The classification model may assign positive response values to terminal nodes that have a set of feature values that are more likely to be classified as "fraudulent" and negative values to terminal nodes that have a set of feature values that are more likely to be classified as "non-fraudulent" based on the number of entities classified as such in the training data.

[0050] The classification server can determine one or more feature contribution values 420 for each of the response values (e.g., for each terminal node). The classification server can determine feature contribution values for each of the features that a particular branch is based on. For example, the far-right branch having a response value of 0.7 is based on the Amount feature and the AVS Matched feature. Accordingly, the classification server can determine feature contribution values for the Amount feature and the AVS Matched feature.
[0051] As discussed above, the classification server can determine the feature contribution values based on an expected feature value. The feature contribution values 420 can also be based on the particular feature's position within the tree and the percentage of entities within the training data that meet the conditions of the particular branch.

[0052] In order to determine the expected feature values, the classification server can determine the average value of the feature across all of the entities in the training data. In this example, the classification server can determine that the average IP Score feature value is 60, the average Amount feature value is 60, and the majority of the entities have the AVS Matched feature value of "Yes." These expected feature values are shown in nodes of the decision tree 410.
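The averaging step of paragraph [0052] might look like the following sketch. The names are hypothetical, and the patent does not specify how a categorical feature such as AVS Matched is averaged — a majority vote is assumed here:

```python
from collections import Counter
from statistics import mean

def expected_feature_values(training_entities):
    """Average numeric features across the training set; take the
    majority (most common) value for categorical features."""
    expected = {}
    for feature in training_entities[0]:
        values = [entity[feature] for entity in training_entities]
        if all(isinstance(v, (int, float)) for v in values):
            expected[feature] = mean(values)
        else:
            expected[feature] = Counter(values).most_common(1)[0][0]
    return expected
```

On training data matching the FIG. 4 example, this yields an expected IP Score of 60, an expected Amount of 60, and an expected AVS Matched value of "Yes."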
[0053] The classification server can use the expected feature values to determine the feature contribution values 420 to be associated with each terminal node. The feature contribution values are also based on the percentage of entities that are expected to meet the conditions of that branch using the expected feature values. To determine the expected feature contribution value for a particular feature at a first terminal node, the classification server can identify the node in the decision tree that corresponds to that feature. Then, the classification server can select the branch of that node that would be followed using the expected feature value for that feature. Then, the classification server can identify each terminal node that is within the selected branch. The classification server can then adjust the response values for each terminal node within the selected branch based on the percentage of entities within the training set that both meet the condition of the node in the decision tree that corresponds to the feature (e.g., the entities that would follow the branch of the node selected using the expected feature value) and that would hit that particular terminal node. For example, if 20% of entities that would follow the selected branch would end at a particular terminal node, then the response value for that terminal node can be multiplied by 20%. The adjusted response values for each of the terminal nodes within the selected branch can be summed, and the summation of the terminal nodes within the selected branch (as adjusted) can be subtracted from the response value of the first terminal node itself.
[0054] The difference between the response value of the first terminal node and the summation of the adjusted response values of the terminal nodes within the branch selected by the expected feature value is the expected feature contribution value for that particular feature. As such, the expected feature contribution value for a feature indicates the amount of deviation of the feature value of the first terminal node from the expected feature value, thereby indicating the amount that the value for that feature contributed to the response value.
[0055] For example, in order to determine the feature contribution value for the AVS Matched feature in the terminal node having a response value of 0.7, the classification server may determine the difference between the current response and the expected response. As noted above, the expected AVS Matched feature value is "Yes." However, the terminal node having a response of 0.7 is hit when the AVS Matched feature value is "No." Since the AVS Matched feature value is different than the expected value, the AVS Matched feature is a cause of the response score being high (e.g., being 0.7). Accordingly, the feature contribution value for the AVS Matched feature for this terminal node will be greater than 0.
[0056] To compute the AVS Matched feature contribution value, the classification server can use the response value of the terminal node (e.g., 0.7), the expected feature value for AVS Matched (e.g., "Yes") and the percentage of entities in the training data that would hit that terminal node (e.g., meet the conditions of the branch) based on the expected feature value for AVS Matched (e.g., 0%). Since the expected feature value for AVS Matched is "Yes," an expected entity would not hit the terminal node having the response of 0.7. Thus, 0% of expected entities would hit the terminal node having the response of 0.7 and 100% of expected entities would hit the terminal node having the response of 0.4, where the AVS Matched feature value is "Yes." 100% of the expected entities hit the AVS Matched "Yes" terminal node (response value 0.4) since the expected AVS Matched feature value is "Yes" and the AVS Matched condition is the last condition within this branch.

[0057] The feature contribution value of AVS Matched for the terminal node having a 0.7 response value can be determined by multiplying each opposing response score by the percentage of entities expected to hit that response score and subtracting these two values from the response score for that terminal node. The classification server can use the expected (e.g., average) feature values (IP Score is 60, Amount is 60, and AVS Matched is "Yes") to determine the percentage of entities expected to hit a response score. To do so, the classification server can identify the percentage of entities within the training data that have feature values meeting the feature conditions of the branch. The percentage of entities expected to hit the response score of 0.4 (AVS Matched is YES) is 100% and the percentage of entities expected to hit the response score of 0.7 (AVS Matched is NO) is 0%, as shown by the dashed arrows in FIG. 4.
For example, the AVS Matched feature contribution value for the terminal node having a 0.7 response value can be computed using formula (2) below.
AVS Matched Feature Contribution = 0.7 - ( 100% * 0.4 + 0% * 0.7 ) = 0.3 (2)
[0058] Similarly, the feature contribution value of AVS Matched for the terminal node having a response value of 0.4 (AVS Matched is YES) can be computed using formula (3) below.
AVS Matched Feature Contribution = 0.4 - ( 100% * 0.4 + 0% * 0.7 ) = 0.0 (3)
[0059] The feature contribution value for AVS Matched is 0.0 for the node having a response of 0.4 because the AVS Matched feature value is expected to be "Yes" and the percentage of entities hitting that terminal node (response 0.4) is 100%. As such, the AVS being matched is expected. Therefore, the AVS Matched value being YES does not contribute to the response score being 0.4 since the AVS is expected to match.
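The subtraction described above can be checked numerically. The following is an illustrative sketch, not part of the patent: the AVS Matched branch percentages are the ones just derived, and the Amount and IP Score branch percentages are the ones given later in this section, so the same helper reproduces all of formulas (2) through (9):

```python
def contribution(response, weighted_branch_terminals):
    """Terminal node response minus the probability-weighted responses of
    the terminals in the branch selected by the expected feature value."""
    return response - sum(p * r for p, r in weighted_branch_terminals)

# Expected AVS Matched is "Yes": 100% of expected entities hit 0.4, 0% hit 0.7.
avs_branch = [(1.00, 0.4), (0.00, 0.7)]
# Expected Amount is 60 (< 95): 80% of those entities hit 0.2, 20% hit 0.5.
amount_branch = [(0.80, 0.2), (0.20, 0.5)]
# Expected IP Score is 60 (> 30): 100% hit 0.2, 0% hit 0.5.
ip_branch = [(1.00, 0.2), (0.00, 0.5)]

avs_at_07 = contribution(0.7, avs_branch)        # formula (2):  0.3
avs_at_04 = contribution(0.4, avs_branch)        # formula (3):  0.0
amount_at_07 = contribution(0.7, amount_branch)  # formula (4):  0.44
amount_at_04 = contribution(0.4, amount_branch)  # formula (5):  0.14
ip_at_02 = contribution(0.2, ip_branch)          # formula (6):  0.0
amount_at_02 = contribution(0.2, amount_branch)  # formula (7): -0.06
ip_at_05 = contribution(0.5, ip_branch)          # formula (8):  0.3
amount_at_05 = contribution(0.5, amount_branch)  # formula (9):  0.24
```

Each computed value matches the corresponding formula in this section, confirming that a single weighted-subtraction rule generates every feature contribution value in the FIG. 4 example.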
[0060] With respect to the Amount feature, the classification server may determine that 80% of the entities that have an Amount feature value that is less than 95 have an IP Score that is greater than 30, and the remaining 20% of those entities have an IP Score that is not greater than 30. Accordingly, the classification server can determine the Amount feature contribution values for each of the terminal nodes based on these percentages. For example, the classification server can determine the Amount feature contribution value for the terminal node having the response value of 0.7 using formula (4) below.

Amount Feature Contribution = 0.7 - ( 80% * 0.2 + 20% * 0.5 ) = 0.44 (4)

[0061] Accordingly, the Amount not being less than 95 contributes to the response score being 0.7. The Amount feature contribution is high because the Amount feature value is different than expected (it is expected to be 60).
[0062] In addition, the classification server can determine the Amount feature contribution value for the terminal node having the response value of 0.4 using formula (5) below.
Amount Feature Contribution = 0.4 - ( 80% * 0.2 + 20% * 0.5 ) = 0.14 (5)
[0063] The Amount feature contribution is a positive value, indicating that the Amount feature value contributed to the response value being 0.4.
[0064] The feature contribution values for the two other terminal nodes can be computed similarly. For example, the IP Score feature contribution value for the terminal node having the response value of 0.2 can be determined using formula (6) below.
IP Score Feature Contribution = 0.2 - ( 100% * 0.2 + 0% * 0.5 ) = 0 (6)
[0065] The Amount feature contribution value for the terminal node having the response value of 0.2 can be determined using formula (7) below.

Amount Feature Contribution = 0.2 - ( 80% * 0.2 + 20% * 0.5 ) = -0.06 (7)
[0066] The Amount feature contribution value being negative for the response of 0.2 indicates that the Amount value negatively contributed to the response value, reducing the response value comparatively.
[0067] The IP Score feature contribution value for the terminal node having the response value of 0.5 can be determined using formula (8) below.
IP Score Feature Contribution = 0.5 - ( 100% * 0.2 + 0% * 0.5 ) = 0.3 (8)
[0068] The Amount feature contribution value for the terminal node having the response value of 0.5 can be determined using formula (9) below.
Amount Feature Contribution = 0.5 - ( 80% * 0.2 + 20% * 0.5 ) = 0.24 (9)

[0069] The Amount feature contribution value being positive for the response of 0.5 indicates that the Amount feature value contributed to increasing the response value.

[0070] In order to determine reason codes for the classification of an entity, the
classification server can identify the feature contribution values that are associated with each of the terminal nodes hit during the traversal of the classification model using the feature values of a particular entity. The feature contribution values for each feature can be summed across all of the decision trees in the classification model and a certain number of the top-ranking feature contribution values can be selected to use for reason codes. For example, looking only at the decision tree 410 in FIG. 4, if the terminal node having the response value of 0.4 is hit, then the top-ranking reason code for the response value being 0.4 is that the "Amount is not less than 95." To determine reason codes in consideration of the entire classification model, which may have the Amount feature in multiple decision trees, the Amount feature contribution values for each of the terminal nodes that are hit in the classification model can be summed together before the feature
contributions are ranked.
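The summing and ranking step described in paragraph [0070] might be sketched as follows (illustrative only; the function and feature names are hypothetical):

```python
from collections import defaultdict

def identify_reason_codes(per_tree_contributions, top_k=2):
    """Sum each feature's contribution values across all terminal nodes
    hit in the ensemble, rank the totals, and return the top_k features."""
    totals = defaultdict(float)
    for tree_contributions in per_tree_contributions:
        for feature, value in tree_contributions.items():
            totals[feature] += value
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:top_k]
```

For instance, if one traversed tree attributes 0.44 to Amount and 0.3 to AVS Matched while another attributes 0.14 to Amount, the summed Amount contribution (0.58) ranks first and Amount becomes the top reason code.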
[0071] Accordingly, the classification server can determine both the response of each decision tree within the classification model and the feature contribution values for each response, which can be used to identify reason codes. As such, the classification model is a combined classification and reason code identification model. The combined classification and reason code identification model can determine classifiers and reason codes for the classification using only a single traversal of the trees since the feature contribution values are based on the expected feature values. This combined model provides accurate reason codes while reducing computation time since the reason codes are determined from a single traversal of the model.
V. EXEMPLARY METHOD
[0072] FIG. 5 is a flow chart 500 of a method for classifying and determining reason codes, in accordance with some embodiments. This method can be performed by a classification server. At step 501 of the method, the classification server can obtain training data for a plurality of entities. As discussed above, each entity of the plurality of entities is characterized by a plurality of features, and the entity data for a particular entity can indicate feature values for each feature of that entity. This step can be performed during an offline phase.
[0073] At step 502 of the method, the classification server can obtain classification data associated with each entity of the plurality of entities in the training data. The classification data may be included with the training data in some instances. The classification data can associate a plurality of different classifiers with the plurality of entities such that each entity of the plurality of entities is associated with one or more of the classifiers. This step can be performed during an offline phase.

[0074] At step 503 of the method, the classification server can build a classification model using the training data and the classification data. The classification model can be built using a gradient boosting machine. The classification model can include a plurality of decision trees for selecting the one or more classifiers. For example, the classification model can be an ensemble of more than a thousand decision trees. Each of the decision trees can contain a plurality of branches where each branch contains one or more conditional nodes and a terminal node. The conditional nodes can be associated with a particular feature (e.g., "Amount") and a set of feature values (e.g., "Amount < 95") for that particular feature. Each of the terminal nodes can be associated with a response value. This step can be performed during an offline phase.

[0075] At step 504 of the method, the classification server can determine a response value for each terminal node of the decision trees in the classification model. The response values may be determined as part of the creation of the classification model using the gradient boosting machine process. This step can be performed during an offline phase.
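As a concrete illustration of step 503, the sketch below fits a small gradient-boosted tree ensemble on synthetic data. The patent does not name a particular library, so scikit-learn's GradientBoostingClassifier is used here purely as a stand-in, with a toy dataset in place of real entity and classification data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy stand-ins for the training data (feature values per entity) and the
# classification data (one classifier label per entity).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# The gradient boosting machine builds the ensemble of decision trees;
# n_estimators sets the number of trees (the text mentions ensembles of
# more than a thousand trees; a small count keeps this sketch fast).
model = GradientBoostingClassifier(n_estimators=25, max_depth=3, random_state=0)
model.fit(X, y)

print(len(model.estimators_))  # one boosting stage (regression tree) per row
```

Each fitted tree carries the terminal-node response values that step 504 refers to; in scikit-learn they are the leaf values of the underlying regression trees.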
[0076] At step 505 of the method, the classification server can determine expected feature values for each feature. The expected feature value for a particular feature is based on the feature values of that feature for each entity of the plurality of entities. For example, the expected feature value for a particular feature can be the average value across all of the entities in the training data. This step can be performed during an offline phase.
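In the averaging case, step 505 reduces to computing a per-feature mean over the training entities. A minimal sketch, with illustrative feature names and values:

```python
# Feature values for each entity in the training data (values illustrative;
# feature names match the examples used elsewhere in the text).
training_data = {
    "Amount":      [100.0, 40.0, 70.0, 90.0],
    "IP Score":    [0.9, 0.4, 0.5, 0.6],
    "Login Count": [3.0, 7.0, 2.0, 8.0],
}

# Step 505: the expected value of each feature is its average across all
# entities in the training data.
expected_values = {feature: sum(values) / len(values)
                   for feature, values in training_data.items()}

print(expected_values["Amount"])  # 75.0
```

These expected values are computed once in the offline phase and reused for every terminal node's contribution calculation.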
[0077] At step 506 of the method, the classification server can determine feature contribution values for each terminal node in the decision trees of the classification model. The feature contribution values can be based on the expected feature value for that feature, the response value for that terminal node, and the positioning of the feature within the decision tree. For instance, the feature contribution value for a particular feature can be based on the difference between a first response value of a first terminal node included in a first branch having a first condition based on that particular feature and a second response value of a second terminal node included in a second branch having a second condition based on that particular feature. The expected feature values used in calculating the feature contribution values can be based on the feature values of that feature for each entity of the plurality of entities. For example, the average feature value across all of the entities can be used as the expected feature value for a particular feature. This step can be performed during an offline phase.
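The contribution computation of step 506 can be sketched in the spirit of equation (9) above: a terminal node's contribution for the split feature is the node's response minus the expectation of the two child responses, weighted by the fraction of training entities routed to each side. The 80%/20% weighting below comes from equation (9); the helper name is ours:

```python
def feature_contribution(node_response, left_response, right_response, frac_left):
    """Contribution of the split feature at a terminal node: the node's
    response minus the expected response at the split, where the two child
    responses are weighted by the fraction of training entities routed to
    each side (frac_left to the left child, 1 - frac_left to the right).
    Mirrors equation (9): 0.5 - (80% * 0.2 + 20% * 0.5) = 0.24."""
    expected = frac_left * left_response + (1.0 - frac_left) * right_response
    return node_response - expected

# Reproducing the Amount example from equation (9):
print(round(feature_contribution(0.5, 0.2, 0.5, 0.80), 2))   # 0.24

# The same split's other terminal node (response 0.2) gets a negative
# contribution, matching paragraph [0069]:
print(round(feature_contribution(0.2, 0.2, 0.5, 0.80), 2))   # -0.06
```

For deeper branches, a contribution value would be computed this way for each feature appearing in the conditional nodes along the branch, so a terminal node can carry several feature-contribution values, as paragraph [0078] describes.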
[0078] As shown in FIG. 4 and discussed above, the classification server can determine the feature contribution values based on the response values associated with a particular terminal node and the expected values for each feature associated with conditional nodes of the branch within which the particular terminal node is included. The feature-contribution values that are associated with that particular terminal node can include a contribution value for each feature that is associated with the conditional nodes of the branch within which the particular terminal node is included. For example, if the branch that a terminal node is in is based on the "IP Score" feature and the "Login Count" feature, then that terminal node can be associated with feature contribution scores for both the "IP Score" and the "Login Count." These feature-contribution values can indicate an amount that the particular feature contributed to the corresponding response value of that terminal node.

[0079] At step 507 of the method, the classification server can receive new entity data for a new entity. The new entity data may be received from a request computer. The new entity data may indicate feature values for each feature of the entity. In some cases, the classification server may determine the features of the entity using a feature extraction process. This step can be performed during an online phase.

[0080] At step 508 of the method, the classification server can traverse the classification model using the feature values for the new entity. In traversing the classification model, the classification server can select a plurality of terminal nodes based on whether the entity's features meet the conditions of the branches that include those terminal nodes.
The classification server can then determine a response value for each decision tree within the classification model and identify feature contribution values that are associated with the response values (e.g., associated with the terminal nodes that are associated with that response value). This step can be performed during an online phase.
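Steps 508 through 510 can be sketched end to end as follows. The Node layout, thresholds, stored contribution values, and the "suspect"/"normal" labels are all illustrative assumptions, not taken from the patent's figures; only the "Amount < 95" split and the 0.2/0.4 responses echo FIG. 4:

```python
import math

# A minimal decision-tree node for illustration; in the combined model each
# terminal node also stores its precomputed feature-contribution values.
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None,
                 response=None, contributions=None):
        self.feature = feature          # condition tested: value < threshold
        self.threshold = threshold
        self.left = left                # branch taken when the condition holds
        self.right = right
        self.response = response        # set only on terminal nodes
        self.contributions = contributions or {}

def traverse(node, feature_values):
    """Walk one tree with an entity's feature values; return the terminal
    node's response value and its stored feature-contribution values."""
    while node.response is None:
        if feature_values[node.feature] < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.response, node.contributions

def classify(trees, feature_values, threshold=0.5):
    """Single traversal per tree: collect responses and contributions, then
    classify by applying a sigmoid and a threshold to the summed responses."""
    total, summed = 0.0, {}
    for tree in trees:
        response, contribs = traverse(tree, feature_values)
        total += response
        for feature, value in contribs.items():
            summed[feature] = summed.get(feature, 0.0) + value
    score = 1.0 / (1.0 + math.exp(-total))
    label = "suspect" if score >= threshold else "normal"
    return label, score, summed

# The "Amount < 95" split from FIG. 4 (contribution values illustrative):
tree = Node("Amount", 95,
            left=Node(response=0.2, contributions={"Amount": -0.06}),
            right=Node(response=0.4, contributions={"Amount": 0.14}))

label, score, contribs = classify([tree], {"Amount": 120})
print(label, round(score, 3), contribs)
```

The summed contributions returned by `classify` are exactly what step 510 ranks to pick reason codes, so classification and explanation share the one traversal.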
[0081] At step 509 of the method, the classification server can classify the new entity based on the response values. For example, the classification server can determine one or more classifiers based on the aggregated response scores using a sigmoidal function and one or more threshold values. This step can be performed during an online phase.

[0082] At step 510 of the method, the classification server can identify a reason code for each of the classifiers. The reason code may be a label indicating the features of the entity that were the greatest cause for the classification of the entity. The reason codes can be sent to the requesting computer in some cases. This step can be performed during an online phase.

VI. EXAMPLE COMPUTER SYSTEMS
[0083] The various participants and elements described herein may operate one or more computer apparatuses to facilitate the functions described herein. Any of the elements in the above-described figures, including any servers or databases, may use any suitable number of subsystems to facilitate the functions described herein.

[0084] Such subsystems or components are interconnected via a system bus. Subsystems may include a printer, a keyboard, a fixed disk (or other memory comprising computer readable media), a monitor, which is coupled to a display adapter, and others. Peripherals and input/output (I/O) devices, which couple to an I/O controller (which can be a processor or other suitable controller), can be connected to the computer system by any number of means known in the art, such as a serial port. For example, a serial port or an external interface can be used to connect the computer apparatus to a wide area network such as the Internet, a mouse input device, or a scanner. The interconnection via the system bus allows the central processor to communicate with each subsystem and to control the execution of instructions from system memory or the fixed disk, as well as the exchange of information between subsystems. The system memory and/or the fixed disk may embody a computer readable medium.
[0085] As described, the embodiments may involve implementing one or more functions, processes, operations or method steps. In some embodiments, the functions, processes, operations or method steps may be implemented as a result of the execution of a set of instructions or software code by a suitably programmed computing device, microprocessor, data processor, or the like. The set of instructions or software code may be stored in a memory or other form of data storage element which is accessed by the computing device, microprocessor, etc. In other embodiments, the functions, processes, operations or method steps may be implemented by firmware or a dedicated processor, integrated circuit, etc.

[0086] It should be understood that the present invention as described above can be implemented in the form of control logic using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement the present systems and methods using hardware and a combination of hardware and software.
[0087] Any of the software components or functions described in this application, may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C++ or Perl using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions, or commands on a computer readable medium, such as a random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a CD-ROM. The computer readable medium may be any combination of such storage or transmission devices.
[0088] Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via internet download). Any such computer readable medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
[0089] Any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, steps of methods herein can be performed at a same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, units, circuits, or other means for performing these steps.

[0090] While certain exemplary embodiments have been described in detail and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not intended to be restrictive of the broad invention, and that this invention is not to be limited to the specific arrangements and constructions shown and described, since various other modifications may occur to those with ordinary skill in the art.
[0091] As used herein, the use of "a", "an" or "the" is intended to mean "at least one", unless specifically indicated to the contrary.
[0092] All patents, patent applications, publications, and descriptions mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art.

Claims

WHAT IS CLAIMED IS:

1. A method for identifying reason codes, the method comprising:
obtaining training data for a plurality of entities, each entity of the plurality of entities characterized by a plurality of features, the training data including a feature value for each feature of the plurality of features for each entity of the plurality of entities;
obtaining classification data for the plurality of entities, the classification data associating a plurality of classifiers with the plurality of entities, each entity of the plurality of entities associated with one or more classifiers of the plurality of classifiers;
building a classification model using the training data and the classification data, the classification model including a plurality of decision trees for selecting the one or more classifiers;
determining a plurality of expected values including an expected value for each feature of the plurality of features, the expected value for a particular feature based on feature values of that feature for each entity of the plurality of entities;
determining a plurality of feature-contribution values based on the classification model and the plurality of expected values;
associating the plurality of feature-contribution values with the classification model;
receiving, from a requesting computer, new entity data for a new entity, the new entity data including a new plurality of feature values for each feature of the plurality of features;
traversing the classification model using the new plurality of feature values, the traversal of the classification model used to select one or more new classifiers of the new entity and one or more contributing-features;
identifying a reason code for each of the one or more contributing-features; and
sending, to the requesting computer, the one or more classifiers and reason codes.

2. The method of claim 1, wherein each decision tree of the plurality of decision trees contains a plurality of branches, each branch containing one or more conditional nodes and a terminal node, each of the one or more conditional nodes associated with a particular feature and a set of feature values for the particular feature, each of the terminal nodes associated with a response value, wherein the selecting of the one or more classifiers is based on the response values.

3. The method of claim 1, wherein the determining of a feature-contribution value for a particular feature is based on a difference between a first response value of a first terminal node included in a first branch having a first condition based on the particular feature and a second response value of a second terminal node included in a second branch having a second condition based on the particular feature.

4. The method of claim 1, wherein the determining of the expected value for each feature of the plurality of features is based on the feature values of that feature for each entity of the plurality of entities.

5. The method of claim 1, wherein the determining of the expected value for each feature of the plurality of features is based on an average of the feature value of that feature across each entity of the plurality of entities.

6. The method of claim 1, wherein the determining of the plurality of feature-contribution values includes determining one or more feature-contribution values for each feature of the plurality of features.

7. The method of claim 6, wherein the determining of the plurality of feature-contribution values includes determining one or more feature-contribution values for each terminal node of each decision tree of the plurality of decision trees based on a response value associated with that particular terminal node and the expected values for each feature associated with conditional nodes of a branch within which the particular terminal node is included, the one or more feature-contribution values for the particular terminal node including a feature-contribution value for each feature associated with the conditional nodes of the branch within which the particular terminal node is included.

8. The method of claim 1, wherein the feature-contribution value for a particular feature indicates an amount the particular feature contributed to a corresponding response value.
9. The method of claim 1, further comprising selecting a first plurality of terminal nodes from the plurality of decision trees based on the traversal of the classification model using the new plurality of feature values for the new entity.
10. The method of claim 9, further comprising:
identifying a first response value and one or more first feature-contribution values associated with each terminal node of the selected first plurality of terminal nodes;
selecting one or more classifiers for the new entity based on the first response value; and
selecting one or more contributing-features for the new entity based on the one or more first feature-contribution values.
11. A computer system, comprising:
one or more processor circuits; and
a non-transitory computer-readable storage medium coupled to the one or more processor circuits, the storage medium storing code executable by the one or more processor circuits for performing a method comprising:
obtaining training data for a plurality of entities, each entity of the plurality of entities characterized by a plurality of features, the training data including a feature value for each feature of the plurality of features for each entity of the plurality of entities;
obtaining classification data for the plurality of entities, the classification data associating a plurality of classifiers with the plurality of entities, each entity of the plurality of entities associated with one or more classifiers of the plurality of classifiers;
building a classification model using the training data and the classification data, the classification model including a plurality of decision trees for selecting the one or more classifiers;
determining a plurality of expected values including an expected value for each feature of the plurality of features, the expected value for a particular feature based on feature values of that feature for each entity of the plurality of entities;
determining a plurality of feature-contribution values based on the classification model and the plurality of expected values;
associating the plurality of feature-contribution values with the classification model;

receiving, from a requesting computer, new entity data for a new entity, the new entity data including a new plurality of feature values for each feature of the plurality of features;
traversing the classification model using the new plurality of feature values, the traversal of the classification model used to select one or more new classifiers of the new entity and one or more contributing-features;
identifying a reason code for each of the one or more contributing-features; and
sending, to the requesting computer, the one or more classifiers and reason codes.
12. The system of claim 11, wherein each decision tree of the plurality of decision trees contains a plurality of branches, each branch containing one or more conditional nodes and a terminal node, each of the one or more conditional nodes associated with a particular feature and a set of feature values for the particular feature, each of the terminal nodes associated with a response value, wherein the selecting of the one or more classifiers is based on the response values.
13. The system of claim 11, wherein the determining of a feature-contribution value for a particular feature is based on a difference between a first response value of a first terminal node included in a first branch having a first condition based on the particular feature and a second response value of a second terminal node included in a second branch having a second condition based on the particular feature.
14. The system of claim 11, wherein the determining of the expected value for each feature of the plurality of features is based on the feature values of that feature for each entity of the plurality of entities.
15. The system of claim 11, wherein the determining of the expected value for each feature of the plurality of features is based on an average of the feature value of that feature across each entity of the plurality of entities.
16. The system of claim 11, wherein the determining of the plurality of feature-contribution values includes determining one or more feature-contribution values for each feature of the plurality of features.
17. The system of claim 16, wherein the determining of the plurality of feature-contribution values includes determining one or more feature-contribution values for each terminal node of each decision tree of the plurality of decision trees based on a response value associated with that particular terminal node and the expected values for each feature associated with conditional nodes of a branch within which the particular terminal node is included, the one or more feature-contribution values for the particular terminal node including a feature-contribution value for each feature associated with the conditional nodes of the branch within which the particular terminal node is included.
18. The system of claim 11, wherein the feature-contribution value for a particular feature indicates an amount the particular feature contributed to a corresponding response value.
19. The system of claim 11, wherein the method further comprises selecting a first plurality of terminal nodes from the plurality of decision trees based on the traversal of the classification model using the new plurality of feature values for the new entity.
20. The system of claim 19, wherein the method further comprises:
identifying a first response value and one or more first feature-contribution values associated with each terminal node of the selected first plurality of terminal nodes;
selecting one or more classifiers for the new entity based on the first response value; and
selecting one or more contributing-features for the new entity based on the one or more first feature-contribution values.
PCT/US2018/024896 2017-04-07 2018-03-28 Identifying reason codes from gradient boosting machines WO2018187122A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
SG11201908634P SG11201908634PA (en) 2017-04-07 2018-03-28 Identifying reason codes from gradient boosting machines
EP18781480.1A EP3607475A4 (en) 2017-04-07 2018-03-28 Identifying reason codes from gradient boosting machines
CN201880021609.3A CN110462607B (en) 2017-04-07 2018-03-28 Identifying reason codes from gradient boosters

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/482,489 2017-04-07
US15/482,489 US10747784B2 (en) 2017-04-07 2017-04-07 Identifying reason codes from gradient boosting machines

Publications (1)

Publication Number Publication Date
WO2018187122A1 true WO2018187122A1 (en) 2018-10-11

Family

ID=63711022

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/024896 WO2018187122A1 (en) 2017-04-07 2018-03-28 Identifying reason codes from gradient boosting machines

Country Status (5)

Country Link
US (1) US10747784B2 (en)
EP (1) EP3607475A4 (en)
CN (1) CN110462607B (en)
SG (1) SG11201908634PA (en)
WO (1) WO2018187122A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6269353B1 (en) * 1997-11-26 2001-07-31 Ishwar K. Sethi System for constructing decision tree classifiers using structure-driven induction
US20130132311A1 (en) * 2011-11-18 2013-05-23 Honeywell International Inc. Score fusion and training data recycling for video classification
US20150019211A1 (en) * 2013-07-12 2015-01-15 Microsoft Corportion Interactive concept editing in computer-human interactive learning
US20150036942A1 (en) * 2013-07-31 2015-02-05 Lsi Corporation Object recognition and tracking using a classifier comprising cascaded stages of multiple decision trees



Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3607475A4 *
ZHIXIANG XU ET AL.: "Gradient Boosted Feature Selection", PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD '14, 30 September 2014 (2014-09-30), pages 1 - 2, XP058053709, Retrieved from the Internet <URL:http://alicezheng.org/papers> *

Also Published As

Publication number Publication date
US20180293292A1 (en) 2018-10-11
EP3607475A1 (en) 2020-02-12
US10747784B2 (en) 2020-08-18
SG11201908634PA (en) 2019-10-30
CN110462607A (en) 2019-11-15
CN110462607B (en) 2023-05-23
EP3607475A4 (en) 2020-05-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18781480

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2018781480

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2018781480

Country of ref document: EP

Effective date: 20191107