WO2022081270A1 - Hierarchical machine learning model for performing a decision task and an explanation task - Google Patents

Hierarchical machine learning model for performing a decision task and an explanation task

Info

Publication number
WO2022081270A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
machine learning
learning model
decision
training
Prior art date
Application number
PCT/US2021/048502
Other languages
French (fr)
Inventor
Vladimir BALAYAN
Pedro dos Santos SALEIRO
Catarina Garcia BELÉM
Pedro Gustavo Santos Rodrigues BIZARRO
Original Assignee
Feedzai - Consultadoria E Inovação Tecnológica, S.A.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Feedzai - Consultadoria E Inovação Tecnológica, S.A. filed Critical Feedzai - Consultadoria E Inovação Tecnológica, S.A.
Priority to EP21880745.1A priority Critical patent/EP4038469A4/en
Publication of WO2022081270A1 publication Critical patent/WO2022081270A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0985Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • G06N5/045Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/38Payment protocols; Details thereof
    • G06Q20/40Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401Transaction verification
    • G06Q20/4016Transaction verification involving fraud or risk level assessment in transaction processing

Definitions

  • Machine learning involves the use of algorithms that improve automatically through experience and by the use of data.
  • In ML, a model is built based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so.
  • ML models are able to learn and adapt by analyzing and drawing inferences from patterns in data.
  • ML has been increasingly used to aid humans in making better and faster decisions in a wide range of areas, such as financial services and healthcare.
  • Figure 1A is a block diagram illustrating an embodiment of a machine learning model architecture for performing both a decision task and an explanation task.
  • Figure 1B is a flow diagram illustrating an embodiment of a process for training a machine learning model using distant supervision.
  • Figure 2 is a diagram illustrating an embodiment of a feedback loop incorporating human teaching into a multi-task machine learning model.
  • Figure 3 is a diagram illustrating examples of approaches for training a multi-task machine learning model to perform both a decision task and an explanation task.
  • Figure 4 is a flow diagram illustrating an embodiment of a process for configuring a machine learning model to perform both a decision task and an explanation task.
  • Figure 5 is a flow diagram illustrating an embodiment of a process for training a multi-task machine learning model to perform an explanation task.
  • Figure 6A is a high-level block diagram of an embodiment of a machine learning based framework for learning attributes associated with datasets.
  • Figure 6B is a high-level block diagram of an embodiment of a machine learning based framework for identifying data attributes.
  • Figure 7 is a functional diagram illustrating a programmed computer system.
  • the invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor.
  • these implementations, or any other form that the invention may take, may be referred to as techniques.
  • the order of the steps of disclosed processes may be altered within the scope of the invention.
  • a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task.
  • the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
  • a multi-task hierarchical machine learning model is configured to perform both a decision task to predict a decision result and an explanation task to predict a plurality of semantic concepts for explainability associated with the decision task, wherein a semantic layer of the machine learning model associated with the explanation task is utilized as an input to a subsequent decision layer of the machine learning model associated with the decision task. Training data is received.
  • the multi-task hierarchical machine learning model is trained using the received training data.
  • a framework based on a machine learning model that jointly learns a decision task and associated domain knowledge explanations (also referred to as a self-explaining machine learning model) is disclosed.
  • This framework is tailored to human-in-the-loop domain experts that lack deep technical ML knowledge.
  • the domain knowledge explanations are also referred to herein as semantic concepts, concepts, etc. These explanations / concepts can guide human domain experts’ reasoning throughout their decision-making process.
  • the framework utilizes decision-makers’ feedback associated with semantic concepts.
  • a weakly supervised or semi-supervised method that leverages legacy rule-based systems to automatically create multi-label training data is used.
  • a machine learning architecture that jointly learns a decision task and associated explanations is disclosed.
  • a hierarchical architecture guarantees that a decision is only calculated based on a semantic layer, which is advantageous for addressing the problem of ensuring that explanations are faithful (e.g., when using a surrogate model).
  • Encoding ML interpretability architecturally also promotes the incorporation of additional domain knowledge when building the semantic explanations.
  • a neural network model is used.
  • a multi-label framework is utilized, which allows each data instance to be simultaneously associated with a multitude of concepts. For instance, in a medical diagnosis task, an example of multiple concepts association would be to associate the concepts “headache” and “high body temperature” to the prediction of the disease “flu”.
  • the techniques disclosed herein solve the problems of concept label scarcity and jointly learning an explainability task and a decision task.
  • this is accomplished by generating (in a substantially automated manner with minimal human supervision, which is referred to herein as distant supervision, weakly supervised learning, semi-supervised learning, etc.) a large dataset of labels using specified rules and concepts (referred to herein as noisy labels) and combining the large dataset of noisy labels with a small dataset of human expert manually created labels (referred to herein as golden labels).
  • a self-explainable machine learning model may be trained using a mixture of noisy labels from distant supervision and golden labels from manual annotations.
  • FIG. 1A is a block diagram illustrating an embodiment of a machine learning model architecture for performing both a decision task and an explanation task.
  • neural network 100 receives input X 102 and outputs decision 122 related to a decision task.
  • Neural network 100 also generates concepts 110, which correspond to an explanation task of producing semantic concepts 112, 114, 116, ..., 118 used as explanations associated with the decision task.
  • concepts 110 are part of the output of neural network 100 (along with decision 122).
  • concepts 110 are the input to decision layer 120 of neural network 100.
  • Neural network 100 is configured to jointly learn to perform a decision task and provide associated domain knowledge explanations. Semantic concepts (used as explanations) help domain experts (end-users) with reasoning related to their decision-making process. As described in further detail below (e.g., see Figure 2), domain experts may provide feedback about which concepts justify their decisions. Thus, the techniques disclosed herein allow for continuously improving both predictive accuracy and explainability.
  • neural network 100 comprises a neural network (NN). This is merely illustrative and not restrictive. The techniques disclosed herein can also be implemented with different (or an ensemble of) machine learning models.
  • a multi-labeling ensemble model followed by a decision task model with the multi-labeling predictions as the only inputs to the decision task model may be utilized.
  • Utilizing both semantic concepts and decision labels in a machine learning model can be framed as finding a hypothesis (learner), h ∈ H, such that, for the same inputs, x ∈ X, h is able to simultaneously satisfy h : X → Y and h : X → S, where S is the set of semantic concepts, and Y is the set of decisions (or classes) of the decision task.
  • the decision task is also referred to herein as the predictive task.
  • the explanation task is also referred to herein as the explainability task.
  • neural network 100 comprises three building blocks: (1) neural network (NN) layers (hidden layer-1 104 to hidden layer-L 106), (2) a semantic layer (explainability layer 108), and (3) a decision layer (decision layer 120).
  • neural network 100 is a hierarchical machine learning model in that the blocks are chained sequentially. Stated alternatively, outputs of an L-layer NN are fed as inputs to a semantic layer whose outputs are in turn fed into a decision layer. Both the decision task and the explainability task share parameters of the initial layers (the hidden layers) but also have specialized output layers for each individual task. The hierarchy shown in the output layers exploits the explainability task carrying pertinent information to the decision layer that is not explicit in the input data.
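  • To make the chained structure concrete, the following minimal PyTorch sketch mirrors the hierarchy described above; it is illustrative only, and the layer sizes, activation choices, and class/variable names are assumptions rather than details taken from this document.

```python
# Illustrative sketch of the hierarchical, self-explaining architecture (assumptions noted above).
import torch
import torch.nn as nn

class SelfExplainingNet(nn.Module):
    def __init__(self, n_features: int, n_concepts: int, hidden_dims=(64, 32)):
        super().__init__()
        layers, prev = [], n_features
        for dim in hidden_dims:                       # shared hidden layers (hidden layer-1 .. layer-L)
            layers += [nn.Linear(prev, dim), nn.ReLU()]
            prev = dim
        self.hidden = nn.Sequential(*layers)
        self.semantic = nn.Linear(prev, n_concepts)   # explainability (semantic) layer
        self.decision = nn.Linear(n_concepts, 1)      # decision layer: sees only the concepts

    def forward(self, x):
        h = self.hidden(x)
        concept_probs = torch.sigmoid(self.semantic(h))               # S1 .. Sk, each in [0, 1]
        decision_prob = torch.sigmoid(self.decision(concept_probs))   # decision score in [0, 1]
        return concept_probs, decision_prob
```

  • Because the decision head in this sketch receives only the concept outputs, every decision score is a function of the predicted semantic concepts, which reflects the faithfulness property discussed above.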
  • input X 102 is a vector X of numerical values.
  • X may comprise various values associated with a transaction to be determined (decided) as either fraudulent or not fraudulent (e.g., purchase amount for the transaction, total purchase amounts for other transactions by a same purchaser in a specified period of time, time between recent purchases, etc.).
  • Non-numerical features may be converted to numerical values and included in input X 102. For example, whether a billing address associated with the transaction matches a known billing address on file can be represented as 0 for no and 1 for yes.
  • each layer of neural network 100 is a structure that takes information from a previous layer and/or passes information to a next layer.
  • Various types of neural network layers may be used, such as fully-connected layers with rectified linear unit (ReLU) or other activation functions.
  • concepts 110, which can be written as S, are also provided by neural network 100 as outputs of explainability layer 108.
  • an example of decision 122 is an output that includes a score between 0.0 and 1.0, which can then result in a 0 or 1 output based on a score threshold. This can be interpreted as a yes or no determination as to whether a particular transaction is likely to be fraudulent.
  • concepts 110 are comprised of a plurality of semantic concept predictions S1 112, S2 114, S3 116, ..., Sk 118.
  • each semantic concept prediction may be a score between 0.0 and 1.0 representing a probability of a specific fraud concept being present, such as suspicious billing address, suspicious customer, suspicious payment, suspicious items, high speed ordering, suspicious email, suspicious Internet Protocol (IP) address, and so forth.
  • Predictive scores can result in yes or no determinations based on score thresholds.
  • each yes or no determination is based on whether a corresponding likelihood score exceeds a specified threshold (e.g., 0.5 on a scale of 0 to 1).
  • the example illustrated is a multi-task machine learning model because, in addition to predicting a decision result (e.g., a determination that fraud exists / is likely), it also predicts semantic concepts associated with explaining the decision result (e.g., suspicious billing address, suspicious customer, suspicious payment, etc., to explain why fraud is likely).
  • Figure 1B is a flow diagram illustrating an embodiment of a process for training a machine learning model using distant supervision.
  • the process of Figure 1B is utilized to train neural network 100 of Figure 1A, machine learning model 204 of Figure 2, and/or machine learning model 320 of Figure 3.
  • the process of Figure 1B is performed by computer system 700 of Figure 7.
  • expert rules and a concepts taxonomy are received.
  • the expert rules and the concepts taxonomy form a rule-concept mapping framework to automatically associate rules to concepts of the concepts taxonomy.
  • a human specialist (e.g., a domain expert) also reviews the rules.
  • the rules are applied to features of input data.
  • mappings between rules and concepts are created.
  • these mappings are devised by one or more human specialists (e.g., domain experts).
  • an example rule-to-concept mapping may be mapping the rule “user has used N different credit cards last week” to the concept “suspicious customer”.
  • the concept of “suspicious customer” can be part of a suitable explanation by a human expert as to why a transaction may be fraudulent when the human expert is performing the task of fraud detection.
  • the concepts taxonomy is formed by a plurality of concepts that cover different cues, signals, reasons, etc. associated with explaining a prediction of a predictive task.
  • the rules are applied to an unlabeled dataset to determine concepts labels. Due to the rule-to-concept mappings in place, applying the rules to unlabeled data generates concept labels for the unlabeled data. Stated alternatively, specified data patterns trigger the rules, whose correspondingly linked concepts can be attached to the data patterns as labels. In various embodiments, such an approach is utilized to generate concept labels for machine learning model training data to overcome the concept label scarcity problem.
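  • A compact sketch of this labeling step is shown below. The rules, field names, and thresholds are hypothetical examples introduced for illustration; only the overall pattern (rules mapped to concepts, then applied in bulk to unlabeled data) follows the description above.

```python
# Illustrative distant-supervision labeling: rules -> concepts, applied to unlabeled transactions.
RULE_TO_CONCEPTS = {
    "many_cards_last_week": ["suspicious customer", "suspicious payment"],
    "long_email_address":   ["suspicious email"],
}

RULES = {
    "many_cards_last_week": lambda tx: tx["distinct_cards_7d"] >= 4,   # hypothetical threshold
    "long_email_address":   lambda tx: len(tx["email"]) > 40,          # hypothetical threshold
}

def noisy_concept_labels(tx: dict) -> set:
    """Return the set of concept labels triggered for one transaction."""
    concepts = set()
    for rule_name, fires in RULES.items():
        if fires(tx):
            concepts.update(RULE_TO_CONCEPTS[rule_name])
    return concepts

example_tx = {"distinct_cards_7d": 5, "email": "a.very.long.and.suspicious.address@example.com"}
print(noisy_concept_labels(example_tx))
# {'suspicious customer', 'suspicious payment', 'suspicious email'}
```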
  • FIG. 2 is a diagram illustrating an embodiment of a feedback loop incorporating human teaching into a multi-task machine learning model.
  • feedback loop 200 includes machine learning model 204 and expert review 212.
  • machine learning model 204 is comprised of neural network 100 of Figure 1A.
  • machine learning model 204 receives input 202.
  • input 202 is input X 102 of Figure 1A.
  • Machine learning model 204 produces outputs 206, which include decision task output 208 and explanation task output 210.
  • decision task output 208 corresponds to decision 122 of Figure 1A and explanation task output 210 corresponds to concepts 110 of Figure 1A.
  • outputs 206 are reviewed by one or more humans at expert review 212.
  • Expert review 212 generates human feedback 214, which is fed back to machine learning model 204 to train machine learning model 204. Examples of human feedback 214 are described below.
  • An advantage of feedback loop 200 is that machine learning model 204 is able to promptly adapt to human teaching (or tuning), as opposed to a uni-directional ML system that directly influences human decisions but does not allow for the reverse of adapting to human behavior. Oftentimes, uni-directional systems are offline and it is only after a certain period of time that a new model is trained and adapted to collected knowledge. Such limitations are solved by incorporating a human-teaching stage that continuously integrates expert feedback into the learning process.
  • machine learning model 204 may be configured to perform a fraud detection task in which a main goal is to classify financial transactions as being fraudulent or not.
  • input 202 would include features associated with financial transactions pertinent to the fraud detection task.
  • machine learning model 204 may infer predictive scores for both a fraud label (decision task output 208) and semantic concepts associated with fraud patterns (explanation task output 210).
  • a fraud analyst (e.g., a domain expert) reviews outputs 206 at expert review 212. In some embodiments, expert review 212 includes a programmed computer system (e.g., computer system 700 of Figure 7) that the domain expert utilizes to perform expert review.
  • examples of human feedback 214 include checks on whether ML model determinations (e.g., yes or no determinations) and/or associated prediction scores (e.g., on a scale from 0 to 1) associated with semantic concepts such as suspicious billing address, suspicious customer, suspicious payment, suspicious items, high speed ordering, suspicious email, suspicious IP address, etc. are accurate.
  • the domain expert may select “accurate” or “not accurate” in a user interface.
  • machine learning model 204 employs a hierarchical structure (e.g., with a semantic layer also acting as embeddings for a decision layer, such as is shown in neural network 100 of Figure 1A) that is likely to encode additional information based on this feedback, which has a benefit of rapidly improving both predictive accuracy and also quality of explanations.
  • human feedback 214 is collected for a plurality of data instances (e.g., transactions) and then fed back to machine learning model 204 for model training. Stated alternatively, a batch training mode may be employed.
  • machine learning model 204 prior to utilizing feedback loop 200, is trained to perform the explanation task using a bootstrapping technique (also referred to herein as distant supervision, a weakly supervised technique, semi-supervised technique, etc.) that uses an initial concept-based annotated dataset.
  • hyperparameters of machine learning model 204 may be tuned and the resulting model is then deployed in a human teaching stage via feedback loop 200 in which machine learning model 204 outputs decisions and explanations and collects human feedback regarding the outputted decisions and explanations.
  • parameters of machine learning model 204 are updated through backpropagation.
  • quality control is incorporated into expert review 212.
  • each human expert may be required to meet a minimum accuracy level (or other relevant quantitative measure).
  • different experts may be utilized to review different semantic concepts (e.g., experts assigned based on their different areas of expertise). Feedback from specific experts (e.g., with higher accuracy levels or other relevant quantitative measures) may be assigned more weight (e.g., more training weight for higher impact on training of machine learning model 204).
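  • One way such per-expert weighting could be realized is sketched below; this is an assumption for illustration (the document does not prescribe an implementation), using a binary cross-entropy concept loss weighted by a reliability score per reviewer.

```python
# Illustrative sketch: expert feedback weighted by per-reviewer reliability (assumed scheme).
import torch
import torch.nn.functional as F

expert_weight = {"analyst_a": 1.0, "analyst_b": 0.6}   # e.g., derived from measured review accuracy

def weighted_concept_loss(concept_probs, feedback_labels, reviewer_ids):
    # concept_probs:   (batch, n_concepts) model outputs in [0, 1]
    # feedback_labels: (batch, n_concepts) expert 0/1 corrections, as floats
    # reviewer_ids:    list of reviewer names, one per instance
    per_instance = F.binary_cross_entropy(
        concept_probs, feedback_labels, reduction="none"
    ).mean(dim=1)
    weights = torch.tensor([expert_weight[r] for r in reviewer_ids])
    return (weights * per_instance).mean()
```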
  • concept-based annotated datasets are created based at least in part on utilizing an automated rules-based system.
  • a distant supervision technique is utilized to derive an initial concept-based annotated dataset based on mapping rules to concepts in a taxonomy of concepts (also referred to as a concept taxonomy).
  • human effort is required to create the mappings. This amount of effort, though, is negligible when compared with the effort of manually annotating a large dataset.
  • rule-concept mappings to automatically associate rules to concepts in a fraud taxonomy can be constructed. After validation of the mappings by a fraud expert, the mappings can be utilized to automatically label payment transactions in bulk. For example, consider a payment transaction X for which the legacy system triggers two rules, rule A and rule B. Suppose that according to a predefined rule-concept mapping, rule A maps to a “suspicious email” concept and rule B maps to “suspicious IP”, “suspicious customer”, and “suspicious payment” concepts.
  • transaction X is annotated with “suspicious email”, “suspicious IP”, “suspicious customer”, and “suspicious payment”.
  • Other rules may be triggered for other transactions.
  • An example of rule A (associated with a suspicious email) is an email that exceeds a specified length, has a suspicious IP domain, or has another feature that indicates the email is suspicious.
  • Many fraud prevention systems include various legacy system rules that can be applied to fraud datasets and leveraged for rule-concept mappings. Alternatively, rules may be created specifically to derive concept labels and applied to fraud datasets to generate concept labels for transaction data instances.
  • the techniques disclosed herein solve the problem of machine learning models (e.g., neural networks) requiring large amounts of training data (particularly for multi-label performance) that are difficult to collect and/or create for explanation tasks.
  • manual creation of concept-annotated datasets based on semantic concepts is not feasible in many practical settings.
  • Concept-based explainability can be challenging due to a lack of annotations and/or mechanisms to collect them.
  • Small labeling campaigns oftentimes prove to be insufficient (e.g., too small, poor concepts coverage, etc.) for training machine learning models.
  • the creation of human (golden) labeled datasets is an arduous and expensive task irrespective of the application domain. As a consequence, many AI practitioners can only afford a small fraction of manually-curated data.
  • Figure 3 is a diagram illustrating examples of approaches for training a multi-task machine learning model to perform both a decision task and an explanation task.
  • golden labels 302 and/or noisy labels 304 are used according to one of a plurality of training strategies 306 to train machine learning model 320 to perform the explanation task.
  • machine learning model 320 is neural network 100 of Figure 1A and/or machine learning model 204 of Figure 2.
  • machine learning model 320 is configured to perform a detection task as well as the explanation task.
  • Golden labels 302 and noisy labels 304 are utilized to train machine learning model 320 to perform an explanation task (e.g., generate semantic concepts associated with fraud detection) as well as a decision task.
  • the training may be an initial training, pre-training, retraining, fine-tuning, etc.
  • golden labels 302 and noisy labels 304 may be fraud concept annotations for purchase transactions (e.g., suspicious billing address, suspicious customer, suspicious payment, suspicious items, high speed ordering, suspicious email, suspicious IP address, etc.).
  • training instances for machine learning model 320 can include for each transaction of a plurality of transactions, a label as to whether a transaction is fraudulent and a plurality of labels (either golden labels 302 or noisy labels 304) as to whether the transaction matches one or more of a plurality of fraud concepts.
  • at least a portion of golden labels 302 is derived from human feedback 214 of Figure 2. It is also possible for at least a portion of golden labels 302 to be derived from expert review that is not associated with running a machine learning model in inference mode (e.g., before the machine learning model is deployed).
  • Golden labels 302 refer to concept labels that are manually created by humans (also referred to as ground truth labels), which are presumed to be more accurate than noisy labels 304, which refer to concept labels that are at least in part automatically created.
  • golden labels 302 are created including by requesting fraud experts to evaluate fraud patterns (or legitimate transaction patterns) perceived for a plurality of transactions by selecting concepts from a pool of concepts determined in a fraud taxonomy.
  • fraud experts are also utilized to create the fraud taxonomy.
  • concepts refer to patterns involving specific information about transactions.
  • the concept “suspicious billing shipping” aims to guide a human’s attention to information associated with shipping and/or billing information and prompt the human to examine dubious aspects, such as a mismatch between addresses, malformed addresses, etc.
  • domain experts create a concept taxonomy comprising semantic/ontological concepts that help describe patterns that contribute to an end decision.
  • a distant supervision technique is utilized to automatically create noisy labels 304.
  • in various embodiments, already existing information of a legacy rule system that encoded high-level domain information is leveraged: domain specialists (human experts) map the legacy rules to concepts.
  • With respect to fraud detection, the result of this mapping is a multi-label dataset in which each transaction instance is jointly associated with a fraud label (decision task) and fraud patterns (semantic concepts). Given that these annotations are proxies of ground truth associated concepts, they are referred to as “noisy labels”.
  • Distant supervision is utilized to heuristically extract imprecise proxy annotations for the concepts.
  • mappings of rules to concepts are validated by domain experts for correctness.
  • an example of a mapping is the rule “user tried N different credit cards last week” to the concept “suspicious customer” and/or the concept “suspicious payment”.
  • a single rule may be linked with more than one concept, as illustrated in the above example.
  • Distant supervision allows for bulk annotation of large (pre-existing) data volumes, thus allowing for fast creation of multi-label datasets. Despite still requiring human effort to create these associations, the total human effort is negligible when compared with the effort for manual annotation of the same volume of data.
  • approaches 310 and 312 are two-stage bootstrap approaches to training machine learning model 320 to jointly learn an explainability task and a decision task.
  • training is separated into two sequential stages: stage 314 (a pre-training stage) and stage 316 (a fine-tuning stage).
  • Stage 314 comprises training a base model using noisy labels 304, which are abundant due to how they are generated (automatically) but are less precise than manually generated golden labels 302.
  • Stage 316 comprises fine-tuning the base model with either just golden labels 302 (approach 310) or a mixture of golden labels 302 and noisy labels 304 (approach 312).
  • approaches 310 and 312 involve learning a model’s parameters on a related dataset (the noisy dataset) and using it to obtain a better performing model on a smaller target dataset (the at least in part human-labeled dataset).
  • initial layers of machine learning model 320 are frozen and only task-specific layers are adjusted during stage 316. This can aid in preventing performance decay associated with discarding previous information and unlearning the decision task that machine learning model 320 is configured to perform, which may occur if golden labels 302 and noisy labels 304 are drawn from different distributions. Performance decay may also occur from using a learning rate value that causes steep updates, or iterating for many epochs, which can be too aggressive and cause machine learning model 320 to unlearn the traditional decision task.
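  • A minimal sketch of this fine-tuning stage under these assumptions is shown below, reusing the illustrative SelfExplainingNet class from the earlier sketch: the shared hidden layers are frozen and only the task-specific layers are updated, with a small learning rate to avoid steep updates.

```python
# Illustrative fine-tuning stage: freeze shared layers, update only task-specific heads.
import torch

model = SelfExplainingNet(n_features=30, n_concepts=7)   # sizes are placeholders
# ... stage 314: pre-train `model` on the noisy-label dataset ...

for p in model.hidden.parameters():       # freeze the initial (shared) layers
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,                              # conservative learning rate for fine-tuning
)
# ... stage 316: fine-tune on golden labels (approach 310) or golden + noisy labels (approach 312) ...
```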
  • stage 316 occurs after machine learning model 320 is deployed (e.g., subsequent to collecting golden labels via feedback loop 200 of Figure 2). It is also possible for stage 316 to occur before machine learning model 320 is deployed (e.g., when golden labels are collected before machine learning model 320 is deployed). In various embodiments, stage 314 occurs before machine learning model 320 is deployed.
  • approach 318 is another approach that utilizes both golden labels 302 and noisy labels 304.
  • Approach 318 (also referred to herein as a hybrid approach) involves a single training stage using mixed batches of labels, partly golden and partly noisy.
  • potential advantages over a two-stage approach include reduced bias in the base model and gradient updates that tend to be more informative and less prone to capturing noise.
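  • The hybrid approach can be sketched as a batch generator that interleaves the two label sources; the batch size and the golden fraction below are arbitrary assumptions.

```python
# Illustrative mixed-batch sampling for the hybrid (single-stage) training approach.
import random

def mixed_batches(golden_instances, noisy_instances, batch_size=256, golden_frac=0.2):
    """Yield batches containing both golden and noisy labeled instances."""
    n_golden = int(batch_size * golden_frac)
    while True:
        batch = random.sample(golden_instances, n_golden)
        batch += random.sample(noisy_instances, batch_size - n_golden)
        random.shuffle(batch)
        yield batch
```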
  • approach 318 is employed before machine learning model 320 is deployed.
  • the approaches shown are illustrative and not restrictive. Various modifications are possible. For example, it is possible to perform fine-tuning after approach 318 is employed. It is also possible to perform any number of re-training sessions for machine learning model 320.
  • an example of a training dataset is a dataset with millions of payment transactions of which a small percentage (e.g., 2-3%) are fraudulent and each transaction includes purchase information (e.g., number of items, shipping address, etc.), a fraud decision label, as well as information about triggered rules.
  • Based on the information about triggered rules, a distant supervision technique can be applied to obtain noisy explainability labels (e.g., noisy labels 304).
  • a portion of the noisy labels are filtered out (not used) based on experimental results on how well the noisy labels match ones produced by humans.
  • a much smaller subset of the dataset may have human-annotated labels for the explainability task (e.g., golden labels 302).
  • golden labels are reviewed for human error.
  • all labels for the fraud decision task are golden and can be described as golden decision labels, whereas the explainability task spans both a high-resource noisy explainability dataset and a low-resource golden explainability dataset. Both golden labels and noisy labels can be utilized for training, validation, and testing for the explainability task.
  • a first hyperparameter grid search is executed in which various hyperparameters, e.g., batch size, learning rate, number and dimension of hidden layers, value of α in Equation 5 (controlling the importance of the explainability task relative to the decision task), etc., are varied and resulting models are evaluated. Models are evaluated in terms of their predictive performance at the traditional decision task and the explainability task.
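  • Equation 5 itself is not reproduced in this excerpt; a convex combination of the two task losses, as sketched below, is one assumed form in which α trades off the explainability task against the decision task.

```python
# Illustrative joint multi-task objective (assumed form; not the document's Equation 5 verbatim).
import torch.nn.functional as F

def joint_loss(concept_probs, concept_labels, decision_prob, decision_label, alpha=0.5):
    expl_loss = F.binary_cross_entropy(concept_probs, concept_labels)   # multi-label concept loss
    dec_loss = F.binary_cross_entropy(decision_prob, decision_label)    # fraud / not-fraud loss
    return alpha * expl_loss + (1.0 - alpha) * dec_loss
```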
  • the decision task may be evaluated according to fraud recall (rate of detecting fraud when fraud exists).
  • the explainability task may be evaluated according to a mean Average Precision (mAP) metric, which focuses on the number of correctly predicted concepts without imposing restrictions on the explanation size (how many concepts each explanation should have).
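  • A common way to compute this metric, assumed here for illustration, is to average per-concept Average Precision scores, for example with scikit-learn:

```python
# Illustrative mean Average Precision over concepts (macro average of per-concept AP).
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_score):
    # y_true:  (n_instances, n_concepts) binary concept labels
    # y_score: (n_instances, n_concepts) predicted concept scores
    per_concept = [
        average_precision_score(y_true[:, c], y_score[:, c])
        for c in range(y_true.shape[1])
    ]
    return float(np.mean(per_concept))
```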
  • the first hyperparameter grid search is applicable to approach 308, approach 318, and stage 314 of approaches 310 and 312.
  • a second hyperparameter grid search is executed during stage 316 for approaches 310 and 312.
  • the number of epochs, batch size, number of frozen layers, and learning rate are varied. Additionally, each minibatch may be enforced to have at least one transaction per concept and the fraction of fraudulent transactions per batch may be fixed to be equal to the fraud prevalence of the training dataset.
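  • A batch builder respecting both constraints could look like the sketch below; the data layout and sampling details are assumptions introduced for illustration.

```python
# Illustrative batch construction: at least one transaction per concept, and a fraud
# fraction matching the training-set prevalence (assumed data layout).
import random

def build_batch(data, concepts, batch_size, fraud_rate):
    # data: list of dicts with keys "is_fraud" (0/1) and "concepts" (set of concept names)
    batch = [random.choice([d for d in data if c in d["concepts"]]) for c in concepts]
    n_fraud_missing = max(0, int(batch_size * fraud_rate) - sum(d["is_fraud"] for d in batch))
    fraud_pool = [d for d in data if d["is_fraud"]]
    legit_pool = [d for d in data if not d["is_fraud"]]
    batch += random.sample(fraud_pool, n_fraud_missing)
    batch += random.sample(legit_pool, max(0, batch_size - len(batch)))
    random.shuffle(batch)
    return batch
```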
  • Figure 4 is a flow diagram illustrating an embodiment of a process for configuring a machine learning model to perform both a decision task and an explanation task.
  • the process of Figure 4 is performed by computer system 700 of Figure 7.
  • the machine learning model configured is neural network 100 of Figure 1A, machine learning model 204 of Figure 2, and/or machine learning model 320 of Figure 3.
  • configuring the machine learning model includes determining an architecture of the machine learning model, e.g., determining a number of hidden layers for a NN and determining connections between the layers, e.g., connections between hidden layers, an explainability layer, and a decision layer.
  • the machine learning model is multi-task because it is configured to perform both the decision task and the explanation task.
  • neural network 100 of Figure 1A includes an explainability layer that outputs semantic concepts and a decision layer that outputs decisions.
  • Neural network 100 of Figure 1A is also hierarchical because components of neural network 100 are chained sequentially. In particular, in neural network 100, outputs of an L-layer NN are fed as inputs to a semantic layer whose outputs are in turn fed into a decision layer.
  • training data is received.
  • the training data is labeled data in which labels for the decision task are manually generated by humans (golden decision labels).
  • the training data includes a plurality of transactions (e.g., purchases) for which features of each transaction (e.g., purchase information such as number of items purchased, shipping address, etc.) are received by the machine learning model as inputs, as well as a fraud decision label for each transaction.
  • each transaction is known a priori to be either fraudulent or non-fraudulent and is labeled as such by a human in order to train the machine learning model to correctly make fraud decisions based on the inputs.
  • the training data also includes labels for semantic concepts associated with each training instance.
  • each purchase transaction may be associated with various fraud concepts (e.g., suspicious billing address, suspicious customer, suspicious payment, etc.) and labeled as to whether these fraud concepts are true or false for each purchase transaction.
  • at least a portion of the semantic concept labels in the training data are generated automatically (e.g., based on specified rules that map features of each training instance to concepts).
  • at least a portion of the semantic concept labels in the training data are generated manually by humans, though automatically generated semantic concept labels typically greatly exceed manually generated semantic concept labels.
  • at least a portion of the semantic concept labels are generated via expert review of outputs of the machine learning model (e.g., at expert review 212 of Figure 2) to be fed back to the machine learning model for training.
  • the multi-task hierarchical machine learning model is trained using the received training data.
  • the machine learning model is trained to perform the decision task based on golden decision labels and the machine learning model is trained to perform the explanation task based on a combination of noisy and golden semantic concept labels. It is also possible to train based solely on golden semantic concept labels, though it is typically very costly to obtain a sufficient quantity of golden semantic concept labels for effective training.
  • the training updates an already deployed machine learning model (e.g., human feedback 214 of Figure 2, which is fed back to machine learning model 204 of Figure 2).
  • the training is performed before the machine learning model is deployed in inference mode. Examples of training include the approaches of strategies 306 of Figure 3.
  • Figure 5 is a flow diagram illustrating an embodiment of a process for training a multi-task machine learning model to perform an explanation task.
  • the process of Figure 5 is performed by computer system 700 of Figure 7.
  • the multi-task machine learning model is neural network 100 of Figure 1A, machine learning model 204 of Figure 2, and/or machine learning model 320 of Figure 3.
  • at least a portion of the process of Figure 5 is performed in 406 of Figure 4.
  • a labeling function associated with generating one or more semantic concepts is received.
  • the labeling function can be a rule(s) mapping or any other heuristic or technique to automatically label concepts.
  • the labeling function is a mapping.
  • the mapping comprises one or more rules that transform data patterns to explanations in the form of high-level concepts that are more easily understood by humans.
  • An example of a mapping is the rule that if a purchaser associated with a transaction has used a specified number N different credit cards over a specified period of time (e.g., one week), the concept “suspicious customer” is identified for the transaction.
  • a single mapping may be linked with more than one concept. For example, the pattern of a purchaser having used a specified number N different credit cards over a specified period of time can also trigger identification of the concept “suspicious payment”.
  • the received labeling function is used to automatically annotate an existing dataset with the one or more semantic concepts to generate an annotated noisy dataset.
  • the existing dataset is already labeled with decision task outputs.
  • a fraud detection dataset may include millions of payment transactions for which each transaction includes purchase information (e.g., number of items, shipping address, etc.) and a fraud decision label (e.g., fraudulent or not fraudulent).
  • the fraud detection dataset can be leveraged to obtain millions of semantic concepts labeled instances by applying the received labeling function to the purchase transaction data of the fraud detection dataset (e.g., apply rules to already existing information about purchases in the fraud detection dataset).
  • a reference dataset annotated with the one or more semantic concepts is received.
  • the reference dataset is annotated manually by human experts.
  • the reference dataset is comprised of golden semantic concept labels.
  • the reference dataset is much smaller than the annotated noisy dataset because it is significantly more time-consuming and resource-intensive to manually annotate semantic concepts as opposed to automatic annotation.
  • An advantage of the reference dataset over the annotated noisy dataset is that the labels of the reference dataset are more precise and accurate due to the more resource-intensive human expert manual labeling process.
  • a training dataset is prepared including by combining at least a portion of the reference dataset with at least a portion of the annotated noisy dataset.
  • the training dataset is comprised of a plurality of sections.
  • a first section may be comprised of at least a portion of the annotated noisy dataset, corresponding to a first training stage using noisy labels (e.g., stage 314 of Figure 3).
  • a second section may be comprised of at least a portion of the reference dataset or a combination of at least a portion of the reference dataset and at least a portion of the annotated noisy dataset, corresponding to a second training stage using golden labels or a combination of golden labels and noisy labels (e.g., stage 316 of Figure 3).
  • the training dataset does not have a plurality of sections.
  • a section that is comprised of at least a portion of the reference dataset and at least a portion of the annotated noisy dataset can correspond to a hybrid training approach in which noisy labels and golden labels are combined (e.g., interleaved) in a single training stage (e.g., corresponding to approach 318 of Figure 3).
  • the training dataset is used to train a multi-task machine learning model configured to perform both a decision task to predict a decision result and an explanation task to predict a plurality of semantic concepts for explainability associated with the decision task.
  • the multi-task machine learning model is neural network 100 of Figure 1A, machine learning model 204 of Figure 2, and/or machine learning model 320 of Figure 3.
  • the decision result is (for each purchase transaction) whether the purchase transaction is fraudulent or non-fraudulent and the plurality of semantic concepts provide human-interpretable reasons for the decision result.
  • Figure 6A is a high-level block diagram of an embodiment of a machine learning based framework for learning attributes associated with datasets.
  • framework 600 is utilized to train a machine learning model to perform a decision task and/or an explanation task.
  • the decision task may be to predict whether transactions are fraudulent or non- fraudulent and the explanation task may be to provide reasons explaining the fraud predictions in the form of high-level concepts that are more easily understood by humans.
  • datasets 604 comprise collections of purchase transaction data (e.g., for each transaction: number of items purchased, shipping address, amount spent, etc.). These datasets are populated and categorized via labeling 602. For example, transactions that are fraudulent may be manually grouped and labeled by domain experts, as are transactions that are non-fraudulent. Fraud concepts may be grouped and labeled manually and/or automatically.
  • Datasets 604 are tagged with comprehensive sets of labels or metadata.
  • a set of labels defined and/or selected for purchase transactions of a prescribed dataset may include one or more high-level labels that provide classification of the purchase transactions and may furthermore include lower-level labels comprising ground truth data associated with fraud status and semantic concepts in a fraud taxonomy.
  • Datasets 604 are utilized for artificial intelligence learning. Training 606 performed on datasets 604, for example, using any combination of one or more appropriate machine learning techniques such as deep neural networks and convolutional neural networks, results in a set of one or more learned attributes 608. Such attributes may be derived or inferred from labels of datasets 604. For example, a learned attribute may be that transactions associated with certain IP addresses are likely to be fraudulent.
  • framework 600 may be utilized with respect to a plurality of different training datasets. After training on large sets of data to learn various attributes, framework 650 of Figure 6B may subsequently be deployed to detect similar attributes or combinations thereof in other datasets for which such attributes are unknown.
  • FIG. 6B is a high-level block diagram of an embodiment of a machine learning based framework for identifying data attributes.
  • framework 650 is utilized to detect fraud in purchase transactions (e.g., online transactions and/or transactions in which a credit card is used).
  • Framework 650 operates on new data 652.
  • New data 652 may comprise a plurality of purchase transactions.
  • New data 652 is not labeled or tagged, e.g., with ground truth data.
  • New data 652 is processed by machine learning framework 654 to determine identified attributes 656.
  • machine learning framework 654 is trained on large labeled datasets comprising a substantial subset of, if not all, possible permutations of objects of a constrained set of possible objects associated with purchase transactions in order to learn associated attributes and combinations thereof, which may subsequently be employed to detect or identify such attributes in other collections of purchase transactions.
  • identified attributes 656 include identified semantic concepts explaining fraud status predictions.
  • Figure 7 is a functional diagram illustrating a programmed computer system.
  • the processes of Figure 4 and/or Figure 5 are executed by computer system 700.
  • neural network 100 of Figure 1A, machine learning model 204 of Figure 2, and/or machine learning model 320 of Figure 3 is configured and/or trained using computer system 700.
  • Computer system 700 includes various subsystems as described below.
  • Computer system 700 includes at least one microprocessor subsystem (also referred to as a processor or a central processing unit (CPU)) 702.
  • Computer system 700 can be physical or virtual (e.g., a virtual machine).
  • processor 702 can be implemented by a single-chip processor or by multiple processors.
  • processor 702 is a general-purpose digital processor that controls the operation of computer system 700. Using instructions retrieved from memory 710, processor 702 controls the reception and manipulation of input data, and the output and display of data on output devices (e.g., display 718).
  • Processor 702 is coupled bi-directionally with memory 710, which can include a first primary storage, typically a random-access memory (RAM), and a second primary storage area, typically a read-only memory (ROM).
  • primary storage can be used as a general storage area and as scratch-pad memory, and can also be used to store input data and processed data.
  • Primary storage can also store programming instructions and data, in the form of data objects and text objects, in addition to other data and instructions for processes operating on processor 702.
  • primary storage typically includes basic operating instructions, program code, data, and objects used by the processor 702 to perform its functions (e.g., programmed instructions).
  • memory 710 can include any suitable computer-readable storage media, described below, depending on whether, for example, data access needs to be bi-directional or uni-directional.
  • processor 702 can also directly and very rapidly retrieve and store frequently needed data in a cache memory (not shown).
  • Persistent memory 712 (e.g., a removable mass storage device) provides additional data storage capacity for computer system 700, and is coupled either bi-directionally (read/write) or uni-directionally (read only) to processor 702.
  • persistent memory 712 can also include computer-readable media such as magnetic tape, flash memory, PC-CARDS, portable mass storage devices, holographic storage devices, and other storage devices.
  • a fixed mass storage 720 can also, for example, provide additional data storage capacity. The most common example of fixed mass storage 720 is a hard disk drive.
  • Persistent memory 712 and fixed mass storage 720 generally store additional programming instructions, data, and the like that typically are not in active use by the processor 702. It will be appreciated that the information retained within persistent memory 712 and fixed mass storage 720 can be incorporated, if needed, in standard fashion as part of memory 710 (e.g., RAM) as virtual memory.
  • bus 714 can also be used to provide access to other subsystems and devices. As shown, these can include a display monitor 718, a network interface 716, a keyboard 704, and a pointing device 706, as well as an auxiliary input/output device interface, a sound card, speakers, and other subsystems as needed.
  • pointing device 706 can be a mouse, stylus, track ball, or tablet, and is useful for interacting with a graphical user interface.
  • Network interface 716 allows processor 702 to be coupled to another computer, computer network, or telecommunications network using a network connection as shown.
  • processor 702 can receive information (e.g., data objects or program instructions) from another network or output information to another network in the course of performing method/process steps.
  • Information, often represented as a sequence of instructions to be executed on a processor, can be received from and outputted to another network.
  • An interface card or similar device and appropriate software implemented by (e.g., executed/performed on) processor 702 can be used to connect computer system 700 to an external network and transfer data according to standard protocols. Processes can be executed on processor 702, or can be performed across a network such as the Internet, intranet networks, or local area networks, in conjunction with a remote processor that shares a portion of the processing.
  • Additional mass storage devices can also be connected to processor 702 through network interface 716.
  • auxiliary I/O device interface (not shown) can be used in conjunction with computer system 700.
  • the auxiliary I/O device interface can include general and customized interfaces that allow processor 702 to send and, more typically, receive data from other devices such as microphones, touch-sensitive displays, transducer card readers, tape readers, voice or handwriting recognizers, biometrics readers, cameras, portable mass storage devices, and other computers.
  • various embodiments disclosed herein further relate to computer storage products with a computer readable medium that includes program code for performing various computer-implemented operations.
  • the computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system.
  • Examples of computer-readable media include, but are not limited to, all the media mentioned above: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as optical disks; and specially configured hardware devices such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), and ROM and RAM devices.
  • Examples of program code include both machine code, as produced, for example, by a compiler, and files containing higher-level code (e.g., script) that can be executed using an interpreter.
  • the computer system shown in Figure 7 is but an example of a computer system suitable for use with the various embodiments disclosed herein.
  • Other computer systems suitable for such use can include additional or fewer subsystems.
  • bus 714 is illustrative of any interconnection scheme serving to link the subsystems.
  • Other computer architectures having different configurations of subsystems can also be utilized.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Computer Security & Cryptography (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A multi-task hierarchical machine learning model is configured to perform both a decision task to predict a decision result and an explanation task to predict a plurality of semantic concepts for explainability associated with the decision task, wherein a semantic layer of the machine learning model associated with the explanation task is utilized as an input to a subsequent decision layer of the machine learning model associated with the decision task. Training data is received. The multi-task hierarchical machine learning model is trained using the received training data.

Description

HIERARCHICAL MACHINE LEARNING MODEL FOR PERFORMING A DECISION TASK AND AN EXPLANATION TASK
CROSS REFERENCE TO OTHER APPLICATIONS
[0001] This application claims priority to U.S. Provisional Patent Application No. 63/091,807 entitled TEACHING THE MACHINE TO EXPLAIN ITSELF USING DOMAIN KNOWLEDGE filed October 14, 2020, which is incorporated herein by reference for all purposes.
[0002] This application claims priority to U.S. Provisional Patent Application No. 63/154,557 entitled WEAKLY SUPERVISED MULTI-TASK LEARNING FOR CONCEPTBASED EXPLAINABILITY filed February 26, 2021, which is incorporated herein by reference for all purposes.
[0003] This application claims priority to Portugal Provisional Patent Application No. 117427 entitled A HIERARCHICAL MACHINE LEARNING MODEL FOR PERFORMING A DECISION TASK AND AN EXPLANATION TASK filed August 26, 2021, which is incorporated herein by reference for all purposes.
[0004] This application claims priority to European Patent Application No. 21193396.5 entitled A HIERARCHICAL MACHINE LEARNING MODEL FOR PERFORMING A DECISION TASK AND AN EXPLANATION TASK filed August 26, 2021, which is incorporated herein by reference for all purposes.
BACKGROUND OF THE INVENTION
[0005] Machine learning (ML) involves the use of algorithms that improve automatically through experience and by the use of data. In ML, a model is built based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. ML models are able to learn and adapt by analyzing and drawing inferences from patterns in data. ML has been increasingly used to aid humans in making better and faster decisions in a wide range of areas, such as financial services and healthcare. However, it is difficult for humans to comprehend the rationale behind ML models’ predictions, hindering trust in their decision-making. Thus, it would be beneficial to develop techniques directed toward making ML decisions more interpretable for humans.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
[0007] Figure 1A is a block diagram illustrating an embodiment of a machine learning model architecture for performing both a decision task and an explanation task.
[0008] Figure 1B is a flow diagram illustrating an embodiment of a process for training a machine learning model using distant supervision.
[0009] Figure 2 is a diagram illustrating an embodiment of a feedback loop incorporating human teaching into a multi-task machine learning model.
[0010] Figure 3 is a diagram illustrating examples of approaches for training a multi-task machine learning model to perform both a decision task and an explanation task.
[0011] Figure 4 is a flow diagram illustrating an embodiment of a process for configuring a machine learning model to perform both a decision task and an explanation task.
[0012] Figure 5 is a flow diagram illustrating an embodiment of a process for training a multi-task machine learning model to perform an explanation task.
[0013] Figure 6A is a high-level block diagram of an embodiment of a machine learning based framework for learning attributes associated with datasets.
[0014] Figure 6B is a high-level block diagram of an embodiment of a machine learning based framework for identifying data attributes.
[0015] Figure 7 is a functional diagram illustrating a programmed computer system.
DETAILED DESCRIPTION
[0016] The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily
configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
[0017] A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
[0018] A multi-task hierarchical machine learning model is configured to perform both a decision task to predict a decision result and an explanation task to predict a plurality of semantic concepts for explainability associated with the decision task, wherein a semantic layer of the machine learning model associated with the explanation task is utilized as an input to a subsequent decision layer of the machine learning model associated with the decision task. Training data is received. The multi-task hierarchical machine learning model is trained using the received training data.
[0019] A framework based on a machine learning model that jointly learns a decision task and associated domain knowledge explanations (also referred to as a self-explaining machine learning model) is disclosed. This framework is tailored to human-in-the-loop domain experts that lack deep technical ML knowledge. The domain knowledge explanations are also referred to herein as semantic concepts, concepts, etc. These explanations / concepts can guide human domain experts’ reasoning throughout their decision-making process. In some embodiments, the framework utilizes decision-makers’ feedback associated with semantic concepts. An advantage of the framework is that both predictive accuracy and explainability can be continuously improved. Due to the high cost of manually labeling semantic concepts to train a self-explaining machine learning model, as described in further detail herein, in various embodiments, a weakly supervised or semi-supervised method that leverages legacy rule-based systems to automatically create multi-label training data is used.
[0021] As described in further detail below, a machine learning architecture that jointly learns a decision task and associated explanations is disclosed. By encoding ML interpretability architecturally, more robust and authentic explanations can be achieved. A hierarchical architecture guarantees that a decision is only calculated based on a semantic layer, which is advantageous for addressing the problem of ensuring that explanations are faithful (e.g., when using a surrogate model). Encoding ML interpretability architecturally also promotes the incorporation of additional domain knowledge when building the semantic explanations. In various embodiments, due to its flexibility and generalization capabilities, a neural network model is used. In various embodiments, a multi-label framework is utilized, which allows each data instance to be simultaneously associated with a multitude of concepts. For instance, in a medical diagnosis task, an example of multiple concepts association would be to associate the concepts “headache” and “high body temperature” to the prediction of the disease “flu”.
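A minimal sketch of how such a multi-label association can be represented follows, assuming a small hypothetical concept taxonomy (the concept names beyond "headache" and "high body temperature", and all values, are illustrative only):

```python
# Hypothetical concept taxonomy for a medical diagnosis task (illustrative only).
concept_taxonomy = ["headache", "high body temperature", "cough", "fatigue"]

# A single data instance may be associated with several concepts simultaneously.
instance_concepts = {"headache", "high body temperature"}

# Multi-label target vector: one binary entry per concept in the taxonomy.
concept_targets = [1.0 if c in instance_concepts else 0.0 for c in concept_taxonomy]
print(concept_targets)  # [1.0, 1.0, 0.0, 0.0]
```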
[0022] The techniques disclosed herein solve the problems of concept label scarcity and jointly learning an explainability task and a decision task. In various embodiments, this is accomplished by generating (in a substantially automated manner with minimal human supervision, which is referred to herein as distant supervision, weakly supervised learning, semi-supervised
learning, etc.) a large dataset of labels using specified rules and concepts (referred to herein as noisy labels) and combining the large dataset of noisy labels with a small dataset of human expert manually created labels (referred to herein as golden labels). Stated alternatively, a self-explainable machine learning model may be trained using a mixture of noisy labels from distant supervision and golden labels from manual annotations.
[0023] Figure 1A is a block diagram illustrating an embodiment of a machine learning model architecture for performing both a decision task and an explanation task. In the example illustrated, neural network 100 receives input X 102 and outputs decision 122 related to a decision task. Neural network 100 also generates concepts 110, which correspond to an explanation task of producing semantic concepts 112, 114, 116, ..., 118 used as explanations associated with the decision task. In various embodiments, concepts 110 are part of the output of neural network 100 (along with decision 122). As described in further detail below, in the example illustrated, concepts 110 are the input to decision layer 120 of neural network 100.
[0024] Neural network 100 is configured to jointly learn to perform a decision task and provide associated domain knowledge explanations. Semantic concepts (used as explanations) help domain experts (end-users) with reasoning related to their decision-making process. As described in further detail below (e.g., see Figure 2), domain experts may provide feedback about which concepts justify their decisions. Thus, the techniques disclosed herein allow for continuously improving both predictive accuracy and explainability. In some embodiments, neural network 100 comprises a neural network (NN). This is merely illustrative and not restrictive. The techniques disclosed herein can also be implemented with different (or an ensemble of) machine learning models. For example, a multi-labeling ensemble model followed by a decision task model with the multi-labeling predictions as the only inputs to the decision task model may be utilized. Utilizing both semantic concepts and decision labels in a machine learning model can be framed as finding a hypothesis (learner), h ∈ H, such that, for the same inputs, x ∈ X, h is able to simultaneously satisfy h : X → Y and h : X → S, where S is the set of semantic concepts, and Y is the set of decisions (or classes) of the decision task. The decision task is also referred to herein as the predictive task. The explanation task is also referred to herein as the explainability task.
[0025] In the example illustrated, neural network 100 comprises three building blocks: (1) neural network (NN) layers (hidden layer-1 104 to hidden layer-L 106), (2) a semantic layer (explainability layer 108), and (3) a decision layer (decision layer 120). In the example illustrated, neural network 100 is a hierarchical machine learning model in that the blocks are chained
sequentially. Stated alternatively, outputs of an L-layer NN are fed as inputs to a semantic layer whose outputs are in turn fed into a decision layer. Both the decision task and the explainability task share parameters of the initial layers (the hidden layers) but also have specialized output layers for each individual task. The hierarchy shown in the output layers exploits the explainability task carrying pertinent information to the decision layer that is not explicit in the input data. In various embodiments, input X 102 is a vector X of numerical values. For example, with respect to fraud detection (an example application of the techniques disclosed herein that will be referred to repeatedly herein for illustrative purposes), X may comprise various values associated with a transaction to be determined (decided) as either fraudulent or not fraudulent (e.g., purchase amount for the transaction, total purchase amounts for other transactions by a same purchaser in a specified period of time, time between recent purchases, etc.). Non-numerical features may be converted to numerical values and included in input X 102. For example, whether a billing address associated with the transaction matches a known billing address on file can be represented as 0 for no and 1 for yes. It is also possible for input X 102 to include non-numerical values, such as the billing address. In various embodiments, each layer of neural network 100 (e.g., hidden layer-1 104 through hidden layer-L 106, explainability layer 108, and decision layer 120) is a structure that takes information from a previous layer and/or passes information to a next layer. Various types of neural network layers may be used, such as fully-connected layers with rectified linear unit (ReLU) or other activation functions. In various embodiments, in addition to decision 122 (which can be written as P) provided by neural network 100 as an output of decision layer 120, concepts 110 (which can be written as S) are also provided by neural network 100 as outputs of explainability layer 108.
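A minimal sketch of the chained architecture described above, assuming a PyTorch-style implementation; the layer sizes, ReLU activations, sigmoid outputs, and the alpha-weighted joint objective are illustrative assumptions rather than prescribed choices:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalSelfExplainingNet(nn.Module):
    """Hidden layers -> semantic (explainability) layer -> decision layer."""

    def __init__(self, n_features: int, n_concepts: int, hidden_dims=(64, 32)):
        super().__init__()
        blocks, prev = [], n_features
        for dim in hidden_dims:  # hidden layer-1 ... hidden layer-L (shared by both tasks)
            blocks += [nn.Linear(prev, dim), nn.ReLU()]
            prev = dim
        self.hidden = nn.Sequential(*blocks)
        self.semantic = nn.Linear(prev, n_concepts)  # outputs one score per semantic concept
        self.decision = nn.Linear(n_concepts, 1)     # decision computed only from the semantic layer

    def forward(self, x):
        h = self.hidden(x)
        concept_scores = torch.sigmoid(self.semantic(h))               # explanation task output S
        decision_score = torch.sigmoid(self.decision(concept_scores))  # decision task output P
        return decision_score, concept_scores

def joint_loss(decision_score, concept_scores, decision_y, concept_y, alpha=0.5):
    # Weighted sum of the two task losses; the alpha weighting is an assumption,
    # consistent with the later mention of a parameter controlling the importance
    # of the explainability task relative to the decision task.
    return (alpha * F.binary_cross_entropy(concept_scores, concept_y)
            + (1 - alpha) * F.binary_cross_entropy(decision_score, decision_y))
```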
[0026] With respect to fraud detection, an example of decision 122 is an output that includes a score between 0.0 and 1.0, which can then result in a 0 or 1 output based on a score threshold. This can be interpreted as a yes or no determination as to whether a particular transaction is likely to be fraudulent. In the example shown, concepts 110 are comprised of a plurality of semantic concept predictions S1 112, S2 114, S3 116, ..., Sk 118. With respect to fraud detection, each semantic concept prediction may be a score between 0.0 and 1.0 representing a probability of a specific fraud concept being present, such as suspicious billing address, suspicious customer, suspicious payment, suspicious items, high speed ordering, suspicious email, suspicious Internet Protocol (IP) address, and so forth. Predictive scores (e.g., likelihood scores between 0.0 and 1.0) can result in yes or no determinations based on score thresholds. In some embodiments, each yes or no determination is based on whether a corresponding likelihood score exceeds a specified
threshold (e.g., 0.5 on a scale of 0 to 1). The example illustrated is a multi-task machine learning model because in addition to predicting a decision result (e.g., a determination that fraud exists / is likely), semantic concepts associated with explaining the decision result are also predicted (e.g., suspicious billing address, suspicious customer, suspicious payment, etc., to explain why fraud is likely). By chaining semantic and decision layers (explainability layer 108 and decision layer 120), external information about the domain (e.g., fraud detection) which is not available in the feature data (input X 102) can be encoded. This is particularly meaningful when the taxonomy of semantic concepts is closely related to the decision task (e.g., a fraud taxonomy of fraudulent patterns can be very correlated with the fraud detection task). Therefore, learning to accurately predict the domain concepts can be very advantageous with respect to end-task predictions and end-user decisions.
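A short sketch of turning such likelihood scores into yes/no determinations, assuming a single illustrative threshold of 0.5 (the threshold value, scores, and concept names below are assumptions):

```python
def to_determinations(decision_score, concept_scores, concept_names, threshold=0.5):
    """Convert scores in [0, 1] into yes/no determinations via a score threshold."""
    return {
        "fraud": decision_score >= threshold,
        "concepts": {name: score >= threshold
                     for name, score in zip(concept_names, concept_scores)},
    }

# Hypothetical scores for a single transaction:
print(to_determinations(0.82, [0.71, 0.12],
                        ["suspicious customer", "suspicious email"]))
```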
[0029] Figure 1B is a flow diagram illustrating an embodiment of a process for training a
machine learning model using distant supervision. In some embodiments, the process of Figure 1B is utilized to train neural network 100 of Figure 1A, machine learning model 204 of Figure 2, and/or machine learning model 320 of Figure 3. In some embodiments, the process of Figure 1B is performed by computer system 700 of Figure 7.
[0030] At 130, expert rules and a concepts taxonomy are received. The expert rules and the concepts taxonomy form a rule-concept mapping framework to automatically associate rules to concepts of the concepts taxonomy. In various embodiments, a human specialist (e.g., domain expert) devises the concepts taxonomy with all the relevant concepts for a specific task. These concepts closely reflect the human specialist’s reasoning process when performing the task and therefore are perceived as suitable explanations. In various embodiments, a human specialist also reviews the rules. In various embodiments, the rules are applied to features of input data.
[0031] At 132, mappings between rules and concepts are created. In various embodiments, these mappings are devised by one or more human specialists (e.g., domain experts). As a specific example, with respect to fraud detection, an example rule-to-concept mapping may be mapping the rule “user has used N different credit cards last week” to the concept “suspicious customer”. In this example mapping, the concept of “suspicious customer” can be part of a suitable explanation by a human expert as to why a transaction may be fraudulent when the human expert is performing the task of fraud detection. In various embodiments, the concepts taxonomy is formed by a plurality of concepts that cover different cues, signals, reasons, etc. associated with explaining a prediction of a predictive task.
[0032] At 134, the rules are applied to an unlabeled dataset to determine concepts labels. Due to the rule-to-concept mappings in place, applying the rules to unlabeled data generates concept labels for the unlabeled data. Stated alternatively, specified data patterns trigger the rules, whose correspondingly linked concepts can be attached to the data patterns as labels. In various embodiments, such an approach is utilized to generate concept labels for machine learning model training data to overcome the concept label scarcity problem.
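A minimal sketch of this distant-supervision labeling step, assuming hypothetical rule predicates, thresholds, field names, and a hypothetical rule-to-concept mapping (none of these specifics come from the original text):

```python
# Hypothetical rules over transaction features and their mapped concepts.
rules = {
    "many_cards_last_week": lambda t: t.get("distinct_cards_7d", 0) >= 4,
    "long_email_local_part": lambda t: len(t.get("email", "").split("@")[0]) > 30,
}
rule_to_concepts = {
    "many_cards_last_week": ["suspicious customer", "suspicious payment"],
    "long_email_local_part": ["suspicious email"],
}

def noisy_concept_labels(transaction):
    """Triggered rules attach their mapped concepts to the instance as noisy labels."""
    labels = set()
    for rule_name, predicate in rules.items():
        if predicate(transaction):
            labels.update(rule_to_concepts[rule_name])
    return sorted(labels)

# Example unlabeled instance annotated automatically:
print(noisy_concept_labels({"distinct_cards_7d": 5, "email": "x" * 40 + "@example.com"}))
```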
[0033] Figure 2 is a diagram illustrating an embodiment of a feedback loop incorporating human teaching into a multi-task machine learning model. In the example illustrated, feedback loop 200 includes machine learning model 204 and expert review 212. In some embodiments, machine learning model 204 is comprised of neural network 100 of Figure 1A. In the example shown, machine learning model 204 receives input 202. In some embodiments, input 202 is input X 102 of Figure 1A. Machine learning model 204 produces outputs 206, which include decision task output
208 and explanation task output 210. In some embodiments, decision task output 208 corresponds to decision 122 of Figure 1A and explanation task output 210 corresponds to concepts 110 of Figure 1A. In the example illustrated, and as described in further detail below, outputs 206 are reviewed by one or more humans at expert review 212. Expert review 212 generates human feedback 214, which is fed back to machine learning model 204 to train machine learning model 204. Examples of human feedback 214 are described below.
[0034] An advantage of feedback loop 200 is that machine learning model 204 is able to promptly adapt to human teaching (or tuning), as opposed to a uni-directional ML system that directly influences human decisions but does not allow for the reverse of adapting to human behavior. Oftentimes, uni-directional systems are offline and it is only after a certain period of time that a new model is trained and adapted to collected knowledge. Such limitations are solved by incorporating a human-teaching stage that continuously integrates expert feedback into the learning process. For example, machine learning model 204 may be configured to perform a fraud detection task in which a main goal is to classify financial transactions as being fraudulent or not. Thus, input 202 would include features associated with financial transactions pertinent to the fraud detection task. After receiving transaction information via input 202, machine learning model 204 may infer predictive scores for both a fraud label (decision task output 208) and semantic concepts associated with fraud patterns (explanation task output 210). A fraud analyst (e.g., a domain expert) can review the transaction at expert review 212 and indicate whether the fraud label and semantic concepts have been correctly decided by machine learning model 204. In some embodiments, expert review 212 includes a programmed computer system (e.g., computer system 700 of Figure 7) that the domain expert utilizes to perform expert review. With respect to fraud detection, examples of human feedback 214 include checks on whether ML model determinations (e.g., yes or no determinations) and/or associated prediction scores (e.g., on a scale from 0 to 1) associated with semantic concepts such as suspicious billing address, suspicious customer, suspicious payment, suspicious items, high speed ordering, suspicious email, suspicious IP address, etc. are accurate. For example, the domain expert may select “accurate” or “not accurate” in a user interface.
[0035] In many real-world settings, human expertise aims to disambiguate inputs for which a model is uncertain. In these cases, the techniques disclosed herein can exploit this short-term feedback to improve human-AI system performance. In various embodiments, machine learning model 204 employs a hierarchical structure (e.g., with a semantic layer also acting as embeddings for a decision layer, such as is shown in neural network 100 of Figure 1A) that is likely to encode additional information based on this feedback, which has a benefit of rapidly improving both
predictive accuracy and also quality of explanations. In some embodiments, human feedback 214 is collected for a plurality of data instances (e.g., transactions) and then fed back to machine learning model 204 for model training. Stated alternatively, a batch training mode may be employed.
[0036] In various embodiments, prior to utilizing feedback loop 200, machine learning model 204 is trained to perform the explanation task using a bootstrapping technique (also referred to herein as distant supervision, a weakly supervised technique, semi-supervised technique, etc.) that uses an initial concept-based annotated dataset. At this stage, hyperparameters of machine learning model 204 may be tuned and the resulting model is then deployed in a human teaching stage via feedback loop 200 in which machine learning model 204 outputs decisions and explanations and collects human feedback regarding the outputted decisions and explanations. In various embodiments, after a specified number of human feedback instances, parameters of machine learning model 204 are updated through backpropagation. In some embodiments, quality control is incorporated into expert review 212. For example, each human expert may be required to meet a minimum accuracy level (or other relevant quantitative measure). Additionally, different experts may be utilized to review different semantic concepts (e.g., experts assigned based on their different areas of expertise). Feedback from specific experts (e.g., with higher accuracy levels or other relevant quantitative measures) may be assigned more weight (e.g., more training weight for higher impact on training of machine learning model 204).
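A sketch of one such batched feedback update, assuming corrected decision and concept labels collected from reviewers and a per-reviewer weight; the weighting scheme, loss, field names, and alpha value are assumptions, not a prescribed procedure (the model is assumed to return decision and concept scores as in the earlier architecture sketch):

```python
import torch
import torch.nn.functional as F

def feedback_update(model, optimizer, feedback_batch, alpha=0.5):
    """Apply one backpropagation step using a batch of expert-reviewed instances."""
    model.train()
    optimizer.zero_grad()
    total = 0.0
    for item in feedback_batch:  # each item: features, corrected labels, reviewer weight
        decision_p, concept_p = model(item["features"].unsqueeze(0))
        decision_loss = F.binary_cross_entropy(decision_p.squeeze(0), item["decision_label"])
        concept_loss = F.binary_cross_entropy(concept_p.squeeze(0), item["concept_labels"])
        # Reviews from experts with stronger track records can be weighted more heavily.
        total = total + item["reviewer_weight"] * (alpha * concept_loss
                                                   + (1 - alpha) * decision_loss)
    (total / len(feedback_batch)).backward()
    optimizer.step()
```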
[0037] In various embodiments, concept-based annotated datasets are created based at least in part on utilizing an automated rules-based system. In various embodiments, a distant supervision technique is utilized to derive an initial concept-based annotated dataset based on mapping rules to concepts in a taxonomy of concepts. A taxonomy of concepts (also referred to as a concept taxonomy) represents cues, signals, reasons, etc. associated with a predictive task (e.g., see Figure 1B above). In various embodiments, human effort is required to create the mappings. This amount of effort, though, is negligible when compared with the effort of manually annotating a large dataset. As an example, consider a fraud prevention domain involving a legacy rule system. Using similarities between the domain knowledge conveyed in the rules, rule-concept mappings to automatically associate rules to concepts in a fraud taxonomy can be constructed. After validation of the mappings by a fraud expert, the mappings can be utilized to automatically label payment transactions in bulk. For example, consider a payment transaction X for which the legacy system triggers two rules, rule A and rule B. Suppose that according to a predefined rule-concept mapping, rule A maps to a “suspicious email” concept and rule B maps to “suspicious IP”, “suspicious customer”, and “suspicious payment” concepts. Thus, by applying a distant supervision technique,
transaction X is annotated with “suspicious email”, “suspicious IP”, “suspicious customer”, and “suspicious payment”. Other rules may be triggered for other transactions. An example of rule A (associated with a suspicious email) is an email that exceeds a specified length, has a suspicious IP domain, or has another feature that indicates the email is suspicious. Many fraud prevention systems include various legacy system rules that can be applied to fraud datasets and leveraged for rule-concept mappings. Alternatively, rules may be created specifically to derive concept labels and applied to fraud datasets to generate concept labels for transaction data instances. Based on a few rule-based predictors available off-the-shelf in historical data accumulated by deployed AI models, it is possible to automatically generate concept-based annotations for datasets with many (e.g., millions of) instances. Although these annotations are likely to be imprecise (also referred to as weak, noisy, etc.) due to a lack of expert human involvement, these noisy annotations overcome the concept label scarcity problem of not having initial concept-based annotations and can be utilized to bootstrap model training and allow for subsequent fine-tuning using a small human-labeled (golden) dataset. Various training strategies incorporating the small human-labeled dataset are possible (e.g., see Figure 3).
[0038] The techniques disclosed herein solve the problem of machine learning models (e.g., neural networks) requiring large amounts of training data (particularly for multi-label performance) that are difficult to collect and/or create for explanation tasks. Stated alternatively, manual creation of concept-annotated datasets based on semantic concepts is not feasible in many practical settings. Concept-based explainability can be challenging due to a lack of annotations and/or mechanisms to collect them. Small labeling campaigns oftentimes prove to be insufficient (e.g., too small, poor concepts coverage, etc.) for training machine learning models. The creation of human (golden) labeled datasets is an arduous and expensive task irrespective of the application domain. As a consequence, many AI practitioners can only afford a small fraction of manually-curated data.
[0039] Figure 3 is a diagram illustrating examples of approaches for training a multi-task machine learning model to perform both a decision task and an explanation task. In the example illustrated, golden labels 302 and/or noisy labels 304 are used according to one of a plurality of training strategies 306 to train machine learning model 320 to perform the explanation task. In some embodiments, machine learning model 320 is neural network 100 of Figure 1A and/or machine learning model 204 of Figure 2. In various embodiments, machine learning model 320 is configured to perform a detection task as well as the explanation task.
[0040] Golden labels 302 and noisy labels 304 are utilized to train machine learning model 320 to perform an explanation task (e.g., generate semantic concepts associated with fraud
detection) as well as a decision task. The training may be an initial training, pre-training, retraining, fine-tuning, etc. With respect to fraud detection, golden labels 302 and noisy labels 304 may be fraud concept annotations for purchase transactions (e.g., suspicious billing address, suspicious customer, suspicious payment, suspicious items, high speed ordering, suspicious email, suspicious IP address, etc.). Thus, training instances for machine learning model 320 can include for each transaction of a plurality of transactions, a label as to whether a transaction is fraudulent and a plurality of labels (either golden labels 302 or noisy labels 304) as to whether the transaction matches one or more of a plurality of fraud concepts. In some embodiments, at least a portion of golden labels 302 is derived from human feedback 214 of Figure 2. It is also possible for at least a portion of golden labels 302 to be derived from expert review that is not associated with running a machine learning model in inference mode (e.g., before the machine learning model is deployed).
[0041] Golden labels 302 refer to concept labels that are manually created by humans (also referred to as ground truth labels), which are presumed to be more accurate than noisy labels 304, which refer to concept labels that are at least in part automatically created. With respect to fraud detection, in some embodiments, golden labels 302 are created including by requesting fraud experts to evaluate fraud patterns (or legitimate transaction patterns) perceived for a plurality of transactions by selecting concepts from a pool of concepts determined in a fraud taxonomy. With respect to fraud detection (a particular example), fraud experts are also utilized to create the fraud taxonomy. In the fraud taxonomy, semantically, concepts refer to patterns involving specific information about transactions. For example, the concept “suspicious billing shipping” aims to guide a human’s attention to information associated with shipping and/or billing information and prompt the human to examine dubious aspects, such as a mismatch between addresses, malformed addresses, etc. In general, domain experts create a concept taxonomy comprising semantic/ontological concepts that help describe patterns that contribute to an end decision.
[0042] In various embodiments, a distant supervision technique is utilized to automatically create noisy labels 304. In some embodiments, already existing information of a legacy rule system (that encoded high-level domain information) is extracted and mapped into the concepts (referred to as rule-concept mapping). In various embodiments, domain specialists (human experts) supervise this mapping. With respect to fraud detection, the result is a multi-label dataset in which each transaction instance is jointly associated with a fraud label (decision task) and fraud patterns (semantic concepts). Given that these annotations are proxies of ground truth associated concepts, they are referred to as “noisy labels”. Distant supervision is utilized to heuristically extract imprecise proxy annotations for the concepts. In various embodiments, mappings of rules to
concepts are validated by domain experts for correctness. With respect to fraud explainability, an example of a mapping is the rule “user tried N different credit cards last week” to the concept “suspicious customer” and/or the concept “suspicious payment”. A single rule may be linked with more than one concept, as illustrated in the above example. Distant supervision allows for bulk annotation of large (pre-existing) data volumes, thus allowing for fast creation of multi-label datasets. Despite still requiring human effort to create these associations, the total human effort is negligible when compared with the effort for manual annotation of the same volume of data.
[0043] In the example illustrated, it is possible, via approach 308, to train machine learning model 320 to perform an explainability task using only golden labels 302. However, a disadvantage of approach 308 is that it is difficult to generate enough golden labels to effectively train machine learning model 320 (e.g., resulting in poor concepts coverage). Approaches that utilize both golden labels 302 and noisy labels 304 are described below.
[0044] In the example illustrated, approaches 310 and 312 are two-stage bootstrap approaches to training machine learning model 320 to jointly learn an explainability task and a decision task. In the example shown, training is separated into two sequential stages: stage 314 (a pre-training stage) and stage 316 (a fine-tuning stage). Stage 314 comprises training a base model using noisy labels 304, which are abundant due to how they are generated (automatically) but are less precise than manually generated golden labels 302. Stage 316 comprises fine-tuning the base model with either just golden labels 302 (approach 310) or a mixture of golden labels 302 and noisy labels 304 (approach 312). Stated alternatively, approaches 310 and 312 involve learning a model’s parameters on a related dataset (the noisy dataset) and using it to obtain a better performing model on a smaller target dataset (the at least in part human-labeled dataset). In some embodiments, initial layers of machine learning model 320 are frozen and only task-specific layers are adjusted during stage 316. This can aid in preventing performance decay associated with discarding previous information and unlearning the decision task that machine learning model 320 is configured to perform, which may occur if golden labels 302 and noisy labels 304 are drawn from different distributions. Performance decay may also occur from using a learning rate value that causes steep updates, or iterating for many epochs, which can be too aggressive and cause machine learning model 320 to unlearn the traditional decision task. In some embodiments, stage 316 occurs after machine learning model 320 is deployed (e.g., subsequent to collecting golden labels via feedback loop 200 of Figure 2). It is also possible for stage 316 to occur before machine learning model 320 is deployed (e.g., when golden labels are collected before machine learning model 320 is deployed). In various embodiments, stage 314 occurs before machine learning model
320 is deployed.
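A sketch of the two-stage bootstrap strategy under illustrative assumptions (the optimizer choice, learning rates, epoch counts, and freezing exactly the shared hidden layers are assumptions; `joint_loss` and the `hidden` attribute refer to the earlier architecture sketch):

```python
import torch

def pretrain_then_finetune(model, noisy_loader, golden_loader, loss_fn,
                           pretrain_epochs=10, finetune_epochs=3):
    # Stage 1 (pre-training): fit the base model on abundant noisy labels.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(pretrain_epochs):
        for features, decision_y, concept_y in noisy_loader:
            opt.zero_grad()
            decision_p, concept_p = model(features)
            loss_fn(decision_p, concept_p, decision_y, concept_y).backward()
            opt.step()

    # Stage 2 (fine-tuning): freeze the shared initial layers and adjust only the
    # task-specific layers on golden (or mixed) labels, with a smaller learning rate,
    # to limit unlearning of the decision task.
    for p in model.hidden.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
    for _ in range(finetune_epochs):
        for features, decision_y, concept_y in golden_loader:
            opt.zero_grad()
            decision_p, concept_p = model(features)
            loss_fn(decision_p, concept_p, decision_y, concept_y).backward()
            opt.step()
```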
[0045] In the example illustrated, approach 318 is another approach that utilizes both golden labels 302 and noisy labels 304. Approach 318 (also referred to herein as a hybrid approach) involves a single training stage using mixed batches of labels, partly golden and partly noisy. Depending on the application, potential advantages over a two-stage approach include reduced bias in the base model and gradient updates that tend to be more informative and less prone to capturing noise. In various embodiments, approach 318 is employed before machine learning model 320 is deployed. The approaches shown are illustrative and not restrictive. Various modifications are possible. For example, it is possible to perform fine-tuning after approach 318 is employed. It is also possible to perform any number of re-training sessions for machine learning model 320.
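A sketch of forming the mixed batches used by the hybrid approach; the batch size and the golden fraction per batch are illustrative assumptions:

```python
import random

def mixed_batches(golden_instances, noisy_instances, batch_size=256, golden_fraction=0.25):
    """Yield batches that interleave golden and noisy labelled instances in one training stage."""
    n_golden = max(1, int(batch_size * golden_fraction))
    n_noisy = batch_size - n_golden
    while True:  # caller decides how many batches to draw
        batch = (random.sample(golden_instances, min(n_golden, len(golden_instances)))
                 + random.sample(noisy_instances, min(n_noisy, len(noisy_instances))))
        random.shuffle(batch)
        yield batch
```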
[0046] With respect to fraud detection, an example of a training dataset is a dataset with millions of payment transactions of which a small percentage (e.g., 2-3%) are fraudulent and each transaction includes purchase information (e.g., number of items, shipping address, etc.), a fraud decision label, as well as information about triggered rules. Based on the information about triggered rules, a distant supervision technique can be applied to obtain noisy explainability labels (e.g., noisy labels 304). In some embodiments, a portion of the noisy labels are filtered out (not used) based on experimental results on how well the noisy labels match ones produced by humans. With respect to fraud detection, in various embodiments, a much smaller subset of the dataset (e.g., <1% of the entire dataset) may have human-annotated labels for the explainability task (e.g., golden labels 302). In some embodiments, golden labels are reviewed for human error. Typically, all labels for the fraud decision task are golden and can be described as golden decision labels, whereas the explainability task spans both a high-resources noisy explainability dataset and a low-resources golden explainability dataset. Both golden labels and noisy labels can be utilized for training, validation, and testing for the explainability task.
[0047] In some embodiments, for training strategies 306, a first hyperparameter grid search is executed in which various hyperparameters, e.g., batch size, learning rate, number and dimension of hidden layers, value of α in Equation 5 (controlling the importance of the explainability task relative to the decision task), etc., are varied and resulting models are evaluated. Models are evaluated in terms of their predictive performance at the traditional decision task and the explainability task. With respect to fraud detection, the decision task may be evaluated according to fraud recall (rate of detecting fraud when fraud exists). The explainability task may be evaluated according to a mean Average Precision (mAP) metric, which focuses on the number of correctly predicted concepts without imposing restrictions on the explanation size (how many concepts each
explanation should have). In various embodiments, the first hyperparameter grid search is applicable to approach 308, approach 318, and stage 314 of approaches 310 and 312. In various embodiments, a second hyperparameter grid search is executed during stage 316 for approaches 310 and 312. In various embodiments, for the second hyperparameter grid search, the number of epochs, batch size, number of frozen layers, and learning rate are varied. Additionally, each minibatch may be enforced to have at least one transaction per concept and the fraction of fraudulent transactions per batch may be fixed to be equal to the fraud prevalence of the training dataset.
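A sketch of such a grid search and of evaluating both tasks, assuming scikit-learn metrics (fraud recall for the decision task, macro-averaged average precision as the mAP metric for the explainability task); the grid values and the `train_and_eval` interface are illustrative assumptions:

```python
from itertools import product
from sklearn.metrics import average_precision_score, recall_score

def grid_search(train_and_eval, param_grid):
    """`train_and_eval` is assumed to train a model with the given hyperparameters and
    return validation-set predictions and ground truth for both tasks."""
    results = []
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        decision_pred, decision_true, concept_scores, concept_true = train_and_eval(**params)
        fraud_recall = recall_score(decision_true, decision_pred)               # decision task
        concept_map = average_precision_score(concept_true, concept_scores,
                                              average="macro")                  # explainability task
        results.append((params, fraud_recall, concept_map))
    return results

# Illustrative hyperparameter grid (values are assumptions, not from the disclosure):
example_grid = {"batch_size": [128, 256], "learning_rate": [1e-3, 1e-4], "alpha": [0.3, 0.5]}
```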
[0048] Figure 4 is a flow diagram illustrating an embodiment of a process for configuring a machine learning model to perform both a decision task and an explanation task. In some embodiments, the process of Figure 4 is performed by computer system 700 of Figure 7. In some embodiments, the machine learning model configured is neural network 100 of Figure 1A, machine learning model 204 of Figure 2, and/or machine learning model 320 of Figure 3.
[0050] In various embodiments, configuring the machine learning model includes determining an architecture of the machine learning model, e.g., determining a number of hidden layers for a NN and determining connections between the layers, e.g., connections between hidden layers, an explainability layer, and a decision layer. The machine learning model is multi-task because it is configured to perform both the decision task and the explanation task. For example, neural network 100 of Figure 1A includes an explainability layer that outputs semantic concepts and a decision layer that outputs decisions. Neural network 100 of Figure 1A is also hierarchical
because components of neural network 100 are chained sequentially. In particular, in neural network 100, outputs of an L-layer NN are fed as inputs to a semantic layer whose outputs are in turn fed into a decision layer.
[0051] At 404, training data is received. In various embodiments, the training data is labeled data in which labels for the decision task are manually generated by humans (golden decision labels). For example, with respect to fraud detection, in various embodiments, the training data includes a plurality of transactions (e.g., purchases) for which features of each transaction, e.g., purchase information such as number of items purchased, shipping address, etc. are received by the machine learning model as inputs as well as a fraud decision label for each transaction. Stated alternatively, in various embodiments, each transaction is known a priori to be either fraudulent or non-fraudulent and is labeled as such by a human in order to train the machine learning model to correctly make fraud decisions based on the inputs. In various embodiments, the training data also includes labels for semantic concepts associated with each training instance. For example, with respect to fraud detection, each purchase transaction may be associated with various fraud concepts (e.g., suspicious billing address, suspicious customer, suspicious payment, etc.) and labeled as to whether these fraud concepts are true or false for each purchase transaction. In some embodiments, at least a portion of the semantic concept labels in the training data are generated automatically (e.g., based on specified rules that map features of each training instance to concepts). In some embodiments, at least a portion of the semantic concept labels in the training data are generated manually by humans, though automatically generated semantic concept labels typically greatly exceed manually generated semantic concept labels. In some embodiments, at least a portion of the semantic concept labels are generated via expert review of outputs of the machine learning model (e.g., at expert review 212 of Figure 2) to be fed back to the machine learning model for training.
[0052] At 406, the multi-task hierarchical machine learning model is trained using the received training data. In various embodiments, the machine learning model is trained to perform the decision task based on golden decision labels and the machine learning model is trained to perform the explanation task based on a combination of noisy and golden semantic concept labels. It is also possible to train based solely on golden semantic concept labels, though it is typically very costly to obtain a sufficient quantity of golden semantic concept labels for effective training. In some embodiments, the training updates an already deployed machine learning model (e.g., human feedback 214 of Figure 2, which is fed back to machine learning model 204 of Figure 2). In some embodiments, the training is performed before the machine learning model is deployed in inference
mode. Examples of training include the approaches of strategies 306 of Figure 3.
[0053] Figure 5 is a flow diagram illustrating an embodiment of a process for training a multi-task machine learning model to perform an explanation task. In some embodiments, the process of Figure 5 is performed by computer system 700 of Figure 7. In some embodiments, the multi-task machine learning model is neural network 100 of Figure 1A, machine learning model 204 of Figure 2, and/or machine learning model 320 of Figure 3. In some embodiments, at least a portion of the process of Figure 5 is performed in 406 of Figure 4.
[0054] At 502, a labeling function associated with generating one or more semantic concepts is received. The labeling function can be a rule(s) mapping or any other heuristic or technique to automatically label concepts. In some embodiments, the labeling function is a mapping. In some embodiments, the mapping comprises one or more rules that transform data patterns to explanations in the form of high-level concepts that are more easily understood by humans. An example of a mapping is the rule that if a purchaser associated with a transaction has used a specified number N different credit cards over a specified period of time (e.g., one week), the concept “suspicious customer” is identified for the transaction. A single mapping may be linked with more than one concept. For example, the pattern of a purchaser having used a specified number N different credit cards over a specified period of time can also trigger identification of the concept “suspicious payment”.
[0055] At 504, the received labeling function is used to automatically annotate an existing dataset with the one or more semantic concepts to generate an annotated noisy dataset. In various embodiments, the existing dataset is already labeled with decision task outputs. For example, a fraud detection dataset may include millions of payment transactions for which each transaction includes purchase information (e.g., number of items, shipping address, etc.) and a fraud decision label (e.g., fraudulent or not fraudulent). The fraud detection dataset can be leveraged to obtain millions of semantic concepts labeled instances by applying the received labeling function to the purchase transaction data of the fraud detection dataset (e.g., apply rules to already existing information about purchases in the fraud detection dataset). Stated alternatively, it is possible to obtain millions of explanation task labeled instances by leveraging the data from which millions of decision task labeled instances are obtained. Using the labeling function to automatically annotate the existing dataset results in noisy labels (the annotated noisy dataset) because automatic annotation via the received labeling function is typically less precise than manual annotation by human experts.
[0056] At 506, a reference dataset annotated with the one or more semantic concepts is received. In various embodiments, the reference dataset is annotated manually by human experts. Stated alternatively, in various embodiments, the reference dataset is comprised of golden semantic concept labels. Typically, the reference dataset is much smaller than the annotated noisy dataset because it is significantly more time-consuming and resource-intensive to manually annotate semantic concepts as opposed to automatic annotation. An advantage of the reference dataset over the annotated noisy dataset is that the labels of the reference dataset are more precise and accurate due to the more resource-intensive human expert manual labeling process.
[0057] At 508, a training dataset is prepared including by combining at least a portion of the reference dataset with at least a portion of the annotated noisy dataset. In some embodiments, the training dataset is comprised of a plurality of sections. For example, a first section may be comprised of at least a portion of the annotated noisy dataset, corresponding to a first training stage using noisy labels (e.g., stage 314 of Figure 3), and a second section may be comprised of at least a portion of the reference dataset or a combination of at least a portion of the reference dataset and at least a portion of the annotated noisy dataset, corresponding to a second training stage using golden labels or a combination of noisy labels and golden labels (e.g., stage 316 of Figure 3). In alternative embodiments, the training dataset does not have a plurality of sections. For example, a section that is comprised of at least a portion of the reference dataset and at least a portion of the annotated noisy dataset can correspond to a hybrid training approach in which noisy labels and golden labels are combined (e.g., interleaved) in a single training stage (e.g., corresponding to approach 318 of Figure 3).
[0058] At 510, the training dataset is used to train a multi-task machine learning model configured to perform both a decision task to predict a decision result and an explanation task to predict a plurality of semantic concepts for explainability associated with the decision task. In some embodiments, the multi-task machine learning model is neural network 100 of Figure 1A, machine learning model 204 of Figure 2, and/or machine learning model 320 of Figure 3. In some embodiments, the decision result is (for each purchase transaction) whether the purchase transaction is fraudulent or non-fraudulent and the plurality of semantic concepts provide human-interpretable reasons for the decision result.
[0059] The process of Figure 5 is illustrative and not restrictive. Other embodiments for training a multi-task machine learning model are also possible. Furthermore, no sequential order for 502, 504, 506, 508, and 510 is implied in the process of Figure 5. For example, 504 and 506 may occur in parallel, 506 may occur before 502 and 504, and so forth.
[0060] Figure 6A is a high-level block diagram of an embodiment of a machine learning based framework for learning attributes associated with datasets. In some embodiments, framework 600 is utilized to train a machine learning model to perform a decision task and/or an explanation task. For example, the decision task may be to predict whether transactions are fraudulent or non-fraudulent and the explanation task may be to provide reasons explaining the fraud predictions in the form of high-level concepts that are more easily understood by humans. For fraud detection, in many cases, datasets 604 comprise collections of purchase transaction data (e.g., for each transaction: number of items purchased, shipping address, amount spent, etc.). These datasets are populated and categorized via labeling 602. For example, transactions that are fraudulent may be manually grouped and labeled by domain experts, as are transactions that are non-fraudulent. Fraud concepts may be grouped and labeled manually and/or automatically.
[0061] Datasets 604 are tagged with comprehensive sets of labels or metadata. With respect to fraud detection, a set of labels defined and/or selected for purchase transactions of a prescribed dataset may include one or more high-level labels that provide classification of the purchase transactions and may furthermore include lower-level labels comprising ground truth data associated with fraud status and semantic concepts in a fraud taxonomy. Datasets 604 are utilized for artificial intelligence learning. Training 606 performed on datasets 604, for example, using any combination of one or more appropriate machine learning techniques such as deep neural networks and convolutional neural networks, results in a set of one or more learned attributes 608. Such attributes may be derived or inferred from labels of datasets 604. For example, a learned attribute may be that transactions associated with certain IP addresses are likely to be fraudulent. In various embodiments, different training models may be used to learn different attributes. Furthermore, framework 600 may be utilized with respect to a plurality of different training datasets. After training on large sets of data to learn various attributes, framework 650 of Figure 6B may subsequently be deployed to detect similar attributes or combinations thereof in other datasets for which such attributes are unknown.
[0062] Figure 6B is a high-level block diagram of an embodiment of a machine learning based framework for identifying data attributes. In some embodiments, framework 650 is utilized to detect fraud in purchase transactions (e.g., online transactions and/or transactions in which a credit card is used). Framework 650 operates on new data 652. New data 652 may comprise a plurality of purchase transactions. New data 652 is not labeled or tagged, e.g., with ground truth data. New data 652 is processed by machine learning framework 654 to determine identified attributes 656.
[0063] In many cases, machine learning framework 654 is trained on large labeled datasets comprising a substantial subset of, if not all, possible permutations of objects of a constrained set of possible objects associated with purchase transactions in order to learn associated attributes and combinations thereof, which may subsequently be employed to detect or identify such attributes in other collections of purchase transactions. In some embodiments, identified attributes 656 include identified semantic concepts explaining fraud status predictions.
[0064] Figure 7 is a functional diagram illustrating a programmed computer system. In some embodiments, the processes of Figure 4 and/or Figure 5 are executed by computer system 700. In some embodiments, neural network 100 of Figure 1A, machine learning model 204 of Figure 2, and/or machine learning model 320 of Figure 3 is configured and/or trained using computer system 700.
[0065] In the example shown, computer system 700 includes various subsystems as described below. Computer system 700 includes at least one microprocessor subsystem (also referred to as a processor or a central processing unit (CPU)) 702. Computer system 700 can be physical or virtual (e.g., a virtual machine). For example, processor 702 can be implemented by a single-chip processor or by multiple processors. In some embodiments, processor 702 is a general- purpose digital processor that controls the operation of computer system 700. Using instructions retrieved from memory 710, processor 702 controls the reception and manipulation of input data, and the output and display of data on output devices (e.g., display 718).
[0066] Processor 702 is coupled bi-directionally with memory 710, which can include a first primary storage, typically a random-access memory (RAM), and a second primary storage area, typically a read-only memory (ROM). As is well known in the art, primary storage can be used as a general storage area and as scratch-pad memory, and can also be used to store input data and processed data. Primary storage can also store programming instructions and data, in the form of data objects and text objects, in addition to other data and instructions for processes operating on processor 702. Also, as is well known in the art, primary storage typically includes basic operating instructions, program code, data, and objects used by the processor 702 to perform its functions (e.g., programmed instructions). For example, memory 710 can include any suitable computer- readable storage media, described below, depending on whether, for example, data access needs to be bi-directional or uni-directional. For example, processor 702 can also directly and very rapidly retrieve and store frequently needed data in a cache memory (not shown).
[0067] Persistent memory 712 (e.g., a removable mass storage device) provides additional
data storage capacity for computer system 700, and is coupled either bi-directionally (read/write) or uni-directionally (read only) to processor 702. For example, persistent memory 712 can also include computer-readable media such as magnetic tape, flash memory, PC-CARDS, portable mass storage devices, holographic storage devices, and other storage devices. A fixed mass storage 720 can also, for example, provide additional data storage capacity. The most common example of fixed mass storage 720 is a hard disk drive. Persistent memory 712 and fixed mass storage 720 generally store additional programming instructions, data, and the like that typically are not in active use by the processor 702. It will be appreciated that the information retained within persistent memory 712 and fixed mass storage 720 can be incorporated, if needed, in standard fashion as part of memory 710 (e.g., RAM) as virtual memory.
[0068] In addition to providing processor 702 access to storage subsystems, bus 714 can also be used to provide access to other subsystems and devices. As shown, these can include a display monitor 718, a network interface 716, a keyboard 704, and a pointing device 706, as well as an auxiliary input/output device interface, a sound card, speakers, and other subsystems as needed. For example, pointing device 706 can be a mouse, stylus, track ball, or tablet, and is useful for interacting with a graphical user interface.
[0069] Network interface 716 allows processor 702 to be coupled to another computer, computer network, or telecommunications network using a network connection as shown. For example, through network interface 716, processor 702 can receive information (e.g., data objects or program instructions) from another network or output information to another network in the course of performing method/process steps. Information, often represented as a sequence of instructions to be executed on a processor, can be received from and outputted to another network. An interface card or similar device and appropriate software implemented by (e.g., executed/performed on) processor 702 can be used to connect computer system 700 to an external network and transfer data according to standard protocols. Processes can be executed on processor 702, or can be performed across a network such as the Internet, intranet networks, or local area networks, in conjunction with a remote processor that shares a portion of the processing.
Additional mass storage devices (not shown) can also be connected to processor 702 through network interface 716.
[0070] An auxiliary I/O device interface (not shown) can be used in conjunction with computer system 700. The auxiliary I/O device interface can include general and customized interfaces that allow processor 702 to send and, more typically, receive data from other devices such as microphones, touch-sensitive displays, transducer card readers, tape readers, voice or
handwriting recognizers, biometrics readers, cameras, portable mass storage devices, and other computers.
[0071] In addition, various embodiments disclosed herein further relate to computer storage products with a computer readable medium that includes program code for performing various computer-implemented operations. The computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of computer-readable media include, but are not limited to, all the media mentioned above: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as optical disks; and specially configured hardware devices such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), and ROM and RAM devices.
Examples of program code include both machine code, as produced, for example, by a compiler, or files containing higher level code (e.g., script) that can be executed using an interpreter.
[0072] The computer system shown in Figure 7 is but an example of a computer system suitable for use with the various embodiments disclosed herein. Other computer systems suitable for such use can include additional or fewer subsystems. In addition, bus 714 is illustrative of any interconnection scheme serving to link the subsystems. Other computer architectures having different configurations of subsystems can also be utilized.
[0073] Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

Claims

1. A method, comprising:
    configuring a multi-task hierarchical machine learning model to perform both a decision task to predict a decision result and an explanation task to predict a plurality of semantic concepts for explainability associated with the decision task, wherein a semantic layer of the machine learning model associated with the explanation task is utilized as an input to a subsequent decision layer of the machine learning model associated with the decision task;
    receiving training data; and
    training the multi-task hierarchical machine learning model using the received training data.
2. The method of claim 1, wherein the multi-task hierarchical machine learning model includes a plurality of hidden layers.
3. The method according to any of the previous claims, wherein the semantic layer receives one or more inputs from a hidden layer and transmits the plurality of semantic concepts to the subsequent decision layer.
4. The method according to any of the previous claims, wherein the multi-task hierarchical machine learning model includes a neural network.
5. The method according to any of the previous claims, wherein the decision task is associated with detection of a fraudulent transaction, money laundering, account takeover, or account-opening fraud.
6. The method according to any of the previous claims, wherein the decision result quantifies a likelihood of a decision label associated with a transaction.
7. The method according to any of the previous claims, wherein the explanation task is associated with providing fraud concepts that explain a specific fraud determination.
8. The method according to any of the previous claims, wherein the plurality of semantic concepts comprises words or phrases that are understood by humans.
9. The method according to any of the previous claims, wherein the plurality of semantic concepts belongs to a taxonomy of concepts representing patterns associated with fraudulent, money laundering, or non-legitimate account activity behavior.
10. The method according to any of the previous claims, wherein the training data includes information associated with a plurality of purchase transactions, including, for each purchase transaction of the plurality of purchase transactions, a set of labeled purchase-related features and a labeled outcome as to whether fraud is present.
11. The method according to any of the previous claims, wherein the training data includes a plurality of training instances, including, for each training instance of the plurality of training instances, labels for the plurality of semantic concepts.
12. The method of claim 11, wherein the labels for at least a portion of the training instances of the plurality of training instances are automatically generated.
13. The method of claim 11, wherein the labels for at least a portion of the training instances of the plurality of training instances are provided by one or more humans.
14. The method according to any of the previous claims, wherein at least a portion of the training data is received during review of outputs of the multi-task hierarchical machine learning model after it is deployed to perform the decision task and the explanation task.
15. The method of claim 14, wherein the review of outputs of the multi-task hierarchical machine learning model after it is deployed to perform the decision task and the explanation task is conducted by a reviewer selected to meet specified qualification criteria to review the plurality of semantic concepts.
16. The method of claim 15, wherein the specified qualification criteria include meeting threshold levels of review accuracy.
17. The method according to any of the previous claims, wherein training the multi-task hierarchical machine learning model using the received training data includes utilizing a backpropagation and gradient descent technique.
18. The method according to any of the previous claims, wherein training the multi-task hierarchical machine learning model using the received training data includes minimizing a joint loss function that combines a loss function associated with the decision task and a loss function associated with the explanation task.
19. A system, comprising:
    one or more processors configured to:
        configure a multi-task hierarchical machine learning model to perform both a decision task to predict a decision result and an explanation task to predict a plurality of semantic concepts for explainability associated with the decision task, wherein a semantic layer of the machine learning model associated with the explanation task is utilized as an input to a subsequent decision layer of the machine learning model associated with the decision task;
        receive training data; and
        train the multi-task hierarchical machine learning model using the received training data; and
    a memory coupled to at least one of the one or more processors and configured to provide at least one of the one or more processors with instructions.
20. A computer program product embodied in a non-transitory computer readable medium and comprising computer instructions for:
    configuring a multi-task hierarchical machine learning model to perform both a decision task to predict a decision result and an explanation task to predict a plurality of semantic concepts for explainability associated with the decision task, wherein a semantic layer of the machine learning model associated with the explanation task is utilized as an input to a subsequent decision layer of the machine learning model associated with the decision task;
    receiving training data; and
    training the multi-task hierarchical machine learning model using the received training data.
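For illustration only (not part of the claims), the following is a minimal sketch of one way a hierarchical multi-task model of the kind recited in claims 1-3, 17, and 18 could be implemented. It assumes a PyTorch-style API; the layer sizes, the weighting factor alpha, and the helper names (HierarchicalExplainableModel, train_step) are hypothetical and are not prescribed by the application.

```python
# Illustrative sketch only. Assumes PyTorch and hypothetical sizes
# (n_features, n_concepts); the application does not fix these.
import torch
import torch.nn as nn

class HierarchicalExplainableModel(nn.Module):
    """Hidden layers feed a semantic (concept) layer whose outputs are,
    in turn, the input of the subsequent decision layer."""

    def __init__(self, n_features: int, n_concepts: int, hidden: int = 64):
        super().__init__()
        # Shared hidden layers (cf. claim 2).
        self.hidden = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Semantic layer: one logit per human-understandable concept (cf. claims 1, 3, 8).
        self.semantic = nn.Linear(hidden, n_concepts)
        # Decision layer takes the predicted concepts as its input (cf. claim 1).
        self.decision = nn.Linear(n_concepts, 1)

    def forward(self, x):
        h = self.hidden(x)
        concept_logits = self.semantic(h)          # explanation task output
        concepts = torch.sigmoid(concept_logits)   # multi-label concept scores
        decision_logit = self.decision(concepts)   # decision task output
        return decision_logit, concept_logits


def train_step(model, optimizer, x, y_decision, y_concepts, alpha: float = 0.5):
    """One joint update (cf. claims 17-18): backpropagation over a combined loss."""
    decision_logit, concept_logits = model(x)
    loss_decision = nn.functional.binary_cross_entropy_with_logits(
        decision_logit.squeeze(-1), y_decision)
    loss_concepts = nn.functional.binary_cross_entropy_with_logits(
        concept_logits, y_concepts)
    loss = alpha * loss_decision + (1.0 - alpha) * loss_concepts  # joint loss
    optimizer.zero_grad()
    loss.backward()     # backpropagation
    optimizer.step()    # gradient descent update
    return loss.item()
```

With hypothetical tensors x (transaction features), y_decision (e.g., fraud labels), and y_concepts (multi-label semantic-concept annotations), repeated calls to train_step would fit both tasks jointly; the weighting alpha is an assumed hyperparameter, not something the claims prescribe.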
PCT/US2021/048502 2020-10-14 2021-08-31 Hierarchical machine learning model for performing & decision task and an explanation task WO2022081270A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21880745.1A EP4038469A4 (en) 2020-10-14 2021-08-31 Hierarchical machine learning model for performing & decision task and an explanation task

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US202063091807P 2020-10-14 2020-10-14
US63/091,807 2020-10-14
US202163154557P 2021-02-26 2021-02-26
US63/154,557 2021-02-26
PT117427 2021-08-26
EP21193396 2021-08-26
EP21193396.5 2021-08-26
PT11742721 2021-08-26
US17/461,198 US11392954B2 (en) 2020-10-14 2021-08-30 Hierarchical machine learning model for performing a decision task and an explanation task
US17/461,198 2021-08-30

Publications (1)

Publication Number Publication Date
WO2022081270A1 true WO2022081270A1 (en) 2022-04-21

Family

ID=81077825

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/048502 WO2022081270A1 (en) 2020-10-14 2021-08-31 Hierarchical machine learning model for performing & decision task and an explanation task

Country Status (3)

Country Link
US (1) US11392954B2 (en)
EP (1) EP4038469A4 (en)
WO (1) WO2022081270A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11922314B1 (en) * 2018-11-30 2024-03-05 Ansys, Inc. Systems and methods for building dynamic reduced order physical models
US11544471B2 (en) * 2020-10-14 2023-01-03 Feedzai—Consultadoria e Inovação Tecnológica, S.A. Weakly supervised multi-task learning for concept-based explainability
US20220253856A1 (en) * 2021-02-11 2022-08-11 The Toronto-Dominion Bank System and method for machine learning based detection of fraud
US11481490B1 (en) 2021-04-02 2022-10-25 Sift Science, Inc. Systems and methods for automated labeling of subscriber digital event data in a machine learning-based digital threat mitigation platform
US20230298028A1 (en) * 2022-03-18 2023-09-21 Fidelity Information Services, Llc Analyzing a transaction in a payment processing system
CN117422206B (en) * 2023-12-18 2024-03-29 中国科学技术大学 Method, equipment and storage medium for improving engineering problem decision and scheduling efficiency
CN117892799B (en) * 2024-03-15 2024-06-04 中国科学技术大学 Financial intelligent analysis model training method and system with multi-level tasks as guidance


Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9715497D0 (en) 1997-07-22 1997-10-01 British Telecomm A telecommunications network
IL146597A0 (en) 2001-11-20 2002-08-14 Gordon Goren Method and system for creating meaningful summaries from interrelated sets of information
US6831663B2 (en) 2001-05-24 2004-12-14 Microsoft Corporation System and process for automatically explaining probabilistic predictions
US7657497B2 (en) 2006-11-07 2010-02-02 Ebay Inc. Online fraud prevention using genetic algorithm solution
US20080115213A1 (en) 2006-11-14 2008-05-15 Fmr Corp. Detecting Fraudulent Activity on a Network Using Stored Information
US10290053B2 (en) * 2009-06-12 2019-05-14 Guardian Analytics, Inc. Fraud detection and analysis
US9324022B2 (en) 2014-03-04 2016-04-26 Signal/Sense, Inc. Classifying data with deep learning neural records incrementally refined through expert input
US10628834B1 (en) 2015-06-16 2020-04-21 Palantir Technologies Inc. Fraud lead detection system for efficiently processing database-stored data and automatically generating natural language explanatory information of system results for display in interactive user interfaces
ES2844399T3 (en) 2016-02-10 2021-07-22 Feedzai Consultadoria E Inovacao Tecnologica S A Automatic detection of compromise points
US11062316B2 (en) 2017-08-14 2021-07-13 Feedzai—Consultadoria e Inovaçâo Tecnológica, S.A. Computer memory management during real-time fraudulent transaction analysis
US10140553B1 (en) * 2018-03-08 2018-11-27 Capital One Services, Llc Machine learning artificial intelligence system for identifying vehicles
US11455493B2 (en) * 2018-05-16 2022-09-27 International Business Machines Corporation Explanations for artificial intelligence based recommendations
US10977654B2 (en) 2018-06-29 2021-04-13 Paypal, Inc. Machine learning engine for fraud detection during cross-location online transaction processing
US11176320B2 (en) 2019-10-22 2021-11-16 International Business Machines Corporation Ascribing ground truth performance to annotation blocks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7613663B1 (en) * 2002-09-30 2009-11-03 Michael Lamport Commons Intelligent control with hierarchal stacked neural networks
US20140052678A1 (en) * 2012-08-20 2014-02-20 InsideSales.com, Inc. Hierarchical based sequencing machine learning model
US20190164057A1 (en) * 2019-01-30 2019-05-30 Intel Corporation Mapping and quantification of influence of neural network features for explainable artificial intelligence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BIBAL ADRIEN; LOGNOUL MICHAEL; DE STREEL ALEXANDRE; FRÉNAY BENOÎT: "Legal requirements on explainability in machine learning", ARTIFICIAL INTELLIGENCE AND LAW, SPRINGER NETHERLANDS, DORDRECHT, vol. 29, no. 2, 30 July 2020 (2020-07-30), Dordrecht, pages 149 - 169, XP037454267, ISSN: 0924-8463, DOI: 10.1007/s10506-020-09270-4 *
HIND MICHAEL, WEI DENNIS, CAMPBELL MURRAY, CODELLA NOEL C F, DHURANDHAR AMIT, MOJSILOVIĆ ALEKSANDRA, KARTHIKEYAN NATESAN RAMAMURTHY: "TED: Teaching AI to Explain its Decisions", AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY (AIES '19), HONOLULU, HI, USA, 27 January 2019 (2019-01-27) - 28 January 2019 (2019-01-28), Honolulu, HI, USA, pages 123 - 129, XP055933726, DOI: 10.1145/3306618.3314273 *
See also references of EP4038469A4 *

Also Published As

Publication number Publication date
US20220114595A1 (en) 2022-04-14
US11392954B2 (en) 2022-07-19
EP4038469A4 (en) 2023-11-29
EP4038469A1 (en) 2022-08-10

Similar Documents

Publication Publication Date Title
US11392954B2 (en) Hierarchical machine learning model for performing a decision task and an explanation task
US11544471B2 (en) Weakly supervised multi-task learning for concept-based explainability
US12118552B2 (en) User profiling based on transaction data associated with a user
US11538029B2 (en) Integrated machine learning and blockchain systems and methods for implementing an online platform for accelerating online transacting
Voican Credit Card Fraud Detection using Deep Learning Techniques.
Forough et al. Sequential credit card fraud detection: A joint deep neural network and probabilistic graphical model approach
CN109191276B (en) P2P network lending institution risk assessment method based on reinforcement learning
US20240346253A1 (en) Systems and methods for generating dynamic conversational responses through aggregated outputs of machine learning models
US20220083571A1 (en) Systems and methods for classifying imbalanced data
Rai et al. Fraud detection in credit card data using machine learning techniques
Kang et al. A CWGAN-GP-based multi-task learning model for consumer credit scoring
Karthika et al. Smart credit card fraud detection system based on dilated convolutional neural network with sampling technique
Haridasan et al. Arithmetic Optimization with Deep Learning Enabled Churn Prediction Model for Telecommunication Industries.
US20220366490A1 (en) Automatic decisioning over unstructured data
US20200175406A1 (en) Apparatus and methods for using bayesian program learning for efficient and reliable knowledge reasoning
Balayan et al. Teaching the machine to explain itself using domain knowledge
US11455531B2 (en) Trustworthy predictions using deep neural networks based on adversarial calibration
CN115994331A (en) Message sorting method and device based on decision tree
Khang et al. Detecting fraud transaction using ripper algorithm combines with ensemble learning model
Zhuang et al. A deep metric learning approach for weakly supervised loan default prediction
Jamil et al. Enhancing Prediction Accuracy in Gastric Cancer Using High-Confidence Machine Learning Models for Class Imbalance
Pradeepa et al. HGATT_LR: transforming review text classification with hypergraphs attention layer and logistic regression
Li et al. Integrating Social Media Data and Historical Stock Prices for Predictive Analysis: A Reinforcement Learning Approach.
Desai An Exploration of the Effectiveness of Machine Learning Algorithms for Text Classification
dos Santos Label Noise Injection Methods for Model Robustness Assessment in Fraud Detection Datasets

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021880745

Country of ref document: EP

Effective date: 20220505

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21880745

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE