CA2942325A1 - Computer generation of decision trees


Info

Publication number
CA2942325A1
Authority
CA
Canada
Prior art keywords
decision tree
constraint
analysis
nodes
outcome
Prior art date
2016-09-19
Legal status
Abandoned
Application number
CA2942325A
Other languages
French (fr)
Inventor
Peter Aprile
Natalie Worsfold
Current Assignee
2536269 Ontario Inc
Original Assignee
2536269 Ontario Inc
Priority date
2016-09-19
Filing date
2016-09-19
Publication date
2018-03-19
Application filed by 2536269 Ontario Inc
Priority to CA2942325A
Publication of CA2942325A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations

Abstract

A user interface receives data related to nodes and values for an analysis decision tree. A constraint module operates on a constraint set. Each constraint of the constraint set limits possible outcome values for a node of the analysis decision tree. A decision tree generator generates a structure for the analysis decision tree. The decision tree generator applies the constraint set to the values of the analysis decision tree to obtain constrained values, maps the constrained values to outcome values of the nodes of the analysis decision tree, and outputs the analysis decision tree at the user interface. The constraint set can be modeled as a constrained decision tree. The constraint set can model settlement of legal issues, such as the principled settlement of tax disputes.

Description

Computer Generation of Decision Trees
Field
[0001] The present invention relates to computer processing and, more specifically, to computer processing of decision trees.
Background
[0002] Decision trees are analytical tools used to model decisions and their possible consequences or results. A typical decision tree includes decision nodes, chance nodes, and result nodes. A decision node represents a choice that must be made to progress through the tree. A chance node represents a probabilistic event whose uncertain result affects which branch of the tree is taken. Each result node represents one of the final results of the modeled situation. By selecting and arranging these nodes, various complex situations can be modeled.
[0003] Computers are often used to generate decision trees and provide user interaction for building decision trees and analyzing the results. Computers provide the capability to model very complex situations that could not be modeled manually. However, it is often the case that compromises must be made to operate the computer efficiently and to provide cogent results to human analysts. Such compromises include manually simplifying the model so that the resulting decision tree can be provided to an analyst in a way that makes sense. For example, in a decision tree representing a legal process, such as a litigation process, it is often the case that settlement of the litigation is simplified to a single result node extending from the origin of the tree. Such a technique is taught in international patent application PCT/US2013/056894.
However, this is an unrealistic modeling compromise made to keep the decision tree comprehensible.
Manual simplification using this and other techniques undermines the utility of decision trees, in that it often represents neglected or unquantified decisions made outside the model.
[0004] One alternative is to avoid manual simplification and use the computer to generate a decision tree with a greater number of nodes for a more realistic picture of the situation, in a kind of brute-force technique. However, as mentioned, decision trees with this kind of increased complexity can be difficult to understand, thereby reducing the utility of using a decision tree in the first place. Moreover, this strategy wastes resources of the computer in generating nodes and branches that are of little or no interest or, worse, represent highly unlikely or impossible branches or results.
[0005] A further, confounding problem in computerized decision trees is the selection of appropriate probabilities for chance nodes. Even a well-constructed computerized decision tree, such as a tree built without manual simplification, generates poor results when its input data is of low quality. Sourcing of appropriate probabilities and other input data is often a manual exercise, and a high-quality source may not be available. Also, undesirable human subjectivity or error may creep into probability selection. For these and other reasons, the accuracy of computerized decision trees suffers.
[0006] Computers are presently incapable of performing simplifications that improve the usefulness of decision trees while also maintaining or increasing realism of the situations being modeled.
Summary
[0007] According to one aspect of the present invention, a process for computerized generation of a decision tree includes receiving input related to a plurality of values and a plurality of nodes for an analysis decision tree and obtaining a constraint set. The constraint set includes one or more constraints. Each constraint of the constraint set limits possible outcome values for a node of the analysis decision tree. The process further includes generating a structure for the analysis decision tree including the plurality of nodes, applying the constraint set to the plurality of values to obtain at least one constrained value, mapping the at least one constrained value to at least one outcome value of the plurality of nodes of the analysis decision tree, and outputting the analysis decision tree at a user interface of the computer.
[0008] According to another aspect of the present invention, a system for computerized generation of a decision tree includes a user interface configured to receive input related to a plurality of values and a plurality of nodes for an analysis decision tree.
The user interface is further configured to output the analysis decision tree. The system further includes a constraint module configured to operate on a constraint set. The constraint set includes one or more constraints. Each constraint of the constraint set limits possible outcome values for a node of the analysis decision tree. The system further includes a decision tree generator configured to generate a structure for the analysis decision tree including the plurality of nodes. The decision tree generator is further configured to apply the constraint set to the plurality of values to obtain at least one constrained value, map the at least one constrained value to at least one outcome value of the plurality of nodes of the analysis decision tree, and output the analysis decision tree at the user interface.
Brief Description of the Drawings
[0009] The drawings illustrate, by way of example only, embodiments of the present invention.
[0010] FIG. 1 is a diagram of a computer system for decision tree generation.
[0011] FIG. 2 is a diagram of an example decision tree.
[0012] FIG. 3 is a diagram of data structures for decision trees.
[0013] FIG. 4 is a schematic diagram of the system.
[0014] FIG. 5 is a diagram of portions of example decision trees for a tax dispute.
[0015] FIG. 6 is a flowchart of a process for computerized generation of a decision tree.
[0016] FIG. 7 is a flowchart of a process for applying a constraint set to input values.
[0017] FIG. 8 is a listing of pseudocode.
[0018] FIG. 9 is a schematic diagram of another system for decision tree generation.
[0019] FIG. 10 is a flowchart of a process of outputting an analysis decision tree.

[0020] FIG. 11 is a flowchart of a process for assigning a probability to a node of an analysis decision tree.
[0021] FIG. 12 is a flowchart of a process for updating probabilities based on actual outcomes.
[0022] FIG. 13 is a diagram of a user interface.
Detailed Description
[0023] The present invention aims to solve at least one of the problems discussed above.
[0024] The present invention provides a way to construct decision trees using underlying situational constraints to eliminate impossible or highly unlikely nodes and to merge equivalent nodes. Computer resources can be saved and processing efficiency can be increased by injecting constraints into the tree without the need for human intervention and while preserving the fidelity of the model to the situation being analyzed. Further, the present invention provides a framework for capturing the best available input data, such as input probabilities for event nodes. As will be discussed in further detail below, the present invention provides for computer generation of decision trees with increased efficiency and accuracy.
[0025] FIG. 1 shows a system 10 for decision tree generation and output according to the present invention.
[0026] In this embodiment, the system 10 includes one or more server computers connected to one or more user computers 12 in a local computer network 14. The system 10 is configured to communicate decision tree data with the user computers 12. In other embodiments, the system 10 is implemented at a stand-alone computer, which may or may not be connected to a network.
[0027] The system 10 can further be configured to communicate decision tree data with the user computers 16, 18 associated with other local computer networks 20, 22 connected to the system 10 via a wide-area network 24, such as the Internet.
[0028] The system 10 can further be configured to receive data from one or more remote probability-source computers 26, which may be connected to the system via the local computer network 14 or via another local computer network 28.
[0029] The local computer networks 14, 20, 22, 28 can be operated in different domains and under the control of different individuals or organizations. Generally, the system 10 can be made accessible to any user computer 12, 16, 18 or probability-source computer 26, whether on the same local computer network 14 or connected via the wide-area network 24.
[0030] In any case, each computer includes at least one processor for executing instructions, which can be configured to implement the processes, methods, and techniques discussed herein, and memory for storing such instructions and data. Computers configured to receive user input and provide output are provided with a user interface that can include one or more of a display device, touchscreen, keyboard, mouse, and similar.
[0031] In the example of resolving tax disputes, which will serve as an illustrative example throughout this disclosure, the local computer networks 20, 22 can be controlled by different law firms that subscribe to the service provided by the system 10.
Alternatively or additionally, the local computer network 14 is controlled by a law firm that operates the system 10 internally for its own purposes. Various other arrangements for use of the system 10 in legal environments are also contemplated.
[0032] In this embodiment, the system 10 includes a decision tree generator 30, a constraint module 32, and a network module 34. In other embodiments, such as using a stand-alone computer, the network module 34 is omitted. In such embodiments, functionality attributed to the network module 34, as will be discussed below, is instead implemented at a user interface of the computer.
[0033] In operation, any user computer 12, 16, 18 initiates generation of a decision tree at the system 10 using a request communicated over the respective network(s).
Decision tree data, such as values and probabilities for various nodes as well as any constraints, is communicated from the user computer 12, 16, 18 to the system 10. Probabilities may also be received at the system 10 from one or more remote probability-source computers 26. The system 10 receives all decision tree data at the network module 34 and communicates same to the decision tree generator 30, which references the constraint module 32 and any supplied constraint or constraints and constructs the decision tree. The decision tree is then outputted to the requesting user computer 12, 16, 18 and/or any other authorized computer over the respective network(s) for display, refinement, and/or interaction.
[0034] FIG. 2 shows an example decision tree 50. Various nodes 52 are arranged in a network of branches 54 that originate from a root node 56. The types of nodes include decision nodes, event nodes, and result nodes. Decision nodes represent choices and can be represented graphically by squares. Event nodes represent probabilistic events and can be represented graphically by circles. Result nodes are the "leaves" of the tree, indicating different results from traversing the tree, and can be represented graphically by triangles. The root node 56 can be a decision node or an event node.
[0035] Each event node and decision node includes two or more outcomes. Each outcome for an event node includes one or more properties, such as a probability and an outcome value. Each outcome for a decision node includes one or more properties, such as an outcome value. The selected outcome for a decision node can be represented by a probability of one. Each result node represents one outcome of the tree and includes one or more properties, such as a probability and an outcome value. In the numerical example shown in FIG. 2, various event nodes have probabilities, p, and outcome values, v, for each outgoing branch. A decision node has several outgoing branches, and each branch has an outcome value. By traversing from each result node back to the root node, each result node can have its outcome value computed by summing the outcome values along the traverse and its probability computed by multiplying the probabilities along the traverse. For instance, in the numerical example shown, the resulting outcome value of -1600 (-600 - 1000) has a probability of 0.07, or 7% (0.20 * 0.35). Outcome values can represent money, such as money owing to a tax authority. Note that zero-probability result nodes can be pruned from the tree.
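The traversal just described can be illustrated in JavaScript (the language the description later notes the system's modules may be written in). This is only a minimal sketch: the parallel children/outcomes layout, the property names, and the treatment of decision branches as probability-one outcomes are assumptions, not the patented implementation.

```js
// Computes a result node's cumulative outcome value and probability by
// walking from the result node back to the root, as described above.
function resultOutcome(resultNode) {
  let value = 0;
  let probability = 1;
  for (let node = resultNode; node.parent !== null; node = node.parent) {
    const branch = node.parent.children.indexOf(node);
    const outcome = node.parent.outcomes[branch]; // outcome taken to reach node
    value += outcome.value;                       // outcome values sum along the path
    probability *= outcome.probability ?? 1;      // decision branches act as p = 1
  }
  return { value, probability };
}

// For the numeric example of FIG. 2:
// value = -600 + (-1000) = -1600, probability = 0.20 * 0.35 = 0.07 (7%).
```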
[0036] FIG. 3 shows data structures for storing a decision tree according to the present invention.
[0037] A data structure for a base node 60 includes a parent attribute 62, a children attribute 64, a key attribute 66, and a name attribute 68. The parent attribute 62 is a reference to the node's parent node, or null if the node is the root node. The children attribute 64 is an array of references to the node's child nodes, if any. The key attribute 66 is a string that defines a unique key representing the path to the node in the tree. The path is defined by a series of non-negative integers separated by period characters (".") where each integer represents the position of the child node in the parent's child node list. For example, the key "0.1.0" represents the path: root node -> second child -> first child. The name attribute 68 is a string that contains the name of the node.
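Resolving such a key against the children arrays could look like the following sketch; the helper name is hypothetical and not taken from the patent.

```js
// Assumes the key convention above: the leading "0" denotes the root itself
// and each following integer indexes into the current node's children array,
// so "0.1.0" is root -> second child -> first child.
function nodeByKey(root, key) {
  let node = root;
  for (const index of key.split(".").slice(1)) {
    node = node.children[Number(index)];
  }
  return node;
}
```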
[0038] A data structure for an event node 70 extends the base node 60 data structure by including a list 72 of event outcomes 74. Each event outcome 74 is defined to have properties that include a probability 76, which is a number (e.g., a floating point number from 0 to 1) that represents the likelihood of the outcome. Further properties include one or more outcome values 78, each of which is a positive or negative number that represents a value of that outcome having occurred, and one or more outcome characteristics 79 of a set of predefined characteristics. Each event outcome 74 can further include other properties that define other data, not shown, such as a label or name to provide a user with information about the outcome.
[0039] A data structure for a decision node 80 extends the base node 60 data structure by including a list 82 of decision outcomes 84. Each decision outcome 84 is defined to have properties that include one or more outcome values 86, each of which is a positive or negative number that represents a value of that outcome having been selected, and one or more outcome characteristics 88 of the set of predefined characteristics. Each decision outcome 84 can further include other properties that define other data, not shown, such as a label or name to provide a user with information about the outcome.

,
[0040] A data structure for a result node 90 extends the base node 60 data structure by including one or more properties, such as one or more outcome values 92, each of which is a positive or negative number that represents a value of the result represented by the result node 90, and one or more outcome characteristics 94 of the set of predefined characteristics.
[0041] The set of outcome characteristics contains unique identifiers for various different outcomes so as to facilitate dependency relationships. Outcome characteristics can be Boolean variables, unique strings, numbers, or similar. In the tax dispute example, outcome characteristics can designate statute barred outcomes, outcomes affecting gross negligence penalties, other outcomes that can affect outcomes of different nodes, and so on. Each outcome of each node can be assigned one or more outcome characteristics to enable dependency checks and negation of dependent outcomes.
[0042] An example of event outcome 74 properties is as follows:
[0043] probability = 0.35
[0044] taxableIncome = -40000
[0045] grossNegligencePenalties = -2000
[0046] statuteBarred = true
[0047] The above properties include a probability 76, two outcome values 78, and an outcome characteristic that indicates that the outcome 74 is statute barred.
[0048] The system is configured to provide various functions to the properties of outcomes. A merge/combine function is configured to combine properties. In the case of a number (e.g., outcome value) this function is defined as simple addition. For other properties, such as Boolean values, this operation can be configured to replace a previous value or perform some other complex operation. A compare function is configured to compare two properties to determine if they are identical or equivalent. This function is used when merging nodes. An "isEmpty" function is configured to determine if a property is empty, NULL, or otherwise has no effect. For example, for an integer value, a zero is considered empty because X + 0 = X.
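A hedged sketch of these three property functions follows; the type-based dispatch is one possible layout assumed for illustration, not the system's actual code.

```js
// merge/combine: numbers add; other properties (e.g., Booleans such as
// statuteBarred) replace the previous value. compare: used when deciding
// whether two outcomes can be merged. isEmpty: a value with no effect,
// e.g., 0 for a number since x + 0 = x.
const propertyOps = {
  merge(a, b) {
    if (typeof a === "number" && typeof b === "number") return a + b;
    return b; // replacement semantics for non-numeric properties
  },
  compare(a, b) {
    return a === b;
  },
  isEmpty(a) {
    return a === null || a === undefined || a === 0 || a === false;
  },
};

// Example: propertyOps.merge(-40000, -2000) === -42000
```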
[0049] A data structure for dependency relationships 96 includes any number of logical tests of outcome characteristics 98 (and/or other properties) and any number of user-defined relationships 99. A logical test of outcome characteristics 98 can be a logical expression containing one or more logical operators that associates two or more outcome characteristics and that evaluates to TRUE or FALSE. When checking for a dependency relationship, the logical test of outcome characteristics 98 can be applied to outcomes of each pair of related nodes, such as outcomes of parent-child pairs or outcomes of all nodes along the same branch. The characteristics of the outcomes being checked are applied to the logical test of outcome characteristics 98. When the logical test of outcome characteristics 98 evaluates to TRUE, a dependency relationship is determined to exist between the nodal outcomes. An evaluated result of FALSE means that no dependency relationship exists for the compared nodal outcomes. An example of a logical test of outcome characteristics 98, for the tax dispute example, is as follows:
[0050] parent_outcome CONTAINS "gnpVacated" AND child_outcome CONTAINS
"reduceGNP"
[0051] where "gnpVacated" and "reduceGNP" are outcome characteristics indicative of the vacating of a gross negligence penalty and the reduction of a gross negligence penalty, respectively. When this test evaluates to TRUE, the dependent child outcome is negated, because a child node cannot have an outcome that reduces a gross negligence penalty if the parent node outcome has previously vacated the gross negligence penalty. However, when the gross negligence penalty is not vacated at the parent node outcome, the test evaluates to FALSE and the child node can thus reduce the gross negligence penalty.
[0052] The complexity of the logical tests of outcome characteristics 98 is not limited, and any number and arrangement (e.g., nesting) of logical operators (OR, AND, NOT, etc.) can be used. Moreover, logical tests of outcome characteristics 98 can be implemented as functions or any other kind of programmatic code and are not limited to logical operators.
Such a function, for example, can be configured to take two or more outcome characteristics as parameters and return TRUE or FALSE. A multitude of such logical tests of outcome characteristics 98 can be constructed and stored at the system 10 to model the constraints of principled settlement in tax disputes.
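Expressed as such a function, the gnpVacated/reduceGNP test above might look like this sketch; the array-of-strings representation of characteristics is an assumption.

```js
// Returns TRUE when a dependency relationship exists, i.e., when the parent
// outcome vacated the gross negligence penalty and the child outcome would
// reduce it; the child outcome is then negated.
function gnpDependencyTest(parentOutcome, childOutcome) {
  return (
    parentOutcome.characteristics.includes("gnpVacated") &&
    childOutcome.characteristics.includes("reduceGNP")
  );
}
```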
[0053] A user-defined relationship 99 can identify unique instances of outcomes by reference and force a dependency relationship for those outcomes. This may be useful for one-off situations.
[0054] The above data structures define a decision tree. These data structures can further include additional attributes or properties to store intermediate, computed, or temporary values, such as a cumulative outcome value for the path to the node, a combined probability of arriving at the node, a selected outcome for a decision node, and similar.
[0055] The data structures of FIG. 3 can be used to model any size and complexity of decision tree. Further, an outcome value within the decision tree may itself be modeled as a decision tree. That is, an outcome value may be subject to uncertainty or decision making and thus may be computed using a lower-level decision tree. Hence, a high-level decision tree can be used to model an overall process, while one or more low-level decision trees can be used to compute outcome values for nodes in the overall process. In one example, a high-level decision tree models a legal dispute, such as litigation, while a low-level decision tree models settlement values for various nodes in the litigation decision tree. The low-level decision tree models issues for settlement and their respective values, which can be particularly relevant for tax disputes or other legal disputes where issues and values (such as tax adjustments) may be constrained to be settled according to established principles, which may be known as principled settlement.
[0056] FIG. 4 shows a schematic view of the system 10 for decision tree generation and output configured to generate a high-level decision tree, or analysis decision tree 100, based on one or more constraints that are set by a low-level decision tree, or constraint decision tree 102.
[0057] The system 10 includes the decision tree generator 30, the constraint module 32, and the network module 34, as previously discussed. The system 10 further includes an analysis module 104. The modules may be written in JavaScript or similar programming language and may be built on a framework such as Backbone.js.
[0058] The decision tree generator 30 is configured to implement the data structures described above and to generate decision trees based on such. The decision tree generator 30 controls flow of input data from the network module 34 (or a user interface) to the analysis module 104 and the constraint module 32. The decision tree generator 30 further controls flow of output data from the analysis module 104 and the constraint module 32 to the network module 34 (or a user interface). The decision tree generator 30 may communicate with a user-interface module containing user-interface generating code to allow capture of input data and to format output data, i.e., to render decision tree graphics. Alternatively, the decision tree generator 30 may include such user-interface generating code.
[0059] The analysis module 104 is configured to provide data and/or processing functionality to the decision tree generator 30 for generation of analysis decision trees 100 that model high-level processes, such as legal disputes that can be modelled using nodes representing consultations, motions, pleadings, hearings, judgements, and similar. An entire legal dispute, such as a tax dispute, can be modelled as a network of nodes that represent decisions and probabilistic events. In addition to operating on the structural data (e.g., parent/child relationships) for an analysis decision tree 100, the analysis module 104 is configured to operate on outcome data 110 that includes inputted probabilities 76 (see FIG. 3) and inputted and computed values 78, 86 for the nodes of the analysis decision tree 100. Some of the outcome data 110, specifically outcome values, is provided by the constraint module 32.
[0060] The constraint module 32 is configured to provide data and/or processing functionality to the decision tree generator 30 for generation of constraints for the analysis decision trees 100, where such constraints can be computed from constraint decision trees 102 modeled by the decision tree generator 30. For an analysis decision tree that represents a legal dispute, the constraint module 32 can be used to model issues and values pertaining to such issues for various nodes of the analysis decision tree. To achieve this, the constraint module 32 is configured to operate on a constraint set 112 that includes one or more constraints. A
constraint defines a dependency relationship among issues modelled by the constraint decision tree 102 and thus limits possible outcome values for result nodes of the constraint decision tree 102. This limits possible outcome values for nodes of the analysis decision tree 100 that are associated with one or more constraint-tree nodes affected by the constraint.
[0061] The decision tree generator 30 is configured to store constraint mappings 114 that map constraint sets 112 and respective constraint decision trees 102 to nodes of analysis decision trees 100. The decision tree generator 30 references a particular constraint mapping 114 to map constrained values obtained from a particular constraint set 112 to outcome values of nodes of the analysis decision tree 100.
[0062] In operation, input data 120 for an analysis decision tree 100 is received through the network module 34 (or user interface). Such input data 120 includes structural data for the tree, and can include outcome data, such as outcome values and probabilities.
The input data 120 is processed by the decision tree generator 30 and the analysis module 104 and is stored for access by the decision tree generator 30 and the analysis module 104 when generating the analysis decision tree 100. At the same time, input data 122 relevant to constraints is received through the network module 34 (or user interface). Such constraint input data 122 defines the constraint set 112 and includes one or more properties, such as one or more outcome values, outcome characteristics, and/or probabilities for one or more nodes of the constraint decision tree 102, and further includes one or more dependency relationships among nodes of the constraint decision tree 102. The decision tree generator 30 references the association of the input data 120 and the constraint input data 122 to generate a constraint mapping 114 that links the constraint set 112 to the outcome data 110. The constraint module 32 processes the constraint set 112 to obtain one or more constrained values 124 for the analysis module 104 to use when generating the analysis decision tree 100 associated to the constraint set 112 by the constraint mapping 114. The analysis module 104 uses the constrained values 124 to obtain outcome values when generating the analysis decision tree 100, which is outputted at 126 via the network module 34 (or user interface).
[0063] In FIG. 4, a dependency relationship is illustrated schematically as a pruned branch 140 of the constraint decision tree 102. That is, a dependency relationship between, for example, nodes 142 and 144 eliminates subsequent possibilities along the branch 140. Hence, computation of the related subsequent outcomes on branch 140 is determined to be unnecessary, thereby advantageously saving processing time and resources.
[0064] Also shown in FIG. 4 is a constrained value 150 obtained from the constraint decision tree 102 being mapped to an outcome value of a node of the analysis decision tree 100. A constrained value 150 may be a specific result of the constraint decision tree 102, a combination (e.g., average) of several results of the constraint decision tree 102, or similar. It is further noted that processing time and resources are advantageously saved by determining that such mappings are not required for the pruned branch 140.
[0065] FIG. 5 shows an example in the context of a tax dispute between an individual and a taxation authority, such as a government, according to the present invention.
Only relevant portions of the decision trees are shown for the sake of explanation.
[0066] The system constructs a constraint decision tree 102 for amounts in dispute, which are represented as node outcome values. Each level of the constraint decision tree 102 represents a different issue in dispute. In this example, the first level (level N-2; node 164) concerns whether rental expenses in the year 2010 should be deductible. The second level (level N-1; nodes 160, 162) concerns whether rental expenses in the year 2011 should be deductible. Following this logic, there should be four result nodes, one for each combination of outcomes at the two levels. For instance, if there is a 60% chance that rental expenses in the year 2010 will be deductible and a 55% chance that rental expenses in the year 2011 will be deductible, then one of the four possible outcomes is a 33% chance (0.60 * 0.55) that both years will be deductible. The values, v, of these nodes represent adjustments to a tax assessment. However, it may be that the rules or laws followed by the taxation authority, in principle, do not allow the 2011 rental expenses to be deductible if the 2010 rental expenses were found not deductible. Hence, the respective result, which would have had a probability of 0.22 (0.55 * 0.40), is not reachable. This is an example of a dependency relationship and may be modelled as a logical test of characteristics 98 (FIG. 3) for two outcome characteristics corresponding to 2011 rental expenses being deductible and 2010 rental expenses being deductible. That is, the constraint decision tree models potential settlement values of the issues, and the dependency relationships represent the constraints of principled settlement. The permissible outcomes of the nodes 160, 162 are dependent on the outcome of the preceding node 164.
Thus, instead of four result nodes, there are only three result nodes. The branch that would have occurred at 166 is omitted from the constraint decision tree 102. Further, the node 162, having only one outcome that is a result, is converted into a result node by merging the outcome of the node 162 with the outcome of the preceding node 164. A similar technique is applied for any node that has only one outcome remaining, whether the subsequent node is a result node or another kind of node. This can be achieved by removing the nodes of the branch or by setting the probabilities of outcomes of such nodes to zero.
[0067] The values of the remaining result nodes 168 of the constraint decision tree 102 are constrained values that can be used to populate respective one or more outcome values 170 in the relevant analysis decision tree 100. For instance, the most likely (highest probability) result node can be selected and its value taken as the constrained value for use as the outcome value 170. Alternatively, the result values can be combined, such as by averaging, to obtain the constrained value. Regardless of the specific methodology used to determine the constrained value, the constrained value is inputted as an outcome value 170 at the relevant nodes in the analysis decision tree 100, as defined by a constraint mapping 172. In the example shown, the relevant node of the analysis decision tree 100 concerns a hearing to consider deductions. The individual can then use the outcome value 170 to propose an accurate principled settlement with the tax authority prior to or at the hearing, where such outcome value 170 reflects the principle that the 2011 rental expenses cannot be deductible if the 2010 rental expenses are not deductible.
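Both strategies just mentioned can be sketched as follows; the probability-weighted average is one assumed way of combining the result values, and the names are illustrative.

```js
// Derives a constrained value from the remaining result nodes of a
// constraint decision tree, either by taking the most likely result or by
// blending all results into a probability-weighted average.
function constrainedValue(resultNodes, strategy = "mostLikely") {
  if (strategy === "mostLikely") {
    return resultNodes.reduce((best, n) =>
      n.probability > best.probability ? n : best
    ).value;
  }
  const totalP = resultNodes.reduce((sum, n) => sum + n.probability, 0);
  return (
    resultNodes.reduce((sum, n) => sum + n.probability * n.value, 0) / totalP
  );
}
```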
[0068] FIG. 6 illustrates a process for computerized generation of a decision tree according to the present invention. The process can be performed at a single computer or at several computers connected via a computer network. The process can be implemented at any of the systems discussed herein.
[0069] At step 180, input values are received to define an analysis decision tree that is to be constructed. Input values can include probability values, outcome values, and similar, which may be suitable for direct placement on the analysis decision tree or may form one or more constraints for the analysis decision tree. In the tax dispute example, input values can include adjustment amounts, probabilities that adjustments will be vacated or not, penalty values, legal costs, and similar. Input values can be inputted via a user interface that enforces a data structure on the input values. Input can be performed via, for example, a user interface local to the computer performing the process or a user interface at a computer remote from the computer performing the process.
[0070] At step 182, a constraint set is obtained. The constraint set can be determined based on the input values from step 180. Dependency relationships between possible input values can be used to establish possible constraint sets. When the actual input values are received, a matching constraint set is selected. In the tax dispute example, tax owing for a certain year may be statute barred and the same year may have deductions that are in dispute.
Hence, a constraint set can be configured to link potentially statute barred years to years with deductions in dispute in a dependency relationship, so that adjustments for such deductions are not made for a year that is actually found to be statute barred. The constraint set can be generated based on the input values and further may allow for input from a user interface to select, deselect, or modify dependency relationships.
[0071] At step 184, structure for the analysis decision tree is received. A template analysis decision tree may be loaded, nodes and values may be inserted, deleted, or modified using a user interface, or a combination of such may occur. In the tax dispute example, the analysis decision tree can represent the overall process from notice of assessment to final judgement and thus may include nodes representing consultations, motions, pleadings, hearings, judgements, and similar.
[0072] At step 186, the constraint set is applied to the input values of step 180 to obtain at least one constrained value. Various outcome values of various nodes of the analysis decision tree may be subject to the constraint set. That is, rather than merely taking outcome values for the analysis decision tree directly from the input values from step 180, the process applies the constraint set to limit outcome values of nodes based on dependency relationships. In the tax dispute example, an outcome value for a node may be limited based on potentially statute barred years for which deductions are in dispute. Such an outcome value would reflect the principle that adjustments are not made for a year that is found to be statute barred.
[0073] At step 188, constrained values are mapped to outcome values of the analysis decision tree. A constraint mapping can be referenced. A constraint mapping can be preconfigured, can be provided with or inferred from the input values, or can be obtained from a combination of such.
[0074] At step 190, the analysis decision tree is outputted at the user interface. This can be performed via, for example, a user interface local to the computer performing the process or a user interface at a computer remote from the computer performing the process.
[0075] The process can be configured to be interactive, in that data such as node definitions, outcome and constraint values, and similar are configurable via a user interface, at step 192. As such, the process can repeat indefinitely on currently and newly inputted data.
[0076] FIG. 7 shows a process for applying a constraint set to input values and is an example of a process capable of achieving step 186 of the process of FIG. 6, according to the present invention. The process can be implemented at any of the systems discussed herein.
[0077] At step 200, at least one dependency relationship is identified as relevant to the analysis decision tree being generated.

[0078] At step 202, a constraint decision tree is generated. At least two nodes of the constraint decision tree are associated according to the dependency relationship.
[0079] At step 204, at least one branch that corresponds to the at least two nodes associated by the dependency relationship is negated from the constraint decision tree. For each child node in the tree, an outcome of the child node, which has a dependency relationship with the outcome of the parent node that connects to the child node, is negated. This can be achieved by removing affected nodes from the tree, zeroing respective outcome values of affected nodes, or similar.
[0080] At step 206, result node values are computed for the constraint decision tree, and at least one result node value is taken as a constrained value for use as an outcome value of the analysis decision tree.
[0081] With reference back to FIG. 5, each level of the constraint decision tree 102 can represent a different issue (level 0 ... level N). Each issue is associated with at least one value, which in the case of a tax dispute can be termed an adjustment. Pseudocode for a constraint tree generation method 210 for generating such a constraint decision tree 102 is shown in FIG. 8. A list of issues is provided to the method 210, and a constraint decision tree 102 having an appropriate number of levels is generated.
[0082] In the constraint tree generation method 210, the statement for generating a node from an issue can be configured to account for a dependency relationship. A
dependency relationship method 212 can be called to negate values in the constraint decision tree 102 in a way that is equivalent to removing nodes or pruning branches. A negate function 214 can be configured to check for the existence of a dependency relationship between an outcome of a (issue) node and the relevant outcome of its parent node. The negate function 214 can be configured to consider characteristics of the outcome, specifically configured (one-off) dependency relationships, or a combination of such. The negate function 214 can be configured to evaluate any existing logical tests of outcome characteristics 98 and/or user-defined relationships 99 (FIG. 3).
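The FIG. 8 listing itself is not reproduced in this text, so the following is only a hedged reconstruction of what the constraint tree generation method 210 and the negate function 214 could look like based on the description; all names and the level-by-level expansion are assumptions, and the merging of single-outcome nodes described with FIG. 5 is omitted for brevity.

```js
// Builds a constraint decision tree with one level per issue. An outcome is
// negated (its branch never created) when any dependency test returns TRUE
// for the parent outcome / child outcome pair, mirroring the pruned branch
// 140 of FIG. 4 and the omitted branch 166 of FIG. 5.
function generateConstraintTree(issues, dependencyTests) {
  const root = { parent: null, outcome: null, children: [] };
  let frontier = [root];
  for (const issue of issues) {
    const next = [];
    for (const parent of frontier) {
      for (const outcome of issue.outcomes) {
        if (negate(parent.outcome, outcome, dependencyTests)) continue;
        const child = { parent, outcome, children: [] };
        parent.children.push(child);
        next.push(child);
      }
    }
    frontier = next;
  }
  return root; // the nodes of the final level act as result nodes
}

// Checks for a dependency relationship between a child outcome and the
// relevant outcome of its parent node (e.g., gnpDependencyTest above).
function negate(parentOutcome, childOutcome, dependencyTests) {
  if (parentOutcome === null) return false; // root has no parent outcome
  return dependencyTests.some((test) => test(parentOutcome, childOutcome));
}
```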
[0083] FIG. 9 shows a schematic view of another system 220 for decision tree generation and output according to the present invention. The system 220 is similar to the system 10 discussed above and only differences will be discussed in detail. Like reference numerals represent like components and the above description can be referenced for further detail.
[0084] The system 220 further includes a calculation module 222, a models module 224, an accounts module 226, and a decision tree anonymizer 228.
[0085] The calculation module 222 is configured to support mathematical computations performed by the decision tree generator 30, analysis module 104, and/or constraint module 32. In the tax dispute example, the calculation module 222 is configured to store interest rates, sales tax rates, and penalty rates as provided by a suitable authority.
[0086] The models module 224 is configured to define computations performed by or required by the decision tree generator 30, analysis module 104, and/or constraint module 32.
In the tax dispute example, the models module 224 is configured to model tax return forms (e.g., Canadian T1 and T2 forms, US Forms 1040 and 1120, etc.), assessments, disputes, adjustments, sales tax computations, and similar.
[0087] The accounts module 226 is configured to restrict access to the system 220 to authorized users or computers. Any suitable technique may be used, such as identity certificates, usernames/passwords, and like credentials. Privilege levels may be implemented to limit input/output of data. In the tax dispute example, different privilege levels can be set for law firms, lawyers, clients, etc. Accounts may be associated with groups, such that different or competing firms/individuals can use the same system 220 as a cloud-based service. The accounts module 226 defines regular accounts 230 as users or computers that can access all or most of the functionality of the system 220, according to assigned privilege levels, if any. The accounts module 226 defines probability source accounts 232, which may be separate accounts from the regular accounts 230, a subset of the regular accounts 230, a privilege level of the regular accounts 230, or similar. The accounts module 226 and decision tree generator 30 are configured to limit access to the system 220 by the probability source accounts 232 to be through the decision tree anonymizer 228.
[0088] The decision tree anonymizer 228 is configured to remove identifying information (personal or corporate) from decision trees. As such, users or computers that access decision trees through the decision tree anonymizer 228 are not provided with information regarding the identity of any person or company to which the decision tree applies.
[0089] The decision tree generator 30 is configured to receive requests for probabilities from outside sources. In the tax dispute example, outside sources can include retired tax court judges, senior tax lawyers, tax consultants, and similar. The probabilities requested represent third-party expert knowledge and include likelihoods of success at hearings, the probable outcome of a judgement, and similar. Requests may originate from a user or computer associated with a regular account 230, such as a lawyer seeking expert input. The decision tree generator 30 is configured to forward these requests to probability source computers 26 operated by such outside sources. Probability source computers 26 can be provided with a user interface configured to enter any requested probabilities, and such user interface may be set up to show the context of the request, such as a portion of or all of an anonymized analysis tree. The decision tree generator 30 is configured to await receipt of input of a requested probability from the respective probability source computer 26 and then insert such input into a respective outcome value of an analysis decision tree. In this way, third-party expert knowledge can be requested and received at the decision tree generator 30 for generating analysis decision trees of improved accuracy.
[0090] FIG. 10 shows a process for outputting an analysis decision tree and is an example of a process capable of achieving step 190 of the process of FIG. 6, according to the present invention. The process can be implemented at any of the systems discussed herein.
[0091] At step 240, it is determined whether outputting the analysis decision tree is triggered by a request to obtain a probability from a probability source computer 26 or whether the outputting is another type of interaction with the tree, such as an interaction normally received from user computers 18. When the output is not for a request to obtain a probability, the analysis decision tree is transmitted, at step 242, to a user computer 18 and outputted at a user interface of the user computer 18, at step 244. User groups and privilege levels can be referenced at steps 242 and 244.
[0092] When the output is for a request to obtain a probability, the analysis decision tree is anonymized, at step 246, and at least a portion of the anonymized analysis decision tree (to give sufficient context to the probability request) is transmitted to the respective probability source computer 26, at step 248. The analysis decision tree is outputted at a user interface of the probability source computer 26, at step 250.
[0093] Step 252 checks for input of the requested probability at the user interface of the probability source computer 26 and repeats any of steps 240 to 252, if necessary, while awaiting input of the probability. When the probability is entered, it is transmitted to the system, at step 254, and the process ends.
[0094] FIG. 11 shows a process for assigning a probability to a node of an analysis decision tree according to the present invention. The process can be implemented at any of the systems discussed herein.
[0095] At step 260, a probability is received. The system can receive probabilities from third parties as in, for example, step 254 of the process of FIG. 10. Alternatively or additionally, the system can receive probabilities via user interfaces of user computers directly associated with analysis decision trees.
[0096] At step 262, when several probabilities are received for a particular outcome of a probabilistic event node of an analysis decision tree, these probabilities can be blended to obtain one probability. Suitable blending techniques include computing a mean, mode, or median of the received probabilities, selecting one of the received probabilities, or similar. Computing a weighted mean is also contemplated. Weightings can be determined based on the identities of the providers of the probabilities, with sources having greater experience or greater success in past predictions being given higher weights. For example, several tax lawyers, junior and senior, working on an analysis decision tree for a tax dispute may each input one probability for a particular outcome. A third-party retired tax court judge may also provide one probability for the same outcome. It may then be decided to use a weighted average of the received probabilities, with the probabilities provided by the senior lawyers and the judge being weighted higher than the probabilities provided by the junior lawyers.
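A weighted mean of the kind described can be sketched as below; the particular weights and the input shape are illustrative assumptions.

```js
// Blends several inputted probabilities for one outcome into a weighted mean.
// inputs: [{ probability, weight }, ...]
function blendProbabilities(inputs) {
  const totalWeight = inputs.reduce((sum, i) => sum + i.weight, 0);
  return inputs.reduce((sum, i) => sum + i.weight * i.probability, 0) / totalWeight;
}

// blendProbabilities([
//   { probability: 0.70, weight: 2.0 },  // retired judge, weighted higher
//   { probability: 0.55, weight: 1.0 },  // junior lawyer
// ]) === 0.65
```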
[0097] At step 264, the probability is assigned to the relevant outcome of the node of the analysis decision tree under consideration. Computations for dependent nodes and outcomes can be triggered by this step to account for updates to the assigned probability.
[0098] Step 266 determines whether an additional probability is to be received and blended, and the process can be repeated to accumulate additional probabilities. Suspect probabilities, such as outlier values, may be discarded or otherwise excluded from the blended probability assigned to the outcome.
[0099] FIG. 12 shows a process for updating probabilities based on actual outcomes. The process can be implemented at any of the systems discussed herein.
[00100] At step 270, for an outcome of a particular probabilistic event node of a particular analysis decision tree, an indication of the actual outcome of the probabilistic event is received.
For example, the actual outcome may be entered via a user interface of a user computer directly associated with the analysis decision tree.
[00101] Then, at step 272, inputted probabilities for this node are compared to the actual outcome. The probabilities and actual outcome may be outputted at a user interface.
Individuals directly associated with the analysis decision tree can then inspect the output and adjust their predictions for future outcomes. Such individuals may also review third-party probabilities to evaluate the performance of the third-party sources.
[00102] At step 274, the system is updated based on the feedback from step 272. For instance, weightings for probability sources, whether individuals directly associated with the analysis decision tree or third parties, may be automatically adjusted based on differences between each probability and the actual outcome. For example, a lawyer who supplied a probability of 40% for an actual outcome that occurred may have his/her weighting reduced, whereas a lawyer who supplied a probability of 80% for the same outcome may have his/her weighting increased. Hence, predictive probabilities for future outcomes may have increased accuracy due to feedback from actual outcomes.
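One possible feedback rule consistent with this example is sketched below; the learning-rate form is an assumption, not taken from the description.

```js
// Moves a source's weight toward the probability it assigned to what
// actually happened: a 0.80 prediction for an event that occurred raises
// the weight, while a 0.40 prediction for the same event lowers it.
function updateSourceWeight(weight, assignedProbability, occurred, rate = 0.1) {
  const score = occurred ? assignedProbability : 1 - assignedProbability;
  // Scores above 0.5 increase the weight; scores below 0.5 decrease it.
  return Math.max(0, weight + rate * (score - 0.5));
}
```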
[00103] FIG. 13 shows a user interface 300, which can be provided at the user computers 12, 16, 18, probability-source computers 26, and any other computer operating in the system. The user interface 300 includes a display device 302 and an input device 304, such as a touchscreen element of the display device 302, a mouse, keyboard, or combination of such.
A network interface 306 is further provided to facilitate data communications between the user interface 300 and other computers.
[00104] The display device 302 is configured to provide a tree structure interface 310, node data entry fields 312, a constraint editing interface 314, and a probability interface 316. The tree structure interface 310 is configured to allow a user to build, edit, and manipulate a decision tree using the input device 304 and to output resulting values 320 for the tree in the context of the tree. The tree structure interface 310 also allows toggling between the analysis decision tree 100 and the supporting constraint decision tree 102. The node data entry fields 312 allow selection of node types and entry of outcome properties such as values, outcome characteristics, and similar data for a selected node. The constraint editing interface 314 allows for entry of logical tests of outcome characteristics 98, user-defined relationships 99, and other input data to establish dependency relationships for the constraint decision tree 102. The probability interface 316 allows for the selection of probabilities for various nodes, the selection of weightings for combined or blended probabilities, the triggering of messages to third parties to provide probabilities, the entry of actual outcomes, and the viewing of such elements to provide feedback for an analysis.
[00105] Numerous advantages of the present invention should be apparent from the above detailed description. Decision trees can be constructed efficiently by eliminating impossible or highly unlikely nodes and branches. Computer resources can be saved and processing efficiency can be increased while enabling the modeling of highly complex situations, such as the principled settlement of tax disputes. Further, the third-party remote input of probabilities as well as the blending and feedback for probabilities allows for increased accuracy.
[00106] While the foregoing provides certain non-limiting example embodiments, it should be understood that combinations, subsets, and variations of the foregoing are contemplated.
The monopoly sought is defined by the claims.

Claims (18)

What is claimed is:
1. A process for computerized generation of a decision tree, the process comprising:
receiving input related to a plurality of values and a plurality of nodes for an analysis decision tree;
obtaining a constraint set, the constraint set including one or more constraints, each constraint of the constraint set limiting possible outcome values for a node of the analysis decision tree;
generating a structure for the analysis decision tree including the plurality of nodes;
applying the constraint set to the plurality of values to obtain at least one constrained value;
mapping the at least one constrained value to at least one outcome value of the plurality of nodes of the analysis decision tree; and outputting the analysis decision tree at a user interface of the computer.
2. The process of claim 1, wherein applying the constraint set comprises generating a constraint decision tree including associating at least two nodes of the constraint decision tree according to a dependency relationship, negating from the constraint decision tree at least one branch that corresponds to the at least two nodes associated by the dependency relationship, computing a result node value for the constraint decision tree, and taking the result node value as the at least one constrained value.
3. The process of claim 1, further comprising receiving input of at least one probability for at least one node of the analysis decision tree from a remote probability source via a computer network.
4. The process of claim 3, further comprising generating an anonymized version of the analysis decision tree and transmitting the anonymized version of the analysis decision tree to the remote probability source before receiving the input of the at least one probability.
5. The process of claim 3, further comprising transmitting a portion of the analysis decision tree containing the at least one node to the remote probability source before receiving the input of the at least one probability.
6. The process of claim 3, wherein the analysis decision tree models a legal process.
7. The process of claim 6, wherein the constraint decision tree models principled settlement of legal issues and wherein the dependency relationship of the constraint decision tree represents a constraint of principled settlement.
8. The process of claim 7, wherein the constraint decision tree models principled settlement of tax law.
9. A system for computerized generation of a decision tree, the system comprising:
a user interface configured to receive input related to a plurality of values and a plurality of nodes for an analysis decision tree, the user interface further configured to output the analysis decision tree;
a constraint module configured to operate on a constraint set, the constraint set including one or more constraints, each constraint of the constraint set limiting possible outcome values for a node of the analysis decision tree; and a decision tree generator configured to generate a structure for the analysis decision tree including the plurality of nodes, the decision tree generator further configured to apply the constraint set to the plurality of values to obtain at least one constrained value, map the at least one constrained value to at least one outcome value of the plurality of nodes of the analysis decision tree, and output the analysis decision tree at the user interface.
10. The system of claim 9, wherein the decision tree generator and the constraint module are configured to apply the constraint set by generating a constraint decision tree including associating at least two nodes of the constraint decision tree according to a dependency relationship, and by negating from the constraint decision tree at least one branch that corresponds to the at least two nodes associated by the dependency relationship, and further by computing a result node value for the constraint decision tree and taking the result node value as the at least one constrained value.
11. The system of claim 9, further comprising a network module configured to receive input of at least one probability for at least one node of the analysis decision tree from a remote probability source via a computer network.
12. The system of claim 11, wherein the decision tree generator is further configured to generate an anonymized version of the analysis decision tree, and wherein the network module is further configured to transmit the anonymized version of the analysis decision tree to the remote probability source before receiving the input of the at least one probability.
13. The system of claim 11, wherein the network module is further configured to transmit a portion of the analysis decision tree containing the at least one node to the remote probability source before receiving the input of the at least one probability.
14. The system of claim 11, further comprising a client computer that implements the user interface and a server computer that implements the constraint module, the decision tree generator, and the network module, the client computer, server computer, and remote probability source being connected for data communications over the computer network.
15. The system of claim 9, wherein the analysis decision tree models a legal process.
16. The system of claim 15, wherein the constraint decision tree models principled settlement of legal issues and wherein the dependency relationship of the constraint decision tree represents a constraint of principled settlement.
17. The system of claim 16, wherein the constraint decision tree models principled settlement of tax law.
18. The system of claim 9, further comprising a client computer that implements the user interface and a server computer that implements the constraint module and the decision tree generator, the client computer and server computer connected for data communications over a computer network.
CA2942325A 2016-09-19 2016-09-19 Computer generation of decision trees Abandoned CA2942325A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA2942325A CA2942325A1 (en) 2016-09-19 2016-09-19 Computer generation of decision trees

Publications (1)

Publication Number Publication Date
CA2942325A1 (en) 2018-03-19

Family

ID=61685156

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2942325A Abandoned CA2942325A1 (en) 2016-09-19 2016-09-19 Computer generation of decision trees

Country Status (1)

Country Link
CA (1) CA2942325A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446394A (en) * 2019-08-28 2021-03-05 北京中关村科金技术有限公司 Graph-based decision method, device and storage medium
US11030667B1 (en) * 2016-10-31 2021-06-08 EMC IP Holding Company LLC Method, medium, and system for recommending compositions of product features using regression trees

Legal Events

Date Code Title Description
FZDE Discontinued

Effective date: 20220322
