WO2021113973A1 - Constrained contextual bandit reinforcement learning - Google Patents

Constrained contextual bandit reinforcement learning Download PDF

Info

Publication number
WO2021113973A1
WO2021113973A1 (PCT/CA2020/051698)
Authority
WO
WIPO (PCT)
Prior art keywords
treatment
model
selecting
generating
selector
Prior art date
Application number
PCT/CA2020/051698
Other languages
French (fr)
Inventor
Val Andrei FAJARDO
Jiaxi LIANG
Charulata JAISWAL
Original Assignee
Integrate.Ai Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Integrate.Ai Inc. filed Critical Integrate.Ai Inc.
Publication of WO2021113973A1 publication Critical patent/WO2021113973A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0242Determining effectiveness of advertisements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • the disclosure relates to a method using constrained contextual bandit reinforcement learning.
  • a request is received.
  • a user feature set is generated using the request.
  • a treatment identifier is selected by a server application using a segment cluster, of a plurality of segment clusters from a machine learning model, corresponding to the user feature set and an allocation vector, of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine, corresponding to the segment cluster.
  • the treatment identifier is presented.
  • the disclosure relates to a system including a server and a server application.
  • the server includes one or more processors and one or more memories.
  • the server application executes on one or more processors of the server.
  • a request is received.
  • a user feature set is generated using the request.
  • a treatment identifier is selected by the server application using a segment cluster, of a plurality of segment clusters from a machine learning model, corresponding to the user feature set and an allocation vector, of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine, corresponding to the segment cluster.
  • the treatment identifier is presented.
  • the disclosure relates to a method that trains systems for constrained contextual bandit reinforcement learning.
  • a machine learning model is trained to generate a plurality of segment clusters and a plurality of allocation vectors.
  • a user feature set is generated using a request.
  • a treatment identifier is selected by a server application using a segment cluster, of a plurality of segment clusters from a machine learning model, corresponding to the user feature set and an allocation vector, of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine, corresponding to the segment cluster.
  • the treatment identifier is presented.
  • Figure 1A, Figure 1B, Figure 1C, and Figure 1D show diagrams of systems in accordance with disclosed embodiments.
  • Figure 2A, Figure 2B, Figure 2C, and Figure 2D show flowcharts in accordance with disclosed embodiments.
  • Figure 3A and Figure 3B show examples in accordance with disclosed embodiments.
  • Figure 4A and Figure 4B show computing systems in accordance with disclosed embodiments.
  • ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application).
  • the use of ordinal numbers is not to imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as by the use of the terms “before”, “after”, “single”, and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements.
  • a first element is distinct from a second element, and the first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.
  • embodiments of the invention utilize a constrained contextual bandit trained with reinforcement learning to identify actions taken by the website.
  • the actions include identifying a treatment (with a treatment identifier) that is included with a purchase offer for a product displayed on a website.
  • the action space of the constrained contextual bandit includes the treatments that may be used by the system.
  • the state space of the contextual bandit is identified from the usage of the website by the users (also referred to as customers) of the website in the form of user feature sets (also referred to as customer feature sets). Multiple models may be periodically trained and tested to select the model with the highest metric conversion projection to use with live system requests.
  • a user feature set is a vector that identifies the interaction of a user with the website.
  • elements of the user feature set may include values that indicate the duration of the present session, the total amount of historical purchases, the time since last page request, the time spent hovering over the purchase button, the type of device used to browse the website, the date and time (minutes, hours, seconds, day of month, day of week, month, year, etc) that an input is received, etc.
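  • As an illustration of the user feature set described above, the following minimal sketch encodes a web session as a fixed-length numeric vector; the field names and ordering are assumptions for illustration and are not taken from the disclosure.

```python
# Minimal sketch (not from the disclosure): encode a web session as a
# fixed-length numeric feature vector. Field names are illustrative assumptions.
import numpy as np

def build_user_feature_set(session: dict) -> np.ndarray:
    return np.array([
        session.get("session_duration_s", 0.0),       # duration of the present session
        session.get("total_purchase_amount", 0.0),    # total amount of historical purchases
        session.get("secs_since_last_page", 0.0),     # time since last page request
        session.get("hover_time_purchase_btn", 0.0),  # time hovering over the purchase button
        float(session.get("is_mobile", False)),       # type of device used to browse
        float(session.get("hour_of_day", 0)),         # when the input was received
        float(session.get("day_of_week", 0)),
    ], dtype=float)
```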
  • a treatment identifier is a value that uniquely identifies one of a plurality of treatments.
  • a treatment is an update to a web page or web application.
  • a treatment may provide a discount to a product purchased through a website.
  • the treatment may be stored as a collection of data and code that identifies an adjustment to another value (e.g., the amount of the discount to be applied to a price value for a product) and may include content (text, images, video, etc.) to display in a web page when the treatment is to be used.
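  • A hedged sketch of how such a treatment might be stored as data with a price adjustment and display content; the class and field names below are hypothetical and not from the disclosure.

```python
# Hypothetical sketch of a stored treatment: an identifier, a price adjustment,
# and optional display content. All names are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class Treatment:
    treatment_id: str                             # uniquely identifies the treatment
    price_multiplier: float = 1.0                 # e.g., 0.9 for a 10% discount
    content: dict = field(default_factory=dict)   # text/images to render in the page

    def apply(self, price: float) -> float:
        """Return the adjusted price for a product."""
        return price * self.price_multiplier

# Example: a 10% discount treatment versus a null treatment.
discount = Treatment("10_pct_discount", 0.9, {"banner": "Save 10% today"})
null_treatment = Treatment("not_treated")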
  • Figures 1A, 1B, 1C, and 1D show diagrams of embodiments that are in accordance with the disclosure.
  • Figure 1A shows the model training application (103), which trains machine learning models to select treatment identifiers.
  • Figure 1B shows the server application (102), which uses the trained machine learning models to select treatment identifiers.
  • Figure 1C shows the model update application (104), which updates the model used to select treatment identifiers.
  • Figure 1D shows a system (100), which performs scalable request authorization.
  • the embodiments of Figures 1A, 1B, 1C, and 1D may be combined and may include or be included within the features and embodiments described in the other figures of the application.
  • Figures 1A, 1B, 1C, and 1D are, individually and as a combination, improvements to the technology of machine learning and request authorization.
  • the various elements, systems, and components shown in Figures 1A, 1B, 1C, and 1D may be omitted, repeated, combined, and/or altered as shown in Figures 1A, 1B, 1C, and 1D. Accordingly, the scope of the present disclosure should not be considered limited to the specific arrangements shown in Figures 1A, 1B, 1C, and 1D.
  • the model training application (103) trains the models used by the system (100) (shown in Figure 1D).
  • the model training application (103) includes the propensity model (128), the feature vector generator (133), the segmentation model (138), the matrix generator (143), and the optimization engine (147).
  • the propensity model (128) generates the action probabilities (131) from the treatment identifiers (124) and the training user feature sets (127).
  • the action probability (130) is generated from the treatment identifier (123) and the training user feature set (126).
  • the action probability (130) identifies the probability that a user, identified with the training user feature set (126), will perform a specific action.
  • an action may be a “conversion” in which the system receives a user selection of a purchase button (to purchase a product) from a page in which the treatment of the treatment identifier (123) is displayed.
  • the training user feature sets (127), which include the training user feature set (126), are generated from the training data (112) (shown in Figure 1D), which includes historical user interaction data of users interacting with the website hosted by the system (100) (shown in Figure 1D) using the server application (102) (shown in Figure 1D).
  • the treatment identifiers (124) uniquely identify one of a number of treatments that may be used by the system.
  • the treatment identifiers (124) include the treatment identifier (123).
  • the feature vectors (136), which include the feature vector (135), are generated by the feature vector generator (133) using the training user feature sets (127) and the action probabilities (131).
  • the feature vector (135) includes a first set of elements from the training user feature set (126) (for the different features related to the user interaction with the website) and a second set of elements for the treatments, one of which includes the action probability (130).
  • the segment clusters (141), which include the segment cluster (140), are generated from the feature vectors (136) using the segmentation model (138).
  • the segmentation model (138) uses a clustering algorithm to identify the segment clusters (141) from the feature vectors (136).
  • Clustering algorithms that may be used by the segmentation model (138) include K means clustering, Gaussian mixture model, etc.
  • the number of segments is a hyperparameter that may be defined by the model generator (160) (shown in Figure 1C).
  • the reward matrix (145) is generated by the matrix generator (143) from the feature vectors (136) and the segment clusters (141). In one embodiment, columns of the reward matrix (145) identify the treatments that may be used and rows of the reward matrix (145) may identify the user segments (also referred to as customer segments) that correspond to the segment clusters (141). The values in the reward matrix (145) identify the action probabilities for users in a user segment for a treatment.
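  • A minimal sketch of this reward-matrix construction, assuming the per-treatment action probabilities and segment-cluster labels are available as arrays; the array shapes and variable names are assumptions for illustration.

```python
# Minimal sketch of a matrix generator: rows = user segments, columns =
# treatments, values = mean action probability for users in the segment.
import numpy as np

def build_reward_matrix(action_probs: np.ndarray, cluster_labels: np.ndarray,
                        n_segments: int) -> np.ndarray:
    """action_probs: (n_users, n_treatments) estimated action probabilities.
    cluster_labels: (n_users,) segment cluster index for each user.
    Returns an (n_segments, n_treatments) reward matrix."""
    n_treatments = action_probs.shape[1]
    reward = np.zeros((n_segments, n_treatments))
    for s in range(n_segments):
        members = action_probs[cluster_labels == s]
        if len(members) > 0:
            reward[s] = members.mean(axis=0)   # average probability per treatment
    return reward
```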
  • the optimization engine (147) generates the allocation vectors (153) from the reward matrix (145) and the treatment constraints (150).
  • the optimization engine (147) uses a linear programming algorithm to apply the treatment constraints (150) to the reward matrix (145) to generate the allocation vectors (153).
  • the treatment constraints (150), which include the treatment constraint (149), are constraints on usage of the treatments by the system.
  • a treatment constraint may identify that a particular treatment may be used for a fixed percentage of the number of purchases made with the system (100) (shown in Figure 1D).
  • Each treatment may have multiple constraints.
  • the allocation vectors (153) include the allocation vector (152).
  • the allocation vector (152) includes an element for each of the treatments offered by the system.
  • An allocation vector represents an optimal percentage allocation of treatments for a segment (which is linked to a segment cluster). An optimal percentage allocation achieves metric projections that are higher than metric projections for a nonoptimal percentage allocation.
  • as an example, a first segment (identified with a first segment cluster) may have an optimal allocation vector of “0.25, 0.25, 0.50” (for first, second, and third treatments identified with first, second, and third treatment identifiers, respectively).
  • this result from the optimization engine (147) (i.e., the optimal allocation vector) indicates that, in order to optimize a metric (e.g., conversion), the first treatment and the second treatment are each allocated to 25% of the population for the first segment and the third treatment is allocated to the remaining 50% of the population.
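  • A hedged sketch of this optimization for a single segment using linear programming (scipy.optimize.linprog); the treatment constraints are simplified here to per-treatment allocation caps, and the numbers are illustrative assumptions rather than values from the disclosure.

```python
# Sketch: solve one segment's allocation vector by maximizing expected reward
# r.x subject to sum(x) = 1 and 0 <= x <= caps (simplified treatment constraints).
import numpy as np
from scipy.optimize import linprog

def allocation_vector(reward_row: np.ndarray, caps: np.ndarray) -> np.ndarray:
    n = len(reward_row)
    result = linprog(
        c=-reward_row,                       # linprog minimizes, so negate the rewards
        A_eq=np.ones((1, n)), b_eq=[1.0],    # allocations sum to 100% of the segment
        bounds=list(zip(np.zeros(n), caps)), # per-treatment usage caps
        method="highs",
    )
    return result.x

# Example: three treatments, the third capped at 50% of impressions.
row = np.array([0.12, 0.10, 0.15])
print(allocation_vector(row, caps=np.array([1.0, 1.0, 0.5])))
```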
  • the weighted selector (185) uses a weighted policy to select the treatment identifier (188) from the plurality of treatment identifiers corresponding to the treatments that may be used by the system (100) (shown in Figure 1D).
  • the weighted selector (185) uses the current model (154) to select the treatment identifier (188).
  • the current model (154) includes the segment clusters (141) and the allocation vectors (153).
  • the weighted selector (185) identifies the segment cluster (140) as corresponding to the user feature set (180) and identifies the allocation vector (152) as corresponding to the segment cluster (140).
  • the weighted selector (185) uses the elements in the allocation vector (152) as the weights for the random selection of the treatment identifier (188) from the plurality of treatment identifiers. For example, when a first element of the allocation vector (152) is 0.5, then the treatment identifier corresponding to that first element has a 50% chance of being selected. A random number is selected and the element of the allocation vector (152) corresponding to the random number is identified to select the treatment identifier (188).
  • for example, when the random number falls within the portion of the allocation vector (152) covered by the second element, the treatment identifier that corresponds to the second element is selected as the treatment identifier (188).
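  • A minimal sketch of this weighted policy, using the allocation vector's elements as sampling weights over treatment identifiers; the identifiers and values are assumptions for illustration.

```python
# Sketch of the weighted selector: sample a treatment identifier with
# probabilities given by the segment's allocation vector.
import numpy as np

def weighted_select(treatment_ids: list, allocation: np.ndarray,
                    rng: np.random.Generator) -> str:
    """Randomly pick a treatment identifier weighted by the allocation vector."""
    return rng.choice(treatment_ids, p=allocation)

rng = np.random.default_rng(0)
print(weighted_select(["10_pct_discount", "free_shipping", "not_treated"],
                      np.array([0.25, 0.25, 0.50]), rng))
```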
  • the model update application (104) updates the models used by the system (100) (shown in Figure 1D).
  • the model update application (104) includes the model generator (160), the model tester (167), and the model selector (172).
  • the current model (165) is the model presently being used by the system (100) (shown in Figure 1D) to handle live requests received by the server application (102) (shown in Figure 1B).
  • the current model (165) is included with the test models (163) to be tested with the model tester (167).
  • the model tester (167) generates the metric projections (170) from the test models (163) and the current model (165). For each of the models being tested, the model tester (167) plays back the interaction data (109) (shown in Figure 1D) to generate a projection of the number of conversions using a particular model.
  • the metric projections (170) include the metric projection (169).
  • the metric projection (169) may be a projection of the number of actions generated during a playback of the historical data of interaction with the website by the users of the website using either one of the test models (163) or the current model (165).
  • a metric may be the conversion rate that identifies how often a web session is converted into a sale and the metric projection may be a projection of the rate or amount of conversions using the treatments available to the system.
  • the model selector (172) selects the subsequent model (174) from the group of the test models (163) and the current model (165). In one embodiment, the model selector (172) selects, as the subsequent model (174), the model corresponding to the metric projection (169) that has the highest value (e.g., the most conversions) among the values of the metric projections (170).
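  • A short sketch of this selection rule, assuming the candidate models and their metric projections are held in parallel lists; the structure is an assumption, not the disclosure's implementation.

```python
# Sketch of a model selector: pick the model whose metric projection
# (e.g., projected conversions) is highest; the current model is included
# so it can be retained if it still projects best.
def select_subsequent_model(models: list, metric_projections: list):
    best_index = max(range(len(models)), key=lambda i: metric_projections[i])
    return models[best_index]
```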
  • the server application (102) is a program on the server (101).
  • the server application (102) includes multiple programs used by the system (100) to interact with the user device (118), select treatment identifiers, and present treatments corresponding to the treatment identifiers to the user device (118).
  • the model training application (103) is a program on the server (101).
  • the model training application (103) trains the machine learning models as further described in Figure 1A.
  • the model training application (103) may be operated or controlled by the developer device (116) with the developer application (117).
  • the model update application (104) is a program on the server (101).
  • the model update application (104) updates the machine learning models used by the server application (102) to select treatment identifiers.
  • the model update application (104) may be operated or controlled by the developer device (116) with a developer application (117).
  • the server (101) is an embodiment of the computing system (400) and the nodes (422 and 424) of Figure 4A and Figure 4B.
  • the server (101) may be one of a set of virtual machines hosted by a cloud services provider to deploy the model training application (103) and the server application (102).
  • Each of the programs running on the server (101) may execute inside one or more containers hosted by the server (101).
  • the repository (108) is a computing system that may include multiple computing devices in accordance with the computing system (400) and the nodes (422 and 424) described below in Figures 4A and 4B.
  • the repository (108) may be hosted by a cloud services provider.
  • the cloud services provider may provide hosting, virtualization, and data storage services, as well as other cloud services, to operate and control the data, programs, and applications that store and retrieve data from the repository (108).
  • the data in the repository (108) may include the interaction data (109), the treatment data (110), the machine learning model data (111), and the training data (112).
  • the interaction data (109) is the data recorded by the system (100) as a user interacts with the system (100) through the user application (119) and the server application (102).
  • the interaction data (109) includes data about a web session established between the user application (119) and the server application (102) and may include interaction data from the user application (119).
  • the user application (119) is a browser that sends interaction data identifying the mouse clicks, keyboard commands, and other browser events generated by the user application (119) in response to inputs from a user.
  • the interaction data (109) forms the basis for the user feature sets that are used to train new models and to process live requests with the server application (102).
  • the treatment data (110) includes the code and data for the treatments used by the system (100).
  • the treatment data (110) also includes the treatment identifiers that uniquely identify the different treatments used by the system (100).
  • the treatments available to the system include specific treatments (a discount, free shipping, extra reward points, etc) and no treatment (i.e., no discount or other benefit is provided).
  • treatments may be added to or removed from the system using the developer device (116).
  • the machine learning model data (111) includes the code and data for the machine learning models used by the system (100).
  • the machine learning model data (111) includes the segment clusters (141) (shown in Figure 1A) and allocation vectors (153) (shown in Figure 1A) used by the server application (102) to select treatment identifiers from user feature sets.
  • the training data (112) includes the data used to train the machine learning models.
  • the training data (112) may include the intermediate data generated to train and update the models (e.g., the data shown in Figure 1B and in Figure 1C).
  • the data in the repository (108) may also include a web page (113) that is part of a website hosted by the system (100) with which the users and the developers interact using the user device (118) and the developer device (116) to access the server application (102), the model training application (103), and the model update application (104).
  • the developer device (116) is an embodiment of the computing system (400) and the nodes (422 and 424) of Figure 4A and Figure 4B.
  • the developer device (116) is a desktop personal computer (PC).
  • the developer device (116) includes the developer application (117) for accessing the model training application (103) and the model update application (104).
  • the developer application (117) may include a graphical user interface for interacting with the model training application (103) and the model update application (104) to control training and updating the machine learning models of the system (100).
  • the user device (118) is an embodiment of the computing system (400) and the nodes (422 and 424) of Figure 4A and Figure 4B.
  • the user device (118) is a desktop personal computer (PC), a smartphone, a tablet, etc.
  • the user device (118) is used to access the web page (113) of the website hosted by the system (100).
  • the user device (118) includes the user application (119) for accessing the server application (102).
  • the user application (119) may include multiple interfaces (graphical user interfaces, application program interfaces (APIs)) for interacting with the server application (102).
  • a user may operate the user application (119) to perform tasks with the server application (102) to interact with the system (100).
  • the results may be presented by being displayed by the user device (118) in the user application (119).
  • the developer application (117) and the user application (119) may be web browsers that access the server application (102), the model training application (103), and the model update application (104) using web pages hosted by the server (101).
  • the developer application (117) and the user application (119) may additionally be web services that communicate with the server application (102), the model training application (103), and the model update application (104) using representational state transfer application programming interfaces (RESTful APIs).
  • although Figure 1D shows a client server architecture, one or more parts of the model training application (103) and the server application (102) may be local applications on the developer device (116) and the user device (118) without departing from the scope of the disclosure.
  • Figures 2A, 2B, 2C, and 2D show flowcharts of processes in accordance with the disclosure.
  • the process (200) of Figure 2A uses constrained contextual bandit reinforcement learning to select treatments.
  • the process (220) of Figure 2B trains machine learning models using contextual bandit reinforcement learning.
  • the process (260) of Figure 2C selects treatment identifiers using constrained contextual bandit reinforcement learning.
  • the process (280) of Figure 2D updates machine learning models trained with constrained contextual bandit reinforcement learning.
  • the embodiments of Figures 2A and 2B may be combined and may include or be included within the features and embodiments described in the other figures of the application.
  • Figures 2A, 2B, 2C, and 2D are, individually and as an ordered combination, improvements to the technology of computing systems and machine learning systems. While the various steps in the flowcharts are presented and described sequentially, one of ordinary skill will appreciate that at least some of the steps may be executed in different orders, may be combined or omitted, and at least some of the steps may be executed in parallel. Furthermore, the steps may be performed actively or passively. For example, some steps may be performed using polling or be interrupt driven. By way of an example, determination steps may not have a processor process an instruction unless an interrupt is received to signify that a condition exists. As another example, determinations may be performed by performing a test, such as checking a data value to test whether the value is consistent with the tested condition.
  • the process (200) may execute on a server to select treatments using a constrained contextual bandit and reinforcement learning.
  • a request is received. The request may be received from a client device by a server and processed by the server.
  • the request may be for a product offered through a website hosted by the server for which the server will identify a treatment to include with a web page offering the product.
  • a user feature set is generated using the request.
  • the user feature set is a set of features extracted from interaction data, recorded by the server, that include and identify metrics and user inputs from the interaction between the client device and the server.
  • a treatment identifier is selected by a server application using a segment cluster corresponding to the user feature set and an allocation vector corresponding to the segment cluster.
  • the treatment identifier may be selected with a random selector that uses a random policy or with a weighted selector that uses a weighted policy.
  • prior to selection of the treatment identifier, the machine learning model is trained to generate the plurality of segment clusters and the plurality of allocation vectors from training data.
  • the segment cluster is one of a plurality of segment clusters from a machine learning model used by the system.
  • the segment clusters are generated with a model training application.
  • the allocation vector is one of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine.
  • the treatment constraints are applied to the reward matrix using a linear programming algorithm.
  • the treatment identifier is presented.
  • presentation of the treatment identifier may include generating a web page that includes content of the treatment specified by the treatment identifier, transmitting the web page from the server to the client device, and rendering the web page on the client device.
  • the process (220) may execute on a server to train machine learning models using constrained contextual bandit reinforcement learning.
  • a plurality of action probabilities are generated from a plurality of treatment identifiers and a plurality of user feature sets using a propensity model.
  • the action probabilities are generated by averaging the number of conversions of a treatment for a given set of user features (also referred to as a user feature set).
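  • A hedged sketch of this averaging approach to estimating action probabilities; a real propensity model could instead be a fitted classifier, and the record layout here is an assumption for illustration.

```python
# Sketch of a simple propensity estimate: the empirical conversion rate for
# each (user feature bucket, treatment) pair observed in the training data.
from collections import defaultdict

def estimate_action_probabilities(records):
    """records: iterable of (feature_bucket, treatment_id, converted) tuples.
    Returns {(feature_bucket, treatment_id): conversion_rate}."""
    counts = defaultdict(lambda: [0, 0])  # [conversions, impressions]
    for bucket, treatment_id, converted in records:
        counts[(bucket, treatment_id)][0] += int(converted)
        counts[(bucket, treatment_id)][1] += 1
    return {key: conv / total for key, (conv, total) in counts.items()}
```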
  • a plurality of feature vectors are generated from the plurality of user feature sets and the plurality of action probabilities using a feature vector generator.
  • the feature vector generator may generate a user feature set with a vector of treatment action probabilities that correspond to the user feature set to form a feature vector.
  • in Step 226, a segmentation model is trained to generate the plurality of segment clusters from the plurality of feature vectors.
  • the segmentation model may use a clustering algorithm (K means clustering, Gaussian mixture model, etc.) to generate the segment clusters from the feature vectors generated from the user feature sets and action probabilities by the feature vector generator.
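  • A minimal sketch of this segmentation step using scikit-learn's KMeans (a Gaussian mixture model could be substituted); the number of segments is the hyperparameter referenced above, and the function name is an assumption.

```python
# Sketch of training a segmentation model with K means clustering.
import numpy as np
from sklearn.cluster import KMeans

def train_segmentation_model(feature_vectors: np.ndarray, n_segments: int):
    """feature_vectors: (n_users, n_features) user features concatenated with
    per-treatment action probabilities. Returns the fitted model and labels."""
    model = KMeans(n_clusters=n_segments, n_init=10, random_state=0)
    labels = model.fit_predict(feature_vectors)
    return model, labels
```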
  • a reward matrix is generated from the plurality of feature vectors and the plurality of segment clusters using a matrix generator.
  • the rows of the reward matrix correspond to the segment clusters with a one to one correspondence and the columns of the reward matrix correspond to the treatments available to the system with a one to one correspondence.
  • the treatments available to the system are enumerated in the columns and include specific treatments (a discount, free shipping, extra reward points, etc.) and no treatment (i.e., no discount or other benefit is provided).
  • the plurality of allocation vectors corresponding to the plurality of segment clusters is generated from the reward matrix and a plurality of treatment constraints using an optimization engine.
  • the allocation vectors may correspond to the segment clusters with a one to one correspondence.
  • the optimization engine may treat the values in a reward matrix as coefficients of a linear programming problem and iteratively solve for a set of coefficients (i.e., values of the reward matrix) that maximize an output variable (e.g., an expected action, a conversion, a conversion rate, etc.).
  • the optimization engine may use basis exchange algorithms, interior point algorithms, etc. to perform the optimization.
  • the process (260) may execute on a server to select a treatment identifier using constrained contextual bandit reinforcement learning.
  • in Step 262, in response to the system receiving a request, one of a random selector and a weighted selector is selected.
  • the random selector and the weighted selector are used to select the treatment identifier using a treatment selector.
  • An epsilon greedy algorithm may be used to determine which of the random selector or the weighted selector will be used to select the treatment identifier.
  • a random number (e.g., a real number ranging from 0 to 1) may be generated and compared to a value for epsilon (e.g., 0.1).
  • when the random number is less than the value for epsilon, the random selector may be used and otherwise the weighted selector may be used.
  • over time, the value of epsilon may be lowered from an initial value (e.g., 0.5).
  • the treatment identifier is selected using the segment cluster and the allocation vector.
  • the weighted selector uses a weighted policy defined by the allocation vector.
  • the allocation vector identifies a plurality of weights used to randomly select the treatment identifier. For example, when a first element of an allocation vector is 0.3 (indicating a 30% chance of being selected), a second element is 0.6, a third element is 0.1, and the random number is between 0.9 (i.e., 0.3 + 0.6) and 1 (i.e., 0.3 + 0.6 + 0.1), then the treatment identifier that corresponds to the third element is selected as the treatment identifier.
  • the treatment identifier is selected randomly from a plurality of treatment identifiers. For example, when three treatments are available, a random integer between 1 and 3 may be generated with a random number generator. The treatment identifier associated with the value of the random number may be selected as the treatment identifier used by the system.
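  • A hedged sketch combining the epsilon greedy choice between selectors with the weighted and random policies described above; the epsilon value, treatment identifiers, and function names are assumptions for illustration.

```python
# Sketch of an epsilon greedy treatment selector: with probability epsilon use
# the random selector, otherwise the weighted selector driven by the segment's
# allocation vector.
import numpy as np

def select_treatment(treatment_ids, allocation, epsilon=0.1, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return rng.choice(treatment_ids)             # random selector (uniform)
    return rng.choice(treatment_ids, p=allocation)   # weighted selector

# epsilon may start higher (e.g., 0.5) and be lowered over time
print(select_treatment(["10_pct_discount", "free_shipping", "not_treated"],
                       np.array([0.25, 0.25, 0.50]), epsilon=0.1))
```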
  • the process (280) may execute on a server to update machine learning models trained with constrained contextual bandit reinforcement learning.
  • a plurality of test models are generated.
  • the different models may be generated using different algorithms and different hyperparameters.
  • some models may be generated using a K means clustering algorithm and other models may be generated using a Gaussian mixture model algorithm.
  • Different models may be generated with different numbers of segments. For example one model may have five user segments and another model may have seven user segments.
  • a plurality of metric projections are generated using a model tester, the plurality of test models, and a current model.
  • the plurality of metric projections are mapped to a group comprising the current model and the plurality of test models in a one to one correspondence.
  • the model tester plays back historical interaction data through the models being tested to generate a metric projection for each model being tested.
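  • A minimal sketch of this playback step, treating the metric projection as a count of recorded conversions whose logged treatment matches the treatment the candidate model would have selected; this is a simplified off-policy estimate, and the record structure and method name are assumptions.

```python
# Sketch of a model tester: replay historical interaction records through a
# candidate model and credit it with conversions whose logged treatment
# matches the treatment the model would have served.
def project_conversions(model, interaction_records):
    """interaction_records: iterable of dicts with 'user_feature_set',
    'treatment_id', and 'converted' keys. model.select_treatment_id is a
    hypothetical method returning the model's chosen treatment identifier."""
    projected = 0
    for record in interaction_records:
        chosen = model.select_treatment_id(record["user_feature_set"])
        if chosen == record["treatment_id"] and record["converted"]:
            projected += 1
    return projected
```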
  • a subsequent model is selected from the group comprising the current model and the plurality of test models using the plurality of metric projections and a model selector.
  • the model corresponding to the metric projection in accordance with a criterion is selected.
  • the criterion may be to identify the metric projection with the highest number of conversions.
  • the subsequent model replaces the current model as the model being used by the system to process requests.
  • the subsequent model is selected as part of a periodic process.
  • the system may engage the process (280) daily, weekly, monthly, etc., to continuously refine and improve the current model being used by the system to handle live requests.
  • the plurality of allocation vectors of the current model may also be updated periodically and prior to selecting a subsequent model to replace the machine learning model.
  • subsequent models may be selected on a weekly basis and the current model may be updated on a daily basis.
  • the update may include recalculating the reward matrix and allocation vectors using the new interaction data collected since the last update to the current model.
  • Figures 3A and 3B show examples of interfaces and web pages in accordance with the disclosure.
  • Figure 3A shows a user interface to manage treatments selected using constrained contextual bandit reinforcement learning.
  • Figure 3B shows interaction with users of the system.
  • the embodiments of Figures 3A and 3B may be combined and may include or be included within the features and embodiments described in the other figures of the application.
  • the features and elements of Figures 3A and 3B are, individually and as a combination, improvements to the technology of computing systems and machine learning systems.
  • the various features, elements, widgets, components, and interfaces shown in Figures 3A and 3B may be omitted, repeated, combined, and/or altered as shown. Accordingly, the scope of the present disclosure should not be considered limited to the specific arrangements shown in Figures 3A and 3B.
  • the user interface (300) includes several elements (also referred to as user interface elements) that may be displayed on a developer device in a developer application to control the treatments and models used by the system.
  • the elements (302) include switches for identifying which treatments are available to be used with any category of products.
  • the different treatments may be coded with colors that correspond to the colors used in the elements (310), (330), and (332).
  • the elements (304) include check boxes used to identify the usage of treatments with product categories.
  • a checkbox for each category is provided that determines whether or not the category is allowed to use treatments.
  • Each category may be expanded to show additional options and allow specific treatments to be selected for use with that category.
  • the column of elements (310) is a list of the treatments that are available to the currently selected category.
  • the currently selected category may be selected with the elements (332), which are further discussed below.
  • Four treatments are available: “10% discount”, “free shipping”, “bonus reward points”, and “not treated”.
  • the column of elements (312) lists the treatment rates for the treatments identified in the column of elements (310).
  • the “10% discount” treatment may be used for 10% of impressions, i.e., for 10% of the page displays to users using the website.
  • the “free shipping” treatment may be used for 11% of impressions.
  • the “bonus reward points” treatment may be used for 5% of impressions.
  • the “not treated” treatment may be used for 74% of impressions.
  • the column of elements (314) lists the expected conversion rate for the treatments listed in the column of elements (310). As an example, the “10% discount” treatment is projected to have a conversion rate of 12%.
  • the element (322) identifies the predicted increase in conversions using the treatment schedule defined with the user interface (300), i.e., when applying the treatments specified with a user interface (300) to the pages transmitted to the users of the system. For example, the element (322) indicates that the number of conversions is predicted to increase by 10% as compared to when no treatments are being provided by the system.
  • the element (324) identifies the projected lift.
  • the projected lift identifies an increase in revenue from using the treatment schedule defined with the user interface (300).
  • the element (330) is a chart that visually depicts the treatments to be used in response to page requests received by the system. Different colors in the chart correspond to the colors from the column of elements (310) and the elements (302).
  • the element (332) includes a plurality of icons. Selecting one of the icons from the element (332) updates the element (330) and the columns of elements (310), (312), and (314) to show corresponding information for the different categories of products.
  • the user (360) uses the browser A (362) to access the application (352) to purchase an airline ticket.
  • the browser A (362) sends a request to the application (352) for an offer for an airline ticket.
  • the application (352) has been configured by a developer to offer multiple treatments that include a 10% discount and a null treatment (i.e., no discount).
  • the model (356) has been previously trained to generate segmentation clusters and allocation vectors that are used to respond to requests from the browser A (362) and the browser B (372).
  • the allocation vectors of the model (356) include constraints for the available treatments with the discount treatment being offered on 30% of impressions and the null treatment being offered on 70% of impressions.
  • in response to the request from the browser A (362), the application (352) generates a customer feature set from the interaction data from the repository (354) that corresponds to the web session with the user (360).
  • the application (352) uses the model (356) to select a treatment to provide with the offer to purchase an airline ticket through a web page.
  • the model (356) determines whether to use a random selector or a weighted selector based on the value of a random number compared to a threshold (epsilon). The random number is greater than the threshold and the weighted selector is used.
  • the weighted selector identifies the segment cluster closest to the customer feature set generated in response to receiving the request from the browser A (362).
  • the model (356) then identifies the allocation vector associated with the segment cluster.
  • the elements of the allocation vector are used as weights for randomly selecting one of the available treatments by the system and the first treatment (a 10% discount) is selected.
  • the application (352) generates the web page A (364) that includes the treatment (366) and transmits the web page A (364) to the browser A (362).
  • the browser A receives and displays the web page A (364). If the user (360) purchases the ticket (by clicking the select button), the airline ticket will be purchased with the 10% discount from the treatment.
  • a different user also accesses the application (352) and uses the browser B (372).
  • the user (370) may request the same type of ticket (the same departure and destination cities, the same times, the same airline, etc).
  • in response to the request from the browser B (372), the application (352) generates the web page B (374). The model (356) is again used by the application (352) to select a treatment.
  • the treatment selected for the user (370) is the null treatment in which no treatment is provided (i.e., no discount).
  • the web page B (374) does not include the treatment that was included in the web page A (364). In this case, the random number compared to the threshold is less than the threshold, a pure random selection of the available treatments is made, and the null treatment is selected.
  • Embodiments of the invention may be implemented on a computing system specifically designed to achieve an improved technological result.
  • the features and elements of the disclosure provide a significant technological advancement over computing systems that do not implement the features and elements of the disclosure.
  • Any combination of mobile, desktop, server, router, switch, embedded device, or other types of hardware may be improved by including the features and elements described in the disclosure.
  • For example, as shown in FIG. 4A, the computing system (400) may include one or more computer processors (402), non-persistent storage (404) (e.g., volatile memory, such as random access memory (RAM) or cache memory), persistent storage (406) (e.g., a hard disk, an optical drive such as a compact disk (CD) drive or digital versatile disk (DVD) drive, a flash memory, etc.), a communication interface (412) (e.g., Bluetooth interface, infrared interface, network interface, optical interface, etc.), and numerous other elements and functionalities that implement the features and elements of the disclosure.
  • the computer processor(s) (402) may be an integrated circuit for processing instructions.
  • the computer processor(s) may be one or more cores or micro-cores of a processor.
  • the computing system (400) may also include one or more input devices (410), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device.
  • the communication interface (412) may include an integrated circuit for connecting the computing system (400) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, a mobile network, or any other type of network) and/or to another device, such as another computing device.
  • the computing system (400) may include one or more output devices (408), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device.
  • One or more of the output devices may be the same or different from the input device(s).
  • the input and output device(s) may be locally or remotely connected to the computer processor(s) (402), non-persistent storage (404), and persistent storage (406).
  • Software instructions in the form of computer readable program code to perform embodiments of the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, DVD, storage device, a diskette, a tape, flash memory, physical memory, or any other computer readable storage medium.
  • the software instructions may correspond to computer readable program code that, when executed by a processor(s), is configured to perform one or more embodiments of the invention.
  • the computing system (400) in FIG. 4A may be connected to or be a part of a network.
  • For example, as shown in FIG. 4B, the network (420) may include multiple nodes (e.g., node X (422), node Y (424)).
  • Each node may correspond to a computing system, such as the computing system shown in FIG. 4A, or a group of nodes combined may correspond to the computing system shown in FIG. 4A.
  • embodiments of the invention may be implemented on a node of a distributed system that is connected to other nodes.
  • embodiments of the invention may be implemented on a distributed computing system having multiple nodes, where each portion of the invention may be located on a different node within the distributed computing system.
  • one or more elements of the aforementioned computing system (400) may be located at a remote location and connected to the other elements over a network.
  • the node may correspond to a blade in a server chassis that is connected to other nodes via a backplane.
  • the node may correspond to a server in a data center.
  • the node may correspond to a computer processor or micro-core of a computer processor with shared memory and/or resources.
  • the nodes may be configured to provide services for a client device (426).
  • the nodes may be part of a cloud computing system.
  • the nodes may include functionality to receive requests from the client device (426) and transmit responses to the client device (426).
  • the client device (426) may be a computing system, such as the computing system shown in FIG. 4A. Further, the client device (426) may include and/or perform all or a portion of one or more embodiments of the invention.
  • the computing system or group of computing systems described in FIG. 4A and 4B may include functionality to perform a variety of operations disclosed herein.
  • the computing system(s) may perform communication between processes on the same or different system.
  • a variety of mechanisms, employing some form of active or passive communication, may facilitate the exchange of data between processes on the same device. Examples representative of these inter-process communications include, but are not limited to, the implementation of a file, a signal, a socket, a message queue, a pipeline, a semaphore, shared memory, message passing, and a memory-mapped file. Further details pertaining to a couple of these non-limiting examples are provided below.
  • sockets may serve as interfaces or communication channel end-points enabling bidirectional data transfer between processes on the same device.
  • a server process (e.g., a process that provides data) may create a first socket object.
  • the server process binds the first socket object, thereby associating the first socket object with a unique name and/or address.
  • the server process then waits and listens for incoming connection requests from one or more client processes (e.g., processes that seek data).
  • a client process wishing to obtain data creates a second socket object and then proceeds to generate a connection request that includes at least the second socket object and the unique name and/or address associated with the first socket object.
  • the client process then transmits the connection request to the server process.
  • the server process may accept the connection request, establishing a communication channel with the client process, or the server process, busy in handling other operations, may queue the connection request in a buffer until the server process is ready.
  • An established connection informs the client process that communications may commence.
  • the client process may generate a data request specifying the data that the client process wishes to obtain.
  • the data request is subsequently transmitted to the server process.
  • the server process analyzes the request and gathers the requested data.
  • the server process then generates a reply including at least the requested data and transmits the reply to the client process.
  • the data may be transferred, more commonly, as datagrams or a stream of characters (e.g., bytes).
  • Shared memory refers to the allocation of virtual memory space in order to substantiate a mechanism for which data may be communicated and/or accessed by multiple processes.
  • an initializing process first creates a shareable segment in persistent or non-persistent storage. Post creation, the initializing process then mounts the shareable segment, subsequently mapping the shareable segment into the address space associated with the initializing process. Following the mounting, the initializing process proceeds to identify and grant access permission to one or more authorized processes that may also write and read data to and from the shareable segment. Changes made to the data in the shareable segment by one process may immediately affect other processes, which are also linked to the shareable segment. Further, when one of the authorized processes accesses the shareable segment, the shareable segment maps to the address space of that authorized process. Often, only one authorized process may mount the shareable segment, other than the initializing process, at any given time.
  • the computing system performing one or more embodiments of the invention may include functionality to receive data from a user.
  • a user may submit data via a graphical user interface (GUI) on the user device.
  • Data may be submitted via the graphical user interface by a user selecting one or more graphical user interface widgets or inserting text and other data into graphical user interface widgets using a touchpad, a keyboard, a mouse, or any other input device.
  • information regarding the particular item may be obtained from persistent or non-persistent storage by the computer processor.
  • the contents of the obtained data regarding the particular item may be displayed on the user device in response to the user's selection.
  • a request to obtain data regarding the particular item may be sent to a server operatively connected to the user device through a network.
  • the user may select a uniform resource locator (URL) link within a web client of the user device, thereby initiating a Hypertext Transfer Protocol (HTTP) or other protocol request being sent to the network host associated with the URL.
  • the server may extract the data regarding the particular selected item and send the data to the device that initiated the request.
  • the contents of the received data regarding the particular item may be displayed on the user device in response to the user's selection.
  • the data received from the server after selecting the URL link may provide a web page in Hyper Text Markup Language (HTML) that may be rendered by the web client and displayed on the user device.
  • the computing system may extract one or more data items from the obtained data.
  • the extraction may be performed as follows by the computing system in FIG. 4A.
  • the organizing pattern (e.g., grammar, schema, layout) of the data is determined, which may be based on one or more of the following: position (e.g., bit or column position, Nth token in a data stream, etc.), attribute (where the attribute is associated with one or more values), or a hierarchical/tree structure (consisting of layers of nodes at different levels of detail, such as in nested packet headers or nested document sections).
  • the raw, unprocessed stream of data symbols is parsed, in the context of the organizing pattern, into a stream (or layered structure) of tokens (where each token may have an associated token "type").
  • extraction criteria are used to extract one or more data items from the token stream or structure, where the extraction criteria are processed according to the organizing pattern to extract one or more tokens (or nodes from a layered structure).
  • for position-based data, the token(s) at the position(s) identified by the extraction criteria are extracted.
  • for attribute/value-based data, the token(s) and/or node(s) associated with the attribute(s) satisfying the extraction criteria are extracted.
  • for hierarchical/layered data, the token(s) associated with the node(s) matching the extraction criteria are extracted.
  • the extraction criteria may be as simple as an identifier string or may be a query presented to a structured data repository (where the data repository may be organized according to a database schema or data format, such as XML).
  • the extracted data may be used for further processing by the computing system.
  • the computing system of FIG. 4A while performing one or more embodiments of the invention, may perform data comparison.
  • the comparison may be performed by submitting A, B, and an opcode specifying an operation related to the comparison into an arithmetic logic unit (ALU) (i.e., circuitry that performs arithmetic and/or bitwise logical operations on the two data values).
  • the ALU outputs the numerical result of the operation and/or one or more status flags related to the numerical result.
  • the status flags may indicate whether the numerical result is a positive number, a negative number, zero, etc.
  • the comparison may be executed. For example, in order to determine if A > B, B may be subtracted from A (i.e., A - B), and the status flags may be read to determine if the result is positive (i.e., if A > B, then A - B > 0).
  • A and B may be vectors, and comparing A with B requires comparing the first element of vector A with the first element of vector B, the second element of vector A with the second element of vector B, etc.
  • if A and B are strings, the binary values of the strings may be compared.
  • the computing system in FIG. 4A may implement and/or be connected to a data repository.
  • a data repository is a database.
  • a database is a collection of information configured for ease of data retrieval, modification, re-organization, and deletion.
  • A Database Management System (DBMS) is a software application that provides an interface for users to define, create, query, update, or administer databases.
  • the user or software application may submit a statement or query into the DBMS. Then the DBMS interprets the statement.
  • the statement may be a select statement to request information, update statement, create statement, delete statement, etc.
  • the statement may include parameters that specify data, data containers (database, table, record, column, view, etc), identifiers, conditions (comparison operators), functions (e.g. join, full join, count, average, etc), sorts (e.g. ascending, descending), or others.
  • the DBMS may execute the statement. For example, the DBMS may access a memory buffer, a reference or index a file for read, write, deletion, or any combination thereof, for responding to the statement.
  • the DBMS may load the data from persistent or non-persistent storage and perform computations to respond to the query.
  • the DBMS may return the result(s) to the user or software application.
  • the computing system of FIG. 4A may include functionality to present raw and/or processed data, such as results of comparisons and other processing.
  • presenting data may be accomplished through various presenting methods.
  • data may be presented through a user interface provided by a computing device.
  • the user interface may include a GUI that displays information on a display device, such as a computer monitor or a touchscreen on a handheld computer device.
  • the GUI may include various GUI widgets that organize what data is shown as well as how data is presented to a user.
  • the GUI may present data directly to the user, e.g., data presented as actual data values through text, or rendered by the computing device into a visual representation of the data, such as through visualizing a data model.
  • a GUI may first obtain a notification from a software application requesting that a particular data object be presented within the GUI.
  • the GUI may determine a data object type associated with the particular data object, e.g., by obtaining data from a data attribute within the data object that identifies the data object type.
  • the GUI may determine any rules designated for displaying that data object type, e.g., rules specified by a software framework for a data object class or according to any local parameters defined by the GUI for presenting that data object type.
  • the GUI may obtain data values from the particular data object and render a visual representation of the data values within a display device according to the designated rules for that data object type.
  • Data may also be presented through various audio methods.
  • data may be rendered into an audio format and presented as sound through one or more speakers operably connected to a computing device.
  • Data may also be presented to a user through haptic methods.
  • haptic methods may include vibrations or other physical signals generated by the computing system.
  • data may be presented to a user using a vibration generated by a handheld computer device with a predefined duration and intensity of the vibration to communicate the data.


Abstract

A method uses constrained contextual bandit reinforcement learning. A request is received. A user feature set is generated using the request. A treatment identifier is selected by a server application using a segment cluster, of a plurality of segment clusters from a machine learning model, corresponding to the user feature set and an allocation vector, of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine, corresponding to the segment cluster. The treatment identifier is presented.

Description

CONSTRAINED CONTEXTUAL BANDIT REINFORCEMENT
LEARNING
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 62/945,870, filed December 9, 2019, the entirety of which is hereby incorporated by reference.
BACKGROUND
[0002] Customers use websites to make purchases. A challenge for computing systems is to identify treatments that improve metrics for the websites, in which the metrics may include the rate of conversion (i.e., the number of purchases made by customers), net revenue, profit, page views, etc.
SUMMARY
[0003] In general, in one or more aspects, the disclosure relates to a method using constrained contextual bandit reinforcement learning. A request is received. A user feature set is generated using the request. A treatment identifier is selected by a server application using a segment cluster, of a plurality of segment clusters from a machine learning model, corresponding to the user feature set and an allocation vector, of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine, corresponding to the segment cluster. The treatment identifier is presented.
[0004] In general, in one or more aspects, the disclosure relates to a system including a server and a server application. The server includes one or more processors and one or more memories. The server application executes on one or more processors of the server. A request is received. A user feature set is generated using the request. A treatment identifier is selected by the server application using a segment cluster, of a plurality of segment clusters from a machine learning model, corresponding to the user feature set and an allocation vector, of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine, corresponding to the segment cluster. The treatment identifier is presented.
[0005] In general, in one or more aspects, the disclosure relates to a method that trains systems for constrained contextual bandit reinforcement learning. A machine learning model is trained to generate a plurality of segment clusters and a plurality of allocation vectors. A user feature set is generated using a request. A treatment identifier is selected by a server application using a segment cluster, of a plurality of segment clusters from a machine learning model, corresponding to the user feature set and an allocation vector, of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine, corresponding to the segment cluster. The treatment identifier is presented.
[0006] Other aspects of the invention will be apparent from the following description and the appended claims.
BRIEF DESCRIPTION OF DRAWINGS
[0007] Figure 1A, Figure 1B, Figure 1C, and Figure 1D show diagrams of systems in accordance with disclosed embodiments.
[0008] Figure 2A, Figure 2B, Figure 2C, and Figure 2D show flowcharts in accordance with disclosed embodiments.
[0009] Figure 3A and Figure 3B show examples in accordance with disclosed embodiments.
[0010] Figure 4A and Figure 4B show computing systems in accordance with disclosed embodiments.
DETAILED DESCRIPTION
[0011] Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.
[0012] In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
[0013] Throughout the application, ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application). The use of ordinal numbers is not to imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as by the use of the terms “before”, “after”, “single”, and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements. By way of an example, a first element is distinct from a second element, and the first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.
[0014] In general, embodiments of the invention utilize a constrained contextual bandit trained with reinforcement learning to identify actions taken by the website. In one embodiment, the actions include identifying a treatment (with a treatment identifier) that is included with a purchase offer for a product displayed on a website. The action space of the constrained contextual bandit includes the treatments that may be used by the system. The state space of the contextual bandit is identified from the usage of the website by the users (also referred to as customers) of the website in the form of user feature sets (also referred to as customer feature sets). Multiple models may be periodically trained and tested to select the model with the highest metric conversion projection to use with live system requests.
[0015] A user feature set is a vector that identifies the interaction of a user with the website. As an example, elements of the user feature set may include values that indicate the duration of the present session, the total amount of historical purchases, the time since last page request, the time spent hovering over the purchase button, the type of device used to browse the website, the date and time (minutes, hours, seconds, day of month, day of week, month, year, etc) that an input is received, etc.
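As a non-limiting illustration of such a vector, the sketch below assembles a user feature set from session data in Python. The field names (session_duration_s, historical_purchase_total, and so on) and the device encoding are hypothetical and are only meant to mirror the kinds of features listed above; the disclosure does not prescribe a specific encoding.

```python
# Hypothetical sketch of building a user feature set from session data.
# Field names and encodings are illustrative, not prescribed by the disclosure.
def build_user_feature_set(session):
    device_codes = {"desktop": 0, "tablet": 1, "phone": 2}
    return [
        session["session_duration_s"],         # duration of the present session
        session["historical_purchase_total"],  # total amount of historical purchases
        session["seconds_since_last_page"],    # time since the last page request
        session["hover_time_purchase_btn_s"],  # time spent hovering over the purchase button
        device_codes.get(session["device_type"], -1),  # type of device used to browse
        session["hour_of_day"],                # time the input was received
        session["day_of_week"],
    ]

if __name__ == "__main__":
    example_session = {
        "session_duration_s": 412.0,
        "historical_purchase_total": 230.5,
        "seconds_since_last_page": 18.0,
        "hover_time_purchase_btn_s": 2.4,
        "device_type": "phone",
        "hour_of_day": 20,
        "day_of_week": 4,
    }
    print(build_user_feature_set(example_session))
```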
[0016] A treatment identifier is a value that uniquely identifies one of a plurality of treatments. In one embodiment, a treatment is an update to a web page or web application. For example, a treatment may provide a discount to a product purchased through a website. The treatment may be stored as a collection of data and code that identifies an adjustment to another value (e.g., the amount of the discount to be applied to a price value for a product) and may include content (text, images, video, etc.) to display in a web page when the treatment is to be used.
[0017] Figures 1A, 1B, 1C, and 1D show diagrams of embodiments that are in accordance with the disclosure. Figure 1A shows the model training application (103), which trains machine learning models to select treatment identifiers. Figure 1B shows the server application (102), which uses the trained machine learning models to select treatment identifiers. Figure 1C shows the model update application (104), which updates the model used to select treatment identifiers. Figure 1D shows a system (100), which performs scalable request authorization. The embodiments of Figures 1A, 1B, 1C, and 1D may be combined and may include or be included within the features and embodiments described in the other figures of the application. The features and elements of Figures 1A, 1B, 1C, and 1D are, individually and as a combination, improvements to the technology of machine learning and request authorization. The various elements, systems, and components shown in Figures 1A, 1B, 1C, and 1D may be omitted, repeated, combined, and/or altered as shown from Figures 1A, 1B, 1C, and 1D. Accordingly, the scope of the present disclosure should not be considered limited to the specific arrangements shown in Figures 1A, 1B, 1C, and 1D.
[0018] Turning to Figure 1A, the model training application (103) trains the models used by the system (100) (shown in Figure 1D). The model training application (103) includes the propensity model (128), the feature vector generator (133), the segmentation model (138), the matrix generator (143), and the optimization engine (147).
[0019] The propensity model (128) generates the action probabilities (131) from the treatment identifiers (124) and the training user feature sets (127). The action probability (130) is generated from the treatment identifier (123) and the training user feature set (126). The action probability (130) identifies the probability that a user, identified with the training user feature set (126), will perform a specific action. For example, an action may be a “conversion” in which the system receives a user selection of a purchase button (to purchase a product) from a page in which the treatment of the treatment identifier (123) is displayed.
[0020] The training user feature sets (127), which include the training user feature set (126), are generated from the training data (112) (shown in Figure 1D), which includes historical user interaction data of users interacting with the website hosted by the system (100) (shown in Figure 1D) using the server application (102) (shown in Figure 1D).
[0021] The treatment identifiers (124) uniquely identify one of a number of treatments that may be used by the system. The treatment identifiers (124) include the treatment identifier (123).
[0022] The feature vectors (136), which include the feature vector (135), are generated by the feature vector generator (133) using the training user feature sets (127) and the action probabilities (131). In one embodiment, the feature vector (135) includes a first set of elements from the training user feature set (126) (for the different features related to the user interaction with the website) and a second set of elements for the treatments, one of which includes the action probability (130).
[0023] The segment clusters (141), which include the segment cluster (140), are generated from the feature vectors (136) using the segmentation model (138). In one embodiment, the segmentation model (138) uses a clustering algorithm to identify the segment clusters (141) from the feature vectors (136). Clustering algorithms that may be used by the segmentation model (138) include K-means clustering, Gaussian mixture models, etc. The number of segments is a hyperparameter that may be defined by the model generator (160) (shown in Figure 1C).
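The segmentation step can be sketched with an off-the-shelf clustering library. The snippet below uses scikit-learn's KMeans purely as an illustration; the choice of K-means over a Gaussian mixture model, the number of segments, and the toy feature dimensionality are assumptions rather than requirements of the disclosure.

```python
# Illustrative segmentation of feature vectors into segment clusters.
# Assumes scikit-learn is available; K-means and n_segments=5 are example choices.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy feature vectors: user-interaction features plus per-treatment action probabilities.
feature_vectors = rng.random((1000, 10))

n_segments = 5  # hyperparameter that may be set by the model generator
segmentation_model = KMeans(n_clusters=n_segments, n_init=10, random_state=0)
segment_labels = segmentation_model.fit_predict(feature_vectors)

# Each feature vector is now assigned to one of the segment clusters.
print(np.bincount(segment_labels))
```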
[0024] The reward matrix (145) is generated by the matrix generator (143) from the feature vectors (136) and the segment clusters (141). In one embodiment, columns of the reward matrix (145) identify the treatments that may be used and rows of the reward matrix (145) may identify the user segments (also referred to as customer segments) that correspond to the segment clusters (141). The values in the reward matrix (145) identify the action probabilities for users in a user segment for a treatment.
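One hedged way to realize the matrix generator is to average, within each segment cluster, the per-treatment action probabilities carried by the feature vectors. The array shapes and the use of a simple mean below are assumptions for the sketch.

```python
# Illustrative reward matrix: rows = segment clusters, columns = treatments.
# Values are the mean action probability for users of a segment under a treatment.
import numpy as np

def build_reward_matrix(action_probs, segment_labels, n_segments):
    """action_probs: (n_users, n_treatments) estimated action probabilities.
    segment_labels: (n_users,) cluster index for each user's feature vector."""
    n_treatments = action_probs.shape[1]
    reward = np.zeros((n_segments, n_treatments))
    for s in range(n_segments):
        members = action_probs[segment_labels == s]
        if len(members) > 0:
            reward[s] = members.mean(axis=0)
    return reward

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    probs = rng.random((100, 3))             # 100 users, 3 treatments
    labels = rng.integers(0, 4, size=100)    # 4 segment clusters
    print(build_reward_matrix(probs, labels, n_segments=4).round(2))
```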
[0025] The optimization engine (147) generates the allocation vectors (153) from the reward matrix (145) and the treatment constraints (150). In one embodiment, the optimization engine (147) uses a linear programming algorithm to apply the treatment constraints (150) to the reward matrix (145) to generate the allocation vectors (153).
[0026] The treatment constraints (150), which include the treatment constraint (149), are constraints on usage of the treatments by the system. As an example, a treatment constraint may identify that a particular treatment may be used for a fixed percentage of the number of purchases made with the system (100) (shown in Figure 1D). Each treatment may have multiple constraints.
[0027] The allocation vectors (153) include the allocation vector (152). The allocation vector (152) includes an element for each of the treatments offered by the system. An allocation vector represents an optimal percentage allocation of treatments for a segment (which is linked to a segment cluster). An optimal percentage allocation achieves metric projections that are higher than metric projections for a nonoptimal percentage allocation. As an example, if a first segment (identified with a first segment cluster) has an optimal allocation vector of “0.25, 0.25, 0.50” (for first, second, and third treatments identified with first, second, and third treatment identifiers respectively), then this result from the optimization engine (147) (i.e., the optimal allocation vector) indicates that in order to optimize a metric (e.g., conversion), the first treatment and the second treatment are each allocated to 25% of the population for the first segment and, to the remaining 50% of the population, the third treatment is allocated.
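A minimal sketch of the optimization engine, assuming the treatment constraints take the form of global usage caps and that segments are coupled through their shares of traffic: the allocation vectors for all segments are solved jointly as one linear program with scipy.optimize.linprog. The cap-style constraints, population shares, and example numbers are illustrative assumptions.

```python
# Hedged sketch of the optimization engine: a joint linear program that picks,
# for every segment, an allocation over treatments that maximizes the expected
# reward while keeping the overall usage of each treatment under a cap.
import numpy as np
from scipy.optimize import linprog

def solve_allocations(reward, pop_share, treatment_caps):
    """reward: (S, T) reward matrix; pop_share: (S,) fraction of traffic per segment;
    treatment_caps: (T,) maximum overall fraction of traffic each treatment may receive."""
    S, T = reward.shape
    # Decision variable x[s, t] flattened to length S*T; maximize expected reward
    # => minimize its negation, weighted by each segment's share of traffic.
    c = -(pop_share[:, None] * reward).ravel()

    # Equality constraints: each segment's allocation sums to 1.
    A_eq = np.zeros((S, S * T))
    for s in range(S):
        A_eq[s, s * T:(s + 1) * T] = 1.0
    b_eq = np.ones(S)

    # Inequality constraints: overall usage of each treatment stays under its cap.
    A_ub = np.zeros((T, S * T))
    for t in range(T):
        for s in range(S):
            A_ub[t, s * T + t] = pop_share[s]
    b_ub = np.asarray(treatment_caps, dtype=float)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    return res.x.reshape(S, T)  # one allocation vector per segment

if __name__ == "__main__":
    reward = np.array([[0.12, 0.10, 0.05],   # segment 0: treatment A, treatment B, no treatment
                       [0.08, 0.11, 0.06]])  # segment 1
    allocations = solve_allocations(reward,
                                    pop_share=np.array([0.4, 0.6]),
                                    treatment_caps=np.array([0.3, 0.3, 1.0]))
    print(np.round(allocations, 2))
```

The printed rows are per-segment allocation vectors; each row sums to 1 and the overall usage of each treatment stays within its cap.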
[0028] Turning to Figure 1B, the server application (102) uses a constrained contextual bandit algorithm to identify treatments using the treatment selector (181). The treatment selector (181) includes the random selector (182) and the weighted selector (185). In response to the user feature set (180), either the random selector (182) or the weighted selector (185) is chosen to determine the treatment identifier (188). A policy threshold (also referred to as epsilon) is used to determine whether the random selector (182) or the weighted selector (185) is used. A random number is selected and, if the random number is less than the policy threshold, then the random selector (182) is utilized. Otherwise, the weighted selector (185) is utilized.
[0029] The random selector (182) selects the treatment identifier (188) using a random policy. The treatment identifier for each treatment has an equal chance to be randomly selected as the treatment identifier (188) that is returned by the treatment selector (181).
[0030] The weighted selector (185) uses a weighted policy to select the treatment identifier (188) from the plurality of treatment identifiers corresponding to the treatments that may be used by the system (100) (shown in Figure 1D). The weighted selector (185) uses the current model (154) to select the treatment identifier (188). The current model (154) includes the segment clusters (141) and the allocation vectors (153). The weighted selector (185) identifies the segment cluster (140) as corresponding to the user feature set (180) and identifies the allocation vector (152) as corresponding to the segment cluster (140). The weighted selector (185) uses the elements in the allocation vector (152) as the weights for the random selection of the treatment identifier (188) from the plurality of treatment identifiers. For example, when a first element of the allocation vector (152) is 0.5, then the treatment identifier corresponding to that first element has a 50% chance of being selected. A random number is selected and the element of the allocation vector (152) corresponding to the random number is identified to select the treatment identifier (188). For example, when a first element of the allocation vector (152) is 0.5 (indicating a 50% chance of being selected), a second element is 0.3, a third element is 0.2, and the random number is between 0.5 and 0.8 (i.e., 0.5 + 0.3), then the treatment identifier that corresponds to the second element is selected as the treatment identifier (188).
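The selection logic of the treatment selector (181) can be sketched compactly. In the illustration below, the epsilon value, the use of Euclidean distance to match a user feature set to the nearest segment centroid, and the helper names (nearest_segment, select_treatment) are assumptions introduced for the example; they are not dictated by the disclosure.

```python
# Illustrative epsilon-greedy treatment selector: with probability epsilon pick a
# treatment uniformly at random, otherwise sample using the allocation vector of
# the segment cluster nearest to the user feature set.
import math
import random

def nearest_segment(user_features, centroids):
    """Return the index of the centroid closest (Euclidean distance) to the user feature set."""
    def dist(c):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(user_features, c)))
    return min(range(len(centroids)), key=lambda i: dist(centroids[i]))

def select_treatment(user_features, centroids, allocation_vectors,
                     treatment_ids, epsilon=0.1):
    if random.random() < epsilon:
        # Random policy: every treatment identifier has an equal chance.
        return random.choice(treatment_ids)
    # Weighted policy: allocation vector elements act as sampling weights.
    segment = nearest_segment(user_features, centroids)
    weights = allocation_vectors[segment]
    r, cumulative = random.random(), 0.0
    for treatment_id, w in zip(treatment_ids, weights):
        cumulative += w
        if r < cumulative:
            return treatment_id
    return treatment_ids[-1]  # guard against floating-point round-off

if __name__ == "__main__":
    centroids = [[0.2, 0.1], [0.8, 0.9]]
    allocation_vectors = [[0.25, 0.25, 0.50], [0.10, 0.10, 0.80]]
    treatment_ids = ["10_percent_discount", "free_shipping", "not_treated"]
    print(select_treatment([0.75, 0.95], centroids, allocation_vectors, treatment_ids))
```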
[0031] Turning to Figure 1C, the model update application (104) updates the models used by the system (100) (shown in Figure 1D). The model update application (104) includes the model generator (160), the model tester (167), and the model selector (172).
[0032] The model generator (160) generates the test models (163), which include the test model (162). The model generator (160) uses different hyperparameters to generate the different test models (163). One of the hyperparameters may be the number of segments used by the segmentation model (138) (shown in Figure 1A). As an example, after identifying a set of hyperparameters for the test model (162), the model generator (160) uses the model training application (103) (shown in Figure 1A) to train the test model (162).
[0033] The current model (165) is the model presently being used by the system (100) (shown in Figure 1D) to handle live requests received by the server application (102) (shown in Figure 1B). The current model (165) is included with the test models (163) to be tested with the model tester (167).
[0034] The model tester (167) generates the metric projections (170) from the test models (163) and the current model (165). For each of the models being tested, the model tester (167) plays back the interaction data (109) (shown in Figure 1D) to generate a projection of the number of conversions using a particular model.
[0035] The metric projections (170) include the metric projection (169). The metric projection (169) may be a projection of the number of actions generated during a playback of the historical data of interaction with the website by the users of the website using either one of the test models (163) or the current model (165). For example, a metric may be the conversion rate that identifies how often a web session is converted into a sale and the metric projection may be a projection of the rate or amount of conversions using the treatments available to the system.
[0036] The model selector (172) selects the subsequent model (174) from the group of the test models (163) and the current model (165). In one embodiment, the model selector (172) selects the subsequent model (174) corresponding to the metric projection (169) that has the highest value (e.g., the most conversions) among the values of the metric projections (170).
[0037] The subsequent model (174) is used to replace the current model (165) for processing live requests by the server application (102) (shown in Figure 1D). The subsequent model (174) may be any one of the test models (163) and the current model (165).
[0038] Turning to Figure 1D, the system (100) is trained to use constrained contextual bandit reinforcement learning to select treatments presented to users. The system (100) includes the server (101), the repository (108), the developer device (116), and the user device (118). The server (101) may include the server application (102), the model training application (103), and the model update application (104).
[0039] The server application (102) is a program on the server (101). The server application (102) includes multiple programs used by the system (100) to interact with the user device (118), select treatment identifiers, and present treatments corresponding to the treatment identifiers to the user device (118).
[0040] The model training application (103) is a program on the server (101). The model training application (103) trains the machine learning models as further described in Figure 1A. The model training application (103) may be operated or controlled by the developer device (116) with a developer application (117).
[0041] The model update application (104) is a program on the server (101). The model update application (104) updates the machine learning models used by the server application (102) to select treatment identifiers. The model update application (104) may be operated or controlled by the developer device (116) with a developer application (117).
[0042] The server (101) is an embodiment of the computing system (400) and the nodes (422 and 424) of Figure 4A and Figure 4B. The server (101) may be one of a set of virtual machines hosted by a cloud services provider to deploy the model training application (103) and the server application (102). Each of the programs running on the server (101) may execute inside one or more containers hosted by the server (101).
[0043] The repository (108) is a computing system that may include multiple computing devices in accordance with the computing system (400) and the nodes (422 and 424) described below in Figures 4A and 4B. The repository (108) may be hosted by a cloud services provider. The cloud services provider may provide hosting, virtualization, and data storage services as well as other cloud services to operate and control the data, programs, and applications that store and retrieve data from the repository (108). The data in the repository (108) may include the interaction data (109), the treatment data (110), the machine learning model data (111), and the training data (112).
[0044] The interaction data (109) is the data recorded by the system (100) as a user interacts with the system (100) through the user application (119) and the server application (102). The interaction data (109) includes data about a web session established between the user application (119) and the server application (102) and may include interaction data from the user application (119). In one example, the user application (119) is a browser that sends interaction data identifying the mouse clicks, keyboard commands, and other browser events generated by the user application (119) in response to inputs from a user. The interaction data (109) forms the basis for the user feature sets that are used to train new models and to process live requests with the server application (102).
[0045] The treatment data (110) includes the code and data for the treatments used by the system (100). The treatment data (110) also includes the treatment identifiers that uniquely identify the different treatments used by the system (100). The treatments available to the system include specific treatments (a discount, free shipping, extra reward points, etc) and no treatment (i.e., no discount or other benefit is provided). In one embodiment, treatments may be added to or removed from the system using the developer device (116).
[0046] The machine learning model data (111) includes the code and data for the machine learning models used by the system (100). The machine learning model data (111) includes the segment clusters (141) (shown in Figure 1A) and allocation vectors (153) (shown in Figure 1A) used by the server application (102) to select treatment identifiers from user feature sets.
[0047] The training data (112) includes the data used to train the machine learning models. The training data (112) may include the intermediate data generated to train and update the models (e.g., the data shown in Figure 1B and in Figure 1C).
[0048] The data in the repository (108) may also include a web page (113) that is part of a website hosted by the system (100) with which the users and the developers interact using the user device (118) and the developer device (116) to access the server application (102), the model training application (103), and the model update application (104).
[0049] The developer device (116) is an embodiment of the computing system (400) and the nodes (422 and 424) of Figure 4A and Figure 4B. In one embodiment, the developer device (116) is a desktop personal computer (PC). The developer device (116) includes the developer application (117) for accessing the model training application (103) and the model update application (104). The developer application (117) may include a graphical user interface for interacting with the model training application (103) and the model update application (104) to control training and updating the machine learning models of the system (100).
[0050] The user device (118) is an embodiment of the computing system (400) and the nodes (422 and 424) of Figure 4A and Figure 4B. In one embodiment, the user device (118) is a desktop personal computer (PC), a smartphone, a tablet, etc. The user device (118) is used to access the web page (113) of the website hosted by the system (100). The user device (118) includes the user application (119) for accessing the server application (102). The user application (119) may include multiple interfaces (graphical user interfaces, application program interfaces (APIs)) for interacting with the server application (102). A user may operate the user application (119) to perform tasks with the server application (102) to interact with the system (100). The results may be presented by being displayed by the user device (118) in the user application (119).
[0051] The developer application (117) and the user application (119) may be web browsers that access the server application (102), the model training application (103), and the model update application (104) using web pages hosted by the server (101). The developer application (117) and the user application (119) may additionally be web services that communicate with the server application (102), the model training application (103), and the model update application (104) using representational state transfer application programming interfaces (RESTful APIs). Although Figure 1D shows a client-server architecture, one or more parts of the model training application (103) and the server application (102) may be local applications on the developer device (116) and the user device (118) without departing from the scope of the disclosure.
[0052] Figures 2A, 2B, 2C, and 2D show flowcharts of processes in accordance with the disclosure. The process (200) of Figure 2A uses constrained contextual bandit reinforcement learning to select treatments. The process (220) of Figure 2B trains machine learning models using contextual bandit reinforcement learning. The process (260) of Figure 2C selects treatment identifiers using constrained contextual bandit reinforcement learning. The process (280) of Figure 2D updates machine learning models trained with constrained contextual bandit reinforcement learning. The embodiments of Figures 2A and 2B may be combined and may include or be included within the features and embodiments described in the other figures of the application. The features of Figures 2A, 2B, 2C, and 2D are, individually and as an ordered combination, improvements to the technology of computing systems and machine learning systems. While the various steps in the flowcharts are presented and described sequentially, one of ordinary skill will appreciate that at least some of the steps may be executed in different orders, may be combined or omitted, and at least some of the steps may be executed in parallel. Furthermore, the steps may be performed actively or passively. For example, some steps may be performed using polling or be interrupt driven. By way of an example, determination steps may not have a processor process an instruction unless an interrupt is received to signify that a condition exists. As another example, determinations may be performed by performing a test, such as checking a data value to test whether the value is consistent with the tested condition.
[0053] Turning to Figure 2A, the process (200) may execute on a server to select treatments using a constrained contextual bandit and reinforcement learning. At Step 202, a request is received. The request may be received from a client device by a server and processed by the server. As an example, the request may be for a product offered through a website hosted by the server for which the server will identify a treatment to include with a web page offering the product.
[0054] At Step 204, a user feature set is generated using the request. The user feature set is a set of features extracted from interaction data, recorded by the server, that include and identify metrics and user inputs from the interaction between the client device and the server.
[0055] At Step 206, a treatment identifier is selected by a server application using a segment cluster corresponding to the user feature set and an allocation vector corresponding to the segment cluster. The treatment identifier may be selected with a random selector that uses a random policy or with a weighted selector that uses a weighted policy.
[0056] Prior to selection of the treatment identifier, the machine learning model is trained to generate the plurality of segment clusters and the plurality of allocation vectors from training data. The segment cluster is one of a plurality of segment clusters from a machine learning model used by the system. The segment clusters are generated with a model training application. The allocation vector is one of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine. The treatment constraints are applied to the reward matrix using a linear programming algorithm.
[0057] At Step 208, the treatment identifier is presented. In one embodiment, presentation of the treatment identifier may include generating a web page that includes content of the treatment specified by the treatment identifier, transmitting the web page from the server to the client device, and rendering the web page on the client device.
[0058] Turning to Figure 2B, the process (220) may execute on a server to train machine learning models using constrained contextual bandit reinforcement learning. At Step 222, a plurality of action probabilities are generated from a plurality of treatment identifiers and a plurality of user feature sets using a propensity model. In one embodiment, the action probabilities are generated by averaging the number of conversions of a treatment for a given set of user features (also referred to as a user feature set).
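As a hedged sketch of such averaging, the snippet below assumes pandas is available and that the interaction data has already been reduced to a discretized feature bucket, a treatment identifier, and a 0/1 conversion flag; these column names are hypothetical.

```python
# Hedged sketch of a simple propensity estimate: the action probability for a
# (feature bucket, treatment) pair is taken as the empirical conversion rate
# observed in historical interaction data.
import pandas as pd

def estimate_action_probabilities(interactions: pd.DataFrame) -> pd.DataFrame:
    """interactions needs columns: 'feature_bucket' (a discretized user feature set),
    'treatment_id', and 'converted' (1 if the purchase button was selected, else 0)."""
    return (interactions
            .groupby(["feature_bucket", "treatment_id"])["converted"]
            .mean()                       # average of 0/1 outcomes = conversion rate
            .rename("action_probability")
            .reset_index())

if __name__ == "__main__":
    df = pd.DataFrame({
        "feature_bucket": ["a", "a", "a", "b", "b"],
        "treatment_id":   ["discount", "discount", "none", "discount", "none"],
        "converted":      [1, 0, 0, 1, 1],
    })
    print(estimate_action_probabilities(df))
```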
[0059] At Step 224, a plurality of feature vectors are generated from the plurality of user feature sets and the plurality of action probabilities using a feature vector generator. The feature vector generator may generate a user feature set with a vector of treatment action probabilities that correspond to the user feature set to form a feature vector.
[0060] At Step 226, a segmentation model is trained to generate the plurality of segment clusters from the plurality of feature vectors. The segmentation model may use a clustering algorithm (K-means clustering, a Gaussian mixture model, etc.) to generate the segment clusters from the feature vectors generated from the user feature sets and action probabilities by the feature vector generator.
[0061] At Step 228, a reward matrix is generated from the plurality of feature vectors and the plurality of segment clusters using a matrix generator. The rows of the reward matrix correspond to the segment clusters with a one-to-one correspondence and the columns of the reward matrix correspond to the treatments available to the system with a one-to-one correspondence. The treatments available to the system are enumerated in the columns and include specific treatments (a discount, free shipping, extra reward points, etc.) and no treatment (i.e., no discount or other benefit is provided).
[0062] At Step 230, the plurality of allocation vectors corresponding to the plurality of segment clusters is generated from the reward matrix and a plurality of treatment constraints using an optimization engine. The allocation vectors may correspond to the segment clusters with a one-to-one correspondence. The optimization engine may treat the values in the reward matrix as coefficients of a linear programming problem and iteratively solve for an allocation that maximizes an output variable (e.g., an expected action, a conversion, a conversion rate, etc.) subject to the treatment constraints. The optimization engine may use basis exchange algorithms, interior point algorithms, etc. to perform the optimization.
[0063] Turning to Figure 2C, the process (260) may execute on a server to select a treatment identifier using constrained contextual bandit reinforcement learning. At Step 262, in response to the system receiving a request, one of a random selector and a weighted selector is selected. The random selector and the weighted selector are used to select the treatment identifier using a treatment selector. An epsilon-greedy algorithm may be used to determine which of the random selector or the weighted selector will be used to select the treatment identifier. For example, when a random number (e.g., a real number ranging from 0 to 1) is less than a value for epsilon (e.g., 0.1), then the random selector may be used and otherwise the weighted selector may be used. As the system matures, i.e., after having collected more and more interaction data, the value of epsilon may be lowered from an initial value (e.g., 0.5).
[0064] At Step 264, in response to selecting the weighted selector, the treatment identifier is selected using the segment cluster and the allocation vector. The weighted selector uses a weighted policy defined by the allocation vector. The allocation vector identifies a plurality of weights used to randomly select the treatment identifier. For example, when a first element of an allocation vector is 0.3 (indicating a 30% chance of being selected), a second element is 0.6, a third element is 0.1, and the random number is between 0.9 (i.e., 0.3 + 0.6) and 1 (i.e., 0.3 + 0.6 + 0.1), then the treatment identifier that corresponds to the third element is selected as the treatment identifier.
[0065] At Step 266, in response to selecting the random selector, the treatment identifier is selected randomly from a plurality of treatment identifiers. For example, when three treatments are available, a random integer between 1 and 3 may be generated with a random number generator. The treatment identifier associated with the value of the random number may be selected as the treatment identifier used by the system.
[0066] Turning to Figure 2D, the process (280) may execute on a server to update machine learning models trained with constrained contextual bandit reinforcement learning. At Step 282, a plurality of test models are generated. The different models may be generated using different algorithms and different hyperparameters. As an example, some models may be generated using a K-means clustering algorithm and other models may be generated using a Gaussian mixture model algorithm. Different models may be generated with different numbers of segments. For example, one model may have five user segments and another model may have seven user segments.
[0067] At Step 284, a plurality of metric projections are generated using a model tester, the plurality of test models, and a current model. The plurality of metric projections are mapped to a group comprising the current model and the plurality of test models in a one-to-one correspondence. The model tester plays back historical interaction data through the models being tested to generate a metric projection for each model being tested.
[0068] At Step 286, a subsequent model is selected from the group comprising the current model and the plurality of test models using the plurality of metric projections and a model selector. The model corresponding to the metric projection that best satisfies a criterion is selected. In one example, the criterion may be to identify the metric projection with the highest number of conversions. The subsequent model replaces the current model as the model being used by the system to process requests.
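A minimal sketch of this selection step, assuming the metric projection is a count of expected conversions obtained by replaying an interaction log through each candidate model; project_conversions and expected_conversion are placeholder names, not an interface defined by the disclosure.

```python
# Illustrative model update loop: score the current model and each test model by
# a metric projection from replayed interaction data, then promote the best one.
def project_conversions(model, interaction_log):
    """Placeholder metric projection: replay the log through the model and sum
    the conversions the model would be expected to produce."""
    return sum(model.expected_conversion(event) for event in interaction_log)

def select_subsequent_model(current_model, test_models, interaction_log):
    candidates = [current_model] + list(test_models)
    projections = [project_conversions(m, interaction_log) for m in candidates]
    # Criterion assumed here: the highest projected number of conversions wins.
    best_index = max(range(len(candidates)), key=lambda i: projections[i])
    return candidates[best_index]

if __name__ == "__main__":
    class DummyModel:
        def __init__(self, rate):
            self.rate = rate
        def expected_conversion(self, event):
            return self.rate
    log = list(range(100))  # stand-in for replayed interaction events
    best = select_subsequent_model(DummyModel(0.02),
                                   [DummyModel(0.03), DummyModel(0.01)], log)
    print(best.rate)
```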
[0069] The subsequent model is selected as part of a periodic process. For example, the system may engage the process (280) daily, weekly, monthly, etc., to continuously refine and improve the current model being used by the system to handle live requests.
[0070] The plurality of allocation vectors of the current model may also be updated periodically and prior to selecting a subsequent model to replace the machine learning model. For example, subsequent models may be selected on a weekly basis and the current model may be updated on a daily basis. The update may include recalculating the reward matrix and allocation vectors using the new interaction data collected since the last update to the current model.
[0071] Figures 3A and 3B show examples of interfaces and web pages in accordance with the disclosure. Figure 3A shows a user interface to manage treatments selected using constrained contextual bandit reinforcement learning. Figure 3B shows interaction with users of the system. The embodiments of Figures 3A and 3B may be combined and may include or be included within the features and embodiments described in the other figures of the application. The features and elements of Figures 3A and 3B are, individually and as a combination, improvements to the technology of computing systems and machine learning systems. The various features, elements, widgets, components, and interfaces shown in Figures 3A and 3B may be omitted, repeated, combined, and/or altered as shown. Accordingly, the scope of the present disclosure should not be considered limited to the specific arrangements shown in Figures 3A and 3B.
[0072] Turning to Figure 3A, the user interface (300) includes several elements (also referred to as user interface elements) that may be displayed on a developer device in a developer application to control the treatments and models used by the system.
[0073] The elements (302) include switches for identifying which treatments are available to be used with any category of products. The different treatments may be coded with colors that correspond to the colors used in the elements (310), (330), and (332).
[0074] The elements (304) include check boxes used to identify the usage of treatments with product categories. A checkbox for each category is provided that determines whether or not the category is allowed to use treatments. Each category may be expanded to show additional options and allow specific treatments to be selected for use with that category.
[0075] The column of elements (310) is a list of the treatments that are available to the currently selected category. The currently selected category may be selected with the elements (332), which are further discussed below. Four treatments are available: “10% discount”, “free shipping”, “bonus reward points”, and “not treated”.
[0076] The column of elements (312) lists the treatment rates for the treatments identified in the column of elements (310). The “10% discount” treatment may be used for 10% of impressions, i.e., for 10% of the page displays to users using the website. The “free shipping” treatment may be used for 11% of impressions. The “bonus reward points” treatment may be used for 5% of impressions. The “not treated” treatment may be used for 74% of impressions.
[0077] The column of elements (314) lists the expected conversion rate for the treatments listed in the column of elements (310). As an example, the “10% discount” treatment is projected to have a conversion rate of 12%.
[0078] The element (322) identifies the predicted increase in conversions using the treatment schedule defined with the user interface (300), i.e., when applying the treatments specified with a user interface (300) to the pages transmitted to the users of the system. For example, the element (322) indicates that the number of conversions is predicted to increase by 10% as compared to when no treatments are being provided by the system.
[0079] The element (324) identifies the projected lift. In one embodiment, the projected lift identifies an increase in revenue from using the treatment schedule defined with the user interface (300).
[0080] The element (330) is a chart that visually depicts the treatments to be used in response to page requests received by the system. Different colors in the chart correspond to the colors from the column of elements (310) and the elements (302).
[0081] The element (332) includes a plurality of icons. Selecting one of the icons from the element (332) updates the element (330) and the columns of elements (310), (312), and (314) to show corresponding information for the different categories of products.
[0082] Turning to Figure 3B, the user (360) uses the browser A (362) to access the application (352) to purchase an airline ticket. The browser A (362) sends a request to the application (352) for an offer for an airline ticket. The application (352) has been configured by a developer to offer multiple treatments that include a 10% discount and a null treatment (i.e., no discount). The model (356) has been previously trained to generate segmentation clusters and allocation vectors that are used to respond to requests from the browser A (362) and the browser B (372). The allocation vectors of the model (356) include constraints for the available treatments with the discount treatment being offered on 30% of impressions and the null treatment being offered on 70% of impressions.
[0083] In response to the request from the browser A (362), the application (352) generates a customer feature set from the interaction data from the repository (354) that corresponds to the web session with the user (360). The application (352) uses the model (356) to select a treatment to provide with the offer to purchase an airline ticket through a web page. The model (356) determines whether to use a random selector or a weighted selector based on the value of a random number compared to a threshold (epsilon). The random number is greater than the threshold and the weighted selector is used. The weighted selector identifies the segment cluster closest to the customer feature set generated in response to receiving the request from the browser A (362). The model (356) then identifies the allocation vector associated with the segment cluster. The elements of the allocation vector are used as weights for randomly selecting one of the available treatments by the system and the first treatment (a 10% discount) is selected. The application (352) generates the web page A (364) that includes the treatment (366) and transmits the web page A (364) to the browser A (362).
[0084] The browser A (362) receives and displays the web page A (364). If the user (360) purchases the ticket (by clicking the select button), the airline ticket will be purchased with the 10% discount from the treatment.
[0085] A different user, the user (370), also accesses the application (352) and uses the browser B (372). The user (370) may request the same type of ticket (the same departure and destination cities, the same times, the same airline, etc).
[0086] In response to the request from the browser B (372), the application (352) generates the web page B (374). The model (356) is again used by the application (352) to select a treatment.
[0087] The treatment selected for the user (370) is the null treatment in which no treatment is provided (i.e., no discount). The web page B (374) does not include the treatment that was included in the web page A (364). In this case, the random number compared to the threshold is less than the threshold, a pure random selection of the available treatments is made, and the null treatment is selected.
[0088] Embodiments of the invention may be implemented on a computing system specifically designed to achieve an improved technological result. When implemented in a computing system, the features and elements of the disclosure provide a significant technological advancement over computing systems that do not implement the features and elements of the disclosure. Any combination of mobile, desktop, server, router, switch, embedded device, or other types of hardware may be improved by including the features and elements described in the disclosure. For example, as shown in FIG. 4A, the computing system (400) may include one or more computer processors (402), non-persistent storage (404) (e.g., volatile memory, such as random access memory (RAM), cache memory), persistent storage (406) (e.g., a hard disk, an optical drive such as a compact disk (CD) drive or digital versatile disk (DVD) drive, a flash memory, etc.), a communication interface (412) (e.g., Bluetooth interface, infrared interface, network interface, optical interface, etc.), and numerous other elements and functionalities that implement the features and elements of the disclosure.
[0089] The computer processor(s) (402) may be an integrated circuit for processing instructions. For example, the computer processor(s) may be one or more cores or micro-cores of a processor. The computing system (400) may also include one or more input devices (410), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device.
[0090] The communication interface (412) may include an integrated circuit for connecting the computing system (400) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, mobile network, or any other type of network) and/or to another device, such as another computing device.
[0091] Further, the computing system (400) may include one or more output devices (408), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device. One or more of the output devices may be the same or different from the input device(s). The input and output device(s) may be locally or remotely connected to the computer processor(s) (402), non-persistent storage (404), and persistent storage (406). Many different types of computing systems exist, and the aforementioned input and output device(s) may take other forms.
[0092] Software instructions in the form of computer readable program code to perform embodiments of the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, DVD, storage device, a diskette, a tape, flash memory, physical memory, or any other computer readable storage medium. Specifically, the software instructions may correspond to computer readable program code that, when executed by a processor(s), is configured to perform one or more embodiments of the invention.
[0093] The computing system (400) in FIG. 4A may be connected to or be a part of a network. For example, as shown in FIG. 4B, the network (420) may include multiple nodes (e.g., node X (422), node Y (424)). Each node may correspond to a computing system, such as the computing system shown in FIG. 4A, or a group of nodes combined may correspond to the computing system shown in FIG. 4A. By way of an example, embodiments of the invention may be implemented on a node of a distributed system that is connected to other nodes. By way of another example, embodiments of the invention may be implemented on a distributed computing system having multiple nodes, where each portion of the invention may be located on a different node within the distributed computing system. Further, one or more elements of the aforementioned computing system (400) may be located at a remote location and connected to the other elements over a network.
[0094] Although not shown in FIG. 4B, the node may correspond to a blade in a server chassis that is connected to other nodes via a backplane. By way of another example, the node may correspond to a server in a data center. By way of another example, the node may correspond to a computer processor or micro-core of a computer processor with shared memory and/or resources.
[0095] The nodes (e.g., node X (422), node Y (424)) in the network (420) may be configured to provide services for a client device (426). For example, the nodes may be part of a cloud computing system. The nodes may include functionality to receive requests from the client device (426) and transmit responses to the client device (426). The client device (426) may be a computing system, such as the computing system shown in FIG. 4A. Further, the client device (426) may include and/or perform all or a portion of one or more embodiments of the invention.
[0096] The computing system or group of computing systems described in FIG. 4A and 4B may include functionality to perform a variety of operations disclosed herein. For example, the computing system(s) may perform communication between processes on the same or different system. A variety of mechanisms, employing some form of active or passive communication, may facilitate the exchange of data between processes on the same device. Examples representative of these inter-process communications include, but are not limited to, the implementation of a file, a signal, a socket, a message queue, a pipeline, a semaphore, shared memory, message passing, and a memory-mapped file. Further details pertaining to a couple of these non-limiting examples are provided below.
[0097] Based on the client-server networking model, sockets may serve as interfaces or communication channel end-points enabling bidirectional data transfer between processes on the same device. Foremost, following the client-server networking model, a server process (e.g., a process that provides data) may create a first socket object. Next, the server process binds the first socket object, thereby associating the first socket object with a unique name and/or address. After creating and binding the first socket object, the server process then waits and listens for incoming connection requests from one or more client processes (e.g., processes that seek data). At this point, when a client process wishes to obtain data from a server process, the client process starts by creating a second socket object. The client process then proceeds to generate a connection request that includes at least the second socket object and the unique name and/or address associated with the first socket object. The client process then transmits the connection request to the server process. Depending on availability, the server process may accept the connection request, establishing a communication channel with the client process, or the server process, busy in handling other operations, may queue the connection request in a buffer until the server process is ready. An established connection informs the client process that communications may commence. In response, the client process may generate a data request specifying the data that the client process wishes to obtain. The data request is subsequently transmitted to the server process. Upon receiving the data request, the server process analyzes the request and gathers the requested data. Finally, the server process then generates a reply including at least the requested data and transmits the reply to the client process. The data may be transferred, more commonly, as datagrams or a stream of characters (e.g., bytes).
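Purely as an illustration of that flow, the following uses Python's standard socket module: the server side creates, binds, and listens on the first socket object, while the client side creates the second socket object, connects, sends a data request, and reads the reply. The port number and message format are arbitrary choices for the sketch.

```python
# Minimal sketch of the socket flow described above.
import socket
import threading

# Server side: create, bind, and listen on the first socket object.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 50007))
srv.listen(1)

def handle_one_client():
    conn, _ = srv.accept()                  # accept the connection request
    request = conn.recv(1024)               # read the client's data request
    conn.sendall(b"reply to: " + request)   # reply with the requested data
    conn.close()

threading.Thread(target=handle_one_client, daemon=True).start()

# Client side: create the second socket object and request data.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", 50007))        # connection request to the bound address
client.sendall(b"GET item-123")             # data request
print(client.recv(1024).decode())           # receive the reply
client.close()
srv.close()
```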
[0098] Shared memory refers to the allocation of virtual memory space in order to substantiate a mechanism for which data may be communicated and/or accessed by multiple processes. In implementing shared memory, an initializing process first creates a shareable segment in persistent or non-persistent storage. Post creation, the initializing process then mounts the shareable segment, subsequently mapping the shareable segment into the address space associated with the initializing process. Following the mounting, the initializing process proceeds to identify and grant access permission to one or more authorized processes that may also write and read data to and from the shareable segment. Changes made to the data in the shareable segment by one process may immediately affect other processes, which are also linked to the shareable segment. Further, when one of the authorized processes accesses the shareable segment, the shareable segment maps to the address space of that authorized process. Often, only one authorized process may mount the shareable segment, other than the initializing process, at any given time.
[0099] Other techniques may be used to share data, such as the various data described in the present application, between processes without departing from the scope of the invention. The processes may be part of the same or different application and may execute on the same or different computing system.
[00100] Rather than or in addition to sharing data between processes, the computing system performing one or more embodiments of the invention may include functionality to receive data from a user. For example, in one or more embodiments, a user may submit data via a graphical user interface (GUI) on the user device. Data may be submitted via the graphical user interface by a user selecting one or more graphical user interface widgets or inserting text and other data into graphical user interface widgets using a touchpad, a keyboard, a mouse, or any other input device. In response to selecting a particular item, information regarding the particular item may be obtained from persistent or non-persistent storage by the computer processor. Upon selection of the item by the user, the contents of the obtained data regarding the particular item may be displayed on the user device in response to the user's selection.
[00101] By way of another example, a request to obtain data regarding the particular item may be sent to a server operatively connected to the user device through a network. For example, the user may select a uniform resource locator (URL) link within a web client of the user device, thereby initiating a Hypertext Transfer Protocol (HTTP) or other protocol request being sent to the network host associated with the URL. In response to the request, the server may extract the data regarding the particular selected item and send the data to the device that initiated the request. Once the user device has received the data regarding the particular item, the contents of the received data regarding the particular item may be displayed on the user device in response to the user's selection. Further to the above example, the data received from the server after selecting the URL link may provide a web page in Hyper Text Markup Language (HTML) that may be rendered by the web client and displayed on the user device.
[00102] Once data is obtained, such as by using techniques described above or from storage, the computing system, in performing one or more embodiments of the invention, may extract one or more data items from the obtained data. For example, the extraction may be performed as follows by the computing system in FIG. 4A. First, the organizing pattern (e.g., grammar, schema, layout) of the data is determined, which may be based on one or more of the following: position (e.g., bit or column position, Nth token in a data stream, etc.), attribute (where the attribute is associated with one or more values), or a hierarchical/tree structure (consisting of layers of nodes at different levels of detail, such as in nested packet headers or nested document sections). Then, the raw, unprocessed stream of data symbols is parsed, in the context of the organizing pattern, into a stream (or layered structure) of tokens (where each token may have an associated token "type").
[00103] Next, extraction criteria are used to extract one or more data items from the token stream or structure, where the extraction criteria are processed according to the organizing pattern to extract one or more tokens (or nodes from a layered structure). For position-based data, the token(s) at the position(s) identified by the extraction criteria are extracted. For attribute/value-based data, the token(s) and/or node(s) associated with the attribute(s) satisfying the extraction criteria are extracted. For hierarchical/layered data, the token(s) associated with the node(s) matching the extraction criteria are extracted. The extraction criteria may be as simple as an identifier string or may be a query presented to a structured data repository (where the data repository may be organized according to a database schema or data format, such as XML).
[00104] The extracted data may be used for further processing by the computing system. For example, the computing system of FIG. 4A, while performing one or more embodiments of the invention, may perform data comparison. Data comparison may be used to compare two or more data values (e.g., A, B). For example, one or more embodiments may determine whether A > B, A = B, A != B, A < B, etc. The comparison may be performed by submitting A, B, and an opcode specifying an operation related to the comparison into an arithmetic logic unit (ALU) (i.e., circuitry that performs arithmetic and/or bitwise logical operations on the two data values). The ALU outputs the numerical result of the operation and/or one or more status flags related to the numerical result. For example, the status flags may indicate whether the numerical result is a positive number, a negative number, zero, etc. By selecting the proper opcode and then reading the numerical results and/or status flags, the comparison may be executed. For example, in order to determine if A > B, B may be subtracted from A (i.e., A - B), and the status flags may be read to determine if the result is positive (i.e., if A > B, then A - B > 0). In one or more embodiments, B may be considered a threshold, and A is deemed to satisfy the threshold if A = B or if A > B, as determined using the ALU. In one or more embodiments of the invention, A and B may be vectors, and comparing A with B requires comparing the first element of vector A with the first element of vector B, the second element of vector A with the second element of vector B, etc. In one or more embodiments, if A and B are strings, the binary values of the strings may be compared.
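A minimal sketch of the comparisons of paragraph [00104]; Python's comparison operators stand in for the ALU path, and the values are illustrative.

```python
# Minimal sketch; the values are illustrative and Python's operators stand in
# for submitting operands and an opcode to an ALU.
a, b = 7, 5
meets_threshold = a >= b                          # A satisfies threshold B when A = B or A > B
sign_positive = (a - b) > 0                       # subtract and inspect the sign, as with status flags

va, vb = [1, 4, 9], [1, 3, 9]
elementwise = [x > y for x, y in zip(va, vb)]     # compare vectors element by element

print(meets_threshold, sign_positive, elementwise)   # True True [False, True, False]
```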
[00105] The computing system in FIG. 4A may implement and/or be connected to a data repository. For example, one type of data repository is a database. A database is a collection of information configured for ease of data retrieval, modification, re-organization, and deletion. A Database Management System (DBMS) is a software application that provides an interface for users to define, create, query, update, or administer databases.
[00106] The user, or a software application, may submit a statement or query to the DBMS. The DBMS then interprets the statement. The statement may be a select statement to request information, an update statement, a create statement, a delete statement, etc. Moreover, the statement may include parameters that specify data, data containers (database, table, record, column, view, etc.), identifiers, conditions (comparison operators), functions (e.g., join, full join, count, average, etc.), sorts (e.g., ascending, descending), or others. The DBMS may execute the statement. For example, the DBMS may access a memory buffer, or reference or index a file, for reading, writing, deletion, or any combination thereof, in order to respond to the statement. The DBMS may load the data from persistent or non-persistent storage and perform computations to respond to the query. The DBMS may return the result(s) to the user or software application.
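As one possible illustration of paragraph [00106], the following sketch submits a parameterized select statement with a condition and a sort to SQLite, used here as a stand-in DBMS; the table and column names are hypothetical.

```python
# Minimal sketch, assuming SQLite as a stand-in DBMS; the table and column
# names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE treatments (id TEXT, reward REAL)")
conn.execute("INSERT INTO treatments VALUES ('t1', 0.4), ('t2', 0.7)")

# A select statement with a condition (comparison operator) and a sort,
# interpreted and executed by the DBMS.
rows = conn.execute(
    "SELECT id, reward FROM treatments WHERE reward > ? ORDER BY reward DESC",
    (0.5,),
).fetchall()

print(rows)   # -> [('t2', 0.7)]
conn.close()
```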
[00107] The computing system of FIG. 4A may include functionality to present raw and/or processed data, such as results of comparisons and other processing. For example, presenting data may be accomplished through various presentation methods. Specifically, data may be presented through a user interface provided by a computing device. The user interface may include a GUI that displays information on a display device, such as a computer monitor or a touchscreen on a handheld computer device. The GUI may include various GUI widgets that organize what data is shown as well as how the data is presented to a user. Furthermore, the GUI may present data directly to the user, e.g., as actual data values displayed through text, or rendered by the computing device into a visual representation of the data, such as through visualizing a data model.
[00108] For example, a GUI may first obtain a notification from a software application requesting that a particular data object be presented within the GUI. Next, the GUI may determine a data object type associated with the particular data object, e.g., by obtaining data from a data attribute within the data object that identifies the data object type. Then, the GUI may determine any rules designated for displaying that data object type, e.g., rules specified by a software framework for a data object class or according to any local parameters defined by the GUI for presenting that data object type. Finally, the GUI may obtain data values from the particular data object and render a visual representation of the data values within a display device according to the designated rules for that data object type.
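A minimal sketch of the type-driven rendering of paragraph [00108]; the data object layout, the rule registry, and the renderer functions are assumptions made for illustration.

```python
# Minimal sketch; the data object layout, the rule registry, and the renderers
# are assumptions made for illustration.
RENDER_RULES = {
    "metric": lambda obj: f"{obj['label']}: {obj['value']:.2f}",
    "text":   lambda obj: str(obj["value"]),
}

def present(data_object: dict) -> str:
    object_type = data_object["type"]        # data attribute identifying the object type
    rule = RENDER_RULES[object_type]         # rule designated for displaying that type
    return rule(data_object)                 # visual representation of the data values

print(present({"type": "metric", "label": "conversion rate", "value": 0.1234}))
```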
[00109] Data may also be presented through various audio methods. In particular, data may be rendered into an audio format and presented as sound through one or more speakers operably connected to a computing device.
[00110] Data may also be presented to a user through haptic methods, which may include vibrations or other physical signals generated by the computing system. For example, data may be presented to a user using a vibration generated by a handheld computer device, with a predefined duration and intensity of the vibration used to communicate the data.
[00111] The above description of functions presents only a few examples of functions performed by the computing system of FIG. 4A and the nodes and/or client device in FIG. 4B. Other functions may be performed using one or more embodiments of the invention.
[00112] While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.

Claims

What is claimed is:
1. A method comprising: receiving a request; generating a user feature set using the request; selecting, by a server application, a treatment identifier using a segment cluster, of a plurality of segment clusters from a machine learning model, corresponding to the user feature set and an allocation vector, of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine, corresponding to the segment cluster; and presenting the treatment identifier.
2. The method of claim 1, further comprising: training the machine learning model to generate the plurality of segment clusters and the plurality of allocation vectors.
3. The method of claim 2, wherein the training further comprises: generating a plurality of action probabilities from a plurality of treatment identifiers and a plurality of user feature sets using a propensity model; generating a plurality of feature vectors from the plurality of user feature sets and the plurality of action probabilities using a feature vector generator; training a segmentation model to generate the plurality of segment clusters from the plurality of feature vectors; generating a reward matrix from the plurality of feature vectors and the plurality of segment clusters using a matrix generator; and generating the plurality of allocation vectors corresponding to the plurality of segment clusters from the reward matrix and a plurality of treatment constraints using an optimization engine.
4. The method of claim 1, wherein selecting the treatment identifier further comprises: selecting one of a random selector and a weighted selector to select the treatment identifier using a treatment selector in response to receiving the request; in response to selecting the weighted selector, selecting the treatment identifier using the segment cluster and the allocation vector, wherein the allocation vector identifies a plurality of weights used to randomly select the treatment identifier; and in response to selecting the random selector, selecting the treatment identifier randomly from a plurality of treatment identifiers.
5. The method of claim 1, further comprising: generating a plurality of test models; generating a plurality of metric projections using a model tester, the plurality of test models, and a current model, wherein the plurality of metric projections are mapped to a group comprising the current model and the plurality of test models in a one to one correspondence; and selecting a subsequent model, to use as the machine learning model, from the group comprising the current model and the plurality of test models using the plurality of metric projections and a model selector.
6. The method of claim 5, wherein the subsequent model is selected as part of a periodic process.
7. The method of claim 1, further comprising: updating the plurality of allocation vectors periodically and prior to selecting a subsequent model to replace the machine learning model.
8. A system comprising: a server comprising one or more processors and one or more memories; and a server application, executing on one or more processors of the server, configured for: receiving a request; generating a user feature set using the request; selecting, by the server application, a treatment identifier using a segment cluster, of a plurality of segment clusters of a machine learning model, corresponding to the user feature set and an allocation vector, of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine, corresponding to the segment cluster; and presenting the treatment identifier.
9. The system of claim 8, wherein the server application is further configured for: training the machine learning model to generate the plurality of segment clusters and the plurality of allocation vectors.
10. The system of claim 9, wherein the training further comprises: generating a plurality of action probabilities from a plurality of treatment identifiers and a plurality of user feature sets using a propensity model; generating a plurality of feature vectors from the plurality of user feature sets and the plurality of action probabilities using a feature vector generator; training a segmentation model to generate the plurality of segment clusters from the plurality of feature vectors; generating a reward matrix from the plurality of feature vectors and the plurality of segment clusters using a matrix generator; and generating the plurality of allocation vectors corresponding to the plurality of segment clusters from the reward matrix and a plurality of treatment constraints using an optimization engine.
11. The system of claim 8, wherein selecting the treatment identifier further comprises: selecting one of a random selector and a weighted selector to select the treatment identifier using a treatment selector in response to receiving the request; in response to selecting the weighted selector, selecting the treatment identifier using the segment cluster and the allocation vector, wherein the allocation vector identifies a plurality of weights used to randomly select the treatment identifier; and in response to selecting the random selector, selecting the treatment identifier randomly from a plurality of treatment identifiers.
12. The system of claim 8, wherein the server application is further configured for: generating a plurality of test models; generating a plurality of metric projections using a model tester, the plurality of test models, and a current model, wherein the plurality of metric projections are mapped to a group comprising the current model and the plurality of test models in a one to one correspondence; and selecting a subsequent model, to use as the machine learning model, from the group comprising the current model and the plurality of test models using the plurality of metric projections and a model selector.
13. The system of claim 12, wherein the subsequent model is selected as part of a periodic process.
14. The system of claim 8, wherein the server application is further configured for: updating the plurality of allocation vectors periodically and prior to selecting a subsequent model to replace the machine learning model.
15. A method comprising: training a machine learning model to generate a plurality of segment clusters and a plurality of allocation vectors; generating a user feature set using a request; selecting, by a server application, a treatment identifier using a segment cluster, of a plurality of segment clusters of a machine learning model, corresponding to the user feature set and an allocation vector, of a plurality of allocation vectors generated from a reward matrix and a plurality of treatment constraints using an optimization engine, corresponding to the segment cluster; and presenting the treatment identifier.
16. The method of claim 15, wherein the training further comprises: generating a plurality of action probabilities from a plurality of treatment identifiers and a plurality of user feature sets using a propensity model; generating a plurality of feature vectors from the plurality of user feature sets and the plurality of action probabilities using a feature vector generator; training a segmentation model to generate the plurality of segment clusters from the plurality of feature vectors; generating a reward matrix from the plurality of feature vectors and the plurality of segment clusters using a matrix generator; and generating the plurality of allocation vectors corresponding to the plurality of segment clusters from the reward matrix and a plurality of treatment constraints using an optimization engine.
17. The method of claim 15, wherein selecting the treatment identifier further comprises: selecting one of a random selector and a weighted selector to select the treatment identifier using a treatment selector in response to receiving the request; in response to selecting the weighted selector, selecting the treatment identifier using the segment cluster and the allocation vector, wherein the allocation vector identifies a plurality of weights used to randomly select the treatment identifier; and in response to selecting the random selector, selecting the treatment identifier randomly from a plurality of treatment identifiers.
18. The method of claim 15, further comprising: generating a plurality of test models; generating a plurality of metric projections using a model tester, the plurality of test models, and a current model, wherein the plurality of metric projections are mapped to a group comprising the current model and the plurality of test models in a one to one correspondence; and selecting a subsequent model, to use as the machine learning model, from the group comprising the current model and the plurality of test models using the plurality of metric projections and a model selector.
19. The method of claim 18, wherein the subsequent model is selected as part of a periodic process.
20. The method of claim 15, further comprising: updating the plurality of allocation vectors periodically and prior to selecting a subsequent model to replace the machine learning model.
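The following non-limiting sketch illustrates one way the treatment selection recited in claims 1 and 4 could be realized. The segmentation model interface, the exploration rate used to choose the random selector, and all identifiers are assumptions, not limitations of the claims.

```python
# Illustrative sketch only; the exploration rate and the model interfaces are
# assumptions, not limitations of the claims.
import random

def select_treatment(feature_vector, segment_model, allocation_vectors,
                     treatment_ids, explore_rate=0.1):
    """Return a treatment identifier for one request."""
    # Treatment selector: choose between the random selector and the weighted selector.
    if random.random() < explore_rate:
        # Random selector: pick uniformly from the plurality of treatment identifiers.
        return random.choice(treatment_ids)

    # Weighted selector: look up the segment cluster for the feature vector, then
    # sample a treatment using the cluster's allocation vector as the weights.
    cluster = int(segment_model.predict([feature_vector])[0])
    weights = allocation_vectors[cluster]
    return random.choices(treatment_ids, weights=weights, k=1)[0]
```

At serving time, the feature vector passed to the selector would be generated the same way as in training, i.e., the user feature set concatenated with the action probabilities.

Similarly, the training pipeline recited in claim 3 could be sketched as below, assuming scikit-learn for the propensity and segmentation models and a linear program as the optimization engine; these tooling choices are illustrative only, and the per-treatment caps stand in for the treatment constraints (their sum must be at least 1 for the program to be feasible).

```python
# Illustrative sketch only; scikit-learn, SciPy, and the linear-program form
# are assumed tooling choices, not the claimed method itself.
import numpy as np
from scipy.optimize import linprog
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def train(user_features, treatments, rewards, n_clusters, caps):
    user_features = np.asarray(user_features, dtype=float)
    treatments = np.asarray(treatments)
    rewards = np.asarray(rewards, dtype=float)

    # Propensity model: action probabilities for each (user feature set, treatment).
    propensity = LogisticRegression(max_iter=1000).fit(user_features, treatments)
    action_probs = propensity.predict_proba(user_features)

    # Feature vector generator: append the action probabilities to the user features.
    feature_vectors = np.hstack([user_features, action_probs])

    # Segmentation model: segment clusters over the feature vectors.
    segment_model = KMeans(n_clusters=n_clusters, n_init=10).fit(feature_vectors)
    clusters = segment_model.predict(feature_vectors)

    # Matrix generator: mean observed reward per (segment cluster, treatment).
    treatment_ids = list(propensity.classes_)
    reward_matrix = np.zeros((n_clusters, len(treatment_ids)))
    for k in range(n_clusters):
        for j, t in enumerate(treatment_ids):
            mask = (clusters == k) & (treatments == t)
            reward_matrix[k, j] = rewards[mask].mean() if mask.any() else 0.0

    # Optimization engine: per-cluster allocation vector maximizing expected reward
    # subject to per-treatment caps (the treatment constraints).
    allocation_vectors = {}
    for k in range(n_clusters):
        result = linprog(c=-reward_matrix[k],
                         A_eq=np.ones((1, len(treatment_ids))), b_eq=[1.0],
                         bounds=[(0.0, cap) for cap in caps])
        allocation_vectors[k] = result.x

    return segment_model, allocation_vectors, treatment_ids
```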
PCT/CA2020/051698 2019-12-09 2020-12-09 Constrained contextual bandit reinforcement learning WO2021113973A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962945870P 2019-12-09 2019-12-09
US62/945,870 2019-12-09

Publications (1)

Publication Number Publication Date
WO2021113973A1 (en) 2021-06-17

Family

ID=76329157

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2020/051698 WO2021113973A1 (en) 2019-12-09 2020-12-09 Constrained contextual bandit reinforcement learning

Country Status (1)

Country Link
WO (1) WO2021113973A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6862574B1 (en) * 2000-07-27 2005-03-01 Ncr Corporation Method for customer segmentation with applications to electronic commerce
US8412566B2 (en) * 2003-07-08 2013-04-02 Yt Acquisition Corporation High-precision customer-based targeting by individual usage statistics
US20180075494A1 (en) * 2016-09-12 2018-03-15 Toshiba Tec Kabushiki Kaisha Sales promotion processing system and sales promotion processing program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHAHEEN THOBANI: "Improving E-Commerce Sales Using Machine Learning", THESIS, MIT, February 2018 (2018-02-01), MASSACHUSETTS INSTITUTE OF TECHNOLOGY, XP055833840, Retrieved from the Internet <URL:https://dspace.mit.edu/handle/1721.1/118511> *

Similar Documents

Publication Publication Date Title
CA3089044C (en) Cost optimized dynamic resource allocation in a cloud infrastructure
US20220239733A1 (en) Scalable request authorization
US10887186B2 (en) Scalable web services execution
AU2023202812A1 (en) Framework for transaction categorization personalization
US11886816B2 (en) Bot dialog manager
US12008141B2 (en) Privacy preserving synthetic string generation using recurrent neural networks
US11810022B2 (en) Contact center call volume prediction
WO2021113973A1 (en) Constrained contextual bandit reinforcement learning
US11227233B1 (en) Machine learning suggested articles for a user
CA3119490A1 (en) Contact center call volume prediction
AU2022204720B2 (en) Last mile churn prediction
US11687612B2 (en) Deep learning approach to mitigate the cold-start problem in textual items recommendations
US11621931B2 (en) Personality-profiled language modeling for bot
EP4160512A1 (en) Optimizing questions to retain engagement
US11704173B1 (en) Streaming machine learning platform
US20220035882A1 (en) Personalized messaging and configuration service
US11934984B1 (en) System and method for scheduling tasks
US12014429B2 (en) Calibrated risk scoring and sampling
US20230274291A1 (en) Churn prediction using clickstream data
US20230274292A1 (en) Churn prevention using graphs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20898595

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20898595

Country of ref document: EP

Kind code of ref document: A1