EP3682401A1 - A method and system for real-time online traveller segmentation using machine learning - Google Patents
- Publication number
- EP3682401A1 (application EP18765117.9A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- traveller
- machine learning
- user
- predetermined
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0206—Price or cost determination based on market factors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0207—Discounts or incentives, e.g. coupons or rebates
- G06Q30/0224—Discounts or incentives, e.g. coupons or rebates based on user history
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0253—During e-commerce, i.e. online transactions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0255—Targeted advertisements based on user history
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0269—Targeted advertisements based on user profile or attribute
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0283—Price estimation or determination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0611—Request for offers or quotes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/14—Travel agencies
Definitions
- the present invention relates to the application of machine learning models for classifying online users.
- embodiments of the invention classify unidentified users in real-time into one or more categories, or segments, using limited information that may be available about each user in the specific online context of travel search, information, and booking systems.
- the invention may be applied in online advertising systems, for example to select advertisements most suitable for presentation to a user and/or to determine an appropriate bid price for a view or click-through of an advertisement presented to a user.
- Online (e.g. web-based, mobile, or in-app) advertising differs from advertising in traditional media in its degree of personalised audience targeting.
- broadcast media advertising such as television advertising
- online advertising aims to reach individuals having a particular interest in the product, service, or information that is presented.
- Advertisers whose advertisements appear on these websites may pay the operator on the basis of viewing opportunities or impressions (commonly measured as 'cost per thousand impressions', or CPM), on the basis of cost per click (CPC), or according to some other measure of performance.
- the actual selection of an advertisement to be placed on a web page presented to an individual user may be based, at least in part, on a bidding process whereby an advertiser who is willing to pay a higher CPM, CPC, or other cost measure, is more likely to have its advertisement presented to the user.
- an ad exchange is a technology platform that implements a digital marketplace allowing advertisers and publishers of web sites and other online content to buy and sell advertising space, often through real-time auctions.
- Well-known ad exchange platforms include DoubleClick™ (owned by Google™), AppNexus™, Microsoft™ Ad Exchange™, and OpenX™.
- An ad exchange maintains a 'pool' of ad impressions. Publishers contribute their ad impressions, e.g. available advertising slots embedded within web pages served to users, into the pool. Buyers can then bid for the
- a supplier of travel booking services will have information on an individual client basis regarding such characteristics as frequency, duration, class, origin, and destination of travel. Collectively, this information may be used to define traveller categories, or market segments, such as 'frequent traveller', 'business traveller', 'luxury traveller', 'budget traveller', 'adventure travellers', and so forth.
- if an online user presented via an ad exchange could be identified with a particular client within the travel booking service provider's database, it would be possible to select advertisements that are highly targeted to the user's known travel interests and preferences, and/or their known market segment, and for the provider's DSP to bid aggressively to place these high-value ads.
- An alternative approach is to gather and store additional cumulative information about unidentified online users, again using a browser cookie or similar to track and maintain this information over time. In this way, it may become possible to link an online user to an individually distinguishable customer/client in the advertiser's database, enabling the rich data available regarding past customer behaviour, preferences, and market segmentation to be employed to select highly targeted advertisements.
- this approach includes:
- the present invention provides a computer-implemented method comprising:
- each individually-distinguishable traveller is assigned an associated tag in the data store as a member or non-member of a predetermined traveller category based upon one or more prior travel bookings of the individually-distinguishable traveller;
- each feature is selected such that a corresponding value thereof may be obtained for an unidentified user in an online context
- a processor configured to execute the machine learning classifier, a feature vector comprising values of the plurality of features
- embodiments of the invention employ rich data typically held by travel booking service providers in their offline client databases in order to 'tag' individually-distinguishable travellers according to predetermined categories, such as market segments.
- an individual traveller may be tagged as a 'frequent traveller' based upon a number of trips taken over a predetermined period, such as a calendar year, as a 'business traveller' based upon a number or ratio of business trips taken, or as a 'luxury traveller' based upon an average cost of each trip taken.
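The tagging rules described above can be sketched as follows. This is a minimal illustration only: the record fields (`year`, `purpose`, `cost`) and the three threshold constants are hypothetical, since the source does not give specific field names or values.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the source does not specify any values.
FREQUENT_MIN_TRIPS = 6        # trips within one calendar year
BUSINESS_MIN_RATIO = 0.5      # share of trips that are business trips
LUXURY_MIN_AVG_COST = 2000.0  # average cost per trip, arbitrary currency


@dataclass(frozen=True)
class Booking:
    year: int
    purpose: str  # "business" or "leisure" (assumed field)
    cost: float


def tag_traveller(bookings, year):
    """Tag one individually-distinguishable traveller from prior bookings."""
    trips = [b for b in bookings if b.year == year]
    n = len(trips)
    business = sum(1 for b in trips if b.purpose == "business")
    avg_cost = sum(b.cost for b in trips) / n if n else 0.0
    return {
        "frequent traveller": n >= FREQUENT_MIN_TRIPS,
        "business traveller": n > 0 and business / n >= BUSINESS_MIN_RATIO,
        "luxury traveller": avg_cost >= LUXURY_MIN_AVG_COST,
    }


history = [Booking(2017, "business", 2500.0)] * 4 + [Booking(2017, "leisure", 900.0)] * 3
tags = tag_traveller(history, 2017)
```

Each tag is an independent member/non-member flag, so one traveller can belong to several categories at once.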
- this level of detail is not available for an unidentified user in an online context.
- available online user information may be limited to characteristics of a single trip in the context that the user may currently be, or have recently been, researching that trip online.
- Embodiments of the invention may therefore advantageously determine a set of features (e.g. trip characteristics) that are available in the online context, and compute values of those features corresponding with offline records of prior travel bookings of individually-distinguishable, and tagged, travellers.
- the resulting feature vectors and associated classifications are then applied to train a supervised machine learning model which can subsequently be deployed, e.g. in a DSP, to make very rapid (e.g. 30 ms or less) classification decisions in the online context.
- the area under the receiver operating characteristic curve (AUROC) was found to be 0.80, which is generally regarded by those skilled in the art of machine learning as a good performance.
- An AUROC in excess of 0.80 was also achieved for a classifier trained using only five features to distinguish between 'business' and 'leisure' travel.
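For readers unfamiliar with the metric, AUROC can be computed as the probability that a randomly chosen category member receives a higher classifier score than a randomly chosen non-member, with ties counted as one half. The sketch below uses made-up labels and scores purely for illustration; they are not results from the source.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Illustrative data only (1 = member, 0 = non-member).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]
value = auroc(labels, scores)
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation.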
- an individually-distinguishable traveller having records within the offline data store may be classified into a plurality of predetermined traveller categories, and that individual records may be associated with zero, one, or more categories.
- a traveller may be tagged as a 'frequent traveller', a 'business traveller', and a 'luxury traveller', and may have associated records corresponding with both business trips and leisure trips for the purposes of training a machine learning classifier.
- embodiments of the invention may employ one classifier, or multiple classifiers.
- a multi-class classifier may be trained to select between mutually-exclusive categories, such as 'luxury traveller', 'mid-price traveller', and 'budget traveller', while one or more separate binary classifiers may be trained to classify an online user as a 'frequent traveller' and/or a 'business traveller'.
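One way to combine a multi-class classifier with separate binary classifiers is sketched below. The classifier objects are stand-in callables with fixed outputs, and the function name `segment_user` is hypothetical; the source does not prescribe this interface.

```python
def segment_user(features, price_tier_model, binary_models, threshold=0.5):
    """Return the segments estimated for one unidentified user.

    price_tier_model: callable returning {category: probability} over
        mutually-exclusive categories; the most probable one is kept.
    binary_models: {category: callable returning a membership probability}.
    """
    tier_probs = price_tier_model(features)
    segments = [max(tier_probs, key=tier_probs.get)]
    for category, model in binary_models.items():
        if model(features) >= threshold:
            segments.append(category)
    return segments


# Stub models with fixed outputs, for illustration only.
def stub_price_tier(features):
    return {"luxury traveller": 0.2, "mid-price traveller": 0.5, "budget traveller": 0.3}


stub_binary = {
    "frequent traveller": lambda f: 0.8,
    "business traveller": lambda f: 0.3,
}

result = segment_user({"duration_days": 3}, stub_price_tier, stub_binary)
```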
- each individually-distinguishable traveller may be assigned a plurality of associated tags in the data store, each tag identifying the traveller as a member or non-member of a corresponding plurality of predetermined traveller categories based upon one or more prior travel bookings of the individually-distinguishable traveller.
- a distinct plurality of features may be associated with each one of the plurality of predetermined traveller categories, and the training step may comprise training one or more machine learning classifiers using computed feature vectors comprising values of the distinct plurality of features associated with each corresponding predetermined traveller category.
- the determining step may comprise executing, by the processor, each one of the machine learning classifiers to determine an estimate of whether the unidentified user is a member or non-member of each corresponding
- the machine learning classifier is configured to generate a value corresponding with a level of confidence in the estimate of whether the unidentified user is a member or non-member of the predetermined traveller category.
- the value may be an estimate of a probability that the unidentified user is a member of the category.
- a decision may be made based upon the estimate, for example by applying a threshold to the generated value. Where the value is an estimate of probability, the threshold may be set at 0.5.
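The thresholding step is small enough to show in full. In this sketch the treatment of a value exactly equal to the threshold (counted as a member) is an assumption; the source states only that a threshold, for example 0.5, may be applied.

```python
def is_member(probability, threshold=0.5):
    """Turn an estimated membership probability into a yes/no decision."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return probability >= threshold  # assumption: equality counts as member


decisions = [is_member(0.73), is_member(0.20), is_member(0.60, threshold=0.8)]
```

Raising the threshold trades missed members for fewer false positives.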
- the machine learning classifier is implemented as a gradient boosting machine.
- those skilled in the art of machine learning will appreciate, however, that other machine learning models may be employed in embodiments of the invention including, but not limited to, support vector machines (SVM), naive Bayes classifiers, logistic regression classifiers, and neural networks.
- the invention provides a computing apparatus which implements a demand side platform, the computing apparatus comprising:
- the memory device contains a body of program instructions including a machine learning classifier which is executable by the processor and configured to determine an estimate of whether an unidentified online user is a member or non-member of a predetermined traveller category based upon an input feature vector comprising values of a plurality of features, the classifier having been trained using a training set of records of prior travel bookings of a plurality of individually-distinguishable travellers in which each individually-distinguishable traveller is tagged as a member or non-member of the predetermined traveller category based upon one or more prior travel bookings of the individually-distinguishable traveller, each of the features being selected such that a corresponding value thereof may be obtained for the unidentified online user
- the body of program instructions further including instructions which, when executed by the processor, cause the computing apparatus to implement a method comprising steps of:
- an apparatus that implements a method that links rich offline data with limited online data.
- the bid decision may comprise a positive bid decision, or a negative bid decision.
- a corresponding bid price may be determined, and a bid response comprising the bid price may be transmitted.
- the bid price may be, for example, a fixed bid price, or may be a variable bid price that is computed in accordance with any suitable algorithm.
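A bid decision with a variable price might look like the sketch below. The linear scaling rule and the parameter names `base_price` and `max_price` are assumptions chosen for illustration; the source allows any suitable pricing algorithm, and price units here are arbitrary.

```python
def bid_decision(probability, threshold=0.5, base_price=1.0, max_price=5.0):
    """Return (bid?, price). A negative decision carries no price.

    The price grows linearly from base_price at the threshold to
    max_price at probability 1.0 (an assumed rule, not from the source).
    """
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0, 1)")
    if probability < threshold:
        return (False, None)
    scale = (probability - threshold) / (1.0 - threshold)
    return (True, base_price + scale * (max_price - base_price))


negative = bid_decision(0.30)
moderate = bid_decision(0.75)
confident = bid_decision(1.00)
```

A fixed bid price corresponds to setting `max_price` equal to `base_price`.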
- the invention provides a computer program comprising program code instructions for executing the steps of the method according to the first aspect when said program is executed on a computer.
- the program code instructions may, for example, be stored on tangible machine-readable media.
- Figure 1 is a schematic diagram illustrating an exemplary networked system embodying the invention
- Figure 2 shows a timeline of communications between a user device, a web server, an ad exchange server, and a DSP embodying the invention
- Figure 3 is a schematic diagram illustrating a system for offline training of a machine learning model embodying the invention
- Figure 4 shows a flowchart of a method of offline training embodying the invention
- Figure 5 shows a flowchart of a method of determining a bid decision by a DSP embodying the invention
- FIG. 6 is an exemplary receiver operating characteristic (ROC) curve for a frequent traveller classifier embodying the invention.
- Figure 7 is an exemplary ROC curve for a business traveller classifier embodying the invention.
- FIG. 1 is a block diagram illustrating an exemplary networked system 100 including a demand side platform (DSP) server 102, which is configured to implement a method of determining a bid for placement of advertising content in accordance with an embodiment of the invention.
- the DSP server 102 may comprise a computer system having a conventional architecture.
- the DSP server 102, as illustrated, comprises a processor 104.
- the processor 104 is operably associated with a non-volatile memory/storage device 106, e.g. via one or more data/address busses 108 as shown.
- the non-volatile storage 106 may be a hard disk drive, and/or may include a solid-state non-volatile memory, such as ROM, flash memory, solid-state drive (SSD), or the like.
- the processor 104 is also interfaced to volatile storage 110, such as RAM, which contains program instructions and transient data relating to the operation of the DSP server 102.
- the storage device 106 maintains known program and data content relevant to the normal operation of the DSP server 102.
- the storage device 106 may contain operating system programs and data, as well as other executable application software necessary for the intended functions of the DSP server 102.
- the storage device 106 also contains program instructions which, when executed by the processor 104, cause the DSP server 102 to perform operations relating to an embodiment of the present invention, such as are described in greater detail below, and with reference to Figures 2 and 5 in particular. In operation, instructions and data held on the storage device 106 are transferred to volatile memory 110 for execution on demand.
- the processor 104 is also operably associated with a communications interface 112 in a conventional manner.
- the communications interface 112 facilitates access to a wide-area data communications network, such as the Internet 116.
- the volatile storage 110 contains a corresponding body 114 of program instructions transferred from the storage device 106 and configured to perform processing and other operations embodying features of the present invention.
- the program instructions 114 comprise a specific technical
- with regard to the DSP server 102 and other processing systems and devices described in this specification, terms such as 'processor', 'computer', and so forth, unless otherwise required by the context, should be understood as referring to a range of possible implementations of devices, apparatus and systems comprising a combination of hardware and software.
- Physical processors may include general purpose CPUs, digital signal processors, graphics processing units (GPUs), and/or other hardware devices suitable for efficient execution of required programs and algorithms.
- Computing systems may include conventional personal computer architectures, or other general-purpose hardware platforms.
- Software may include open-source and/or commercially-available operating system software in combination with various application and service programs.
- computing or processing platforms may comprise custom hardware and/or software architectures.
- computing and processing systems may comprise cloud computing platforms, enabling physical hardware resources to be allocated dynamically in response to service demands. While all of these variations fall within the scope of the present invention, for ease of explanation and understanding the exemplary embodiments described herein are based upon single-processor general-purpose computing platforms, commonly available operating system platforms, and/or widely available consumer products, such as desktop PCs, notebook or laptop PCs, smartphones, tablet computers, and so forth.
- the term 'processing unit' is used in this specification (including the claims) to refer to any suitable combination of hardware and software configured to perform a particular defined task, such as accessing and processing offline or online data, executing training steps of a machine learning model, or executing classification steps of a machine learning model.
- a processing unit may comprise an executable code module executing at a single location on a single processing device, or may comprise cooperating executable code modules executing in multiple locations and/or on multiple processing devices.
- classification and bid decision processing may be performed entirely by code executing on DSP server 102, while in other embodiments corresponding processing may be performed in a distributed manner over a plurality of DSP servers.
- Software components, e.g. program instructions 114, embodying features of the invention may be developed using any suitable programming language, development environment, or combinations of languages and development environments, as will be familiar to persons skilled in the art of software engineering.
- suitable software may be developed using the C programming language, the Java programming language, the C++ programming language, the Go programming language, and/or a range of languages suitable for implementation of network or web-based services, such as JavaScript, HTML, PHP, ASP, JSP, Ruby, Python, Perl, and so forth. These examples are not intended to be limiting, and it will be appreciated that convenient languages or development systems may be employed, in accordance with system requirements.
- the system 100 further comprises additional DSP servers, e.g. 118, 120 that, in use, compete with DSP server 102 to bid for placement of advertising content within online slots (i.e. for 'impressions') offered via an ad exchange server 122.
- the ad exchange server 122 implements a digital marketplace allowing advertisers and publishers of web sites and other online content to buy and sell advertising space in the form of a real-time, online auction in which each DSP server 102, 118, 120 is an automated, high-speed, bidder.
- the ad exchange server 122 comprises a database 124 in which it maintains details of online content providers (web servers) and advertisers (DSPs) for the purpose of operating a digital advertising marketplace.
- known ad exchange platforms include DoubleClick™ (owned by Google™), AppNexus™, Microsoft™ Ad Exchange™, and OpenX™.
- the system 100 further includes user terminal devices, exemplified by terminal device 126.
- the terminal devices 126 may be, for example, desktop or portable PCs, smartphones, tablets, or other personal computing devices, and each comprise a processor 128 interfaced, e.g. via address/data bus 130, with volatile storage 132, non-volatile storage 134, and at least one data
- the processor 128 is also interfaced to one or more user input/output (I/O) interfaces 140.
- the volatile storage 132 contains program instructions and transient data relating to the operation of the terminal device 126.
- the terminal device storage 132, 134 may contain program and data content relevant to the normal operation of the device 126. This may include operating system programs and data (e.g. associated with a Windows, Android, iOS, MacOS, Linux, or other operating system), as well as other executable application software generally unrelated to the present invention.
- the storage 132 also includes program instructions 138 which, when executed by the processor 128 enable the terminal device to provide a user with access to online content. While many applications are known for providing such access, for simplicity in the present description it is assumed that the program instructions 138 implement a web browser having a graphical user interface (GUI) presented via the user I/O interface 140.
- a corresponding web page display 144 is generated via the device UI 140.
- the display 144 includes website content 146, and one or more advertising slots, e.g. 148, 150.
- a number of communications steps then take place in order to populate these slots, i.e. to provide online advertisers with ad impressions within the web page display 144.
- the user terminal 126 via the executing web browser application 138 and responsive to user input, transmits 202 an HTTP request to the web server 142 which includes a URL of desired web content.
- the web server 142 responds by transmitting 204 content, e.g. a web page in HTML format, to the user device 126.
- the complete population and rendering of web page display 144 may require multiple requests and responses, and may involve further transactions with the web server 142 and/or with other online servers, such as content distribution network (CDN) servers and other web servers providing embedded content.
- the web page transmitted by the web server 142 to the user device 126 typically includes a hypertext reference ('href') directing the browser 138 to retrieve content from the ad exchange server 122 in accordance with an application programming interface (API) defined and provided by the relevant operator of the server 122.
- the user device 126 transmits 208 an HTTP request to the ad exchange server 122.
- the request includes web site information and user information relating to the user of the terminal device 126.
- Available user information may include information that the web server 142 has gathered, and may include client-side information, such as device and browser identity and technical details, identifying information and contents of browser cookies, and the like.
- the ad exchange server 122 receives the request, identifies relevant DSP servers 102, 118, 120 in its database 124, and transmits 210 bid request messages to each selected DSP server.
- One such bid request message, including site and user information, is received at DSP server 102 embodying the present invention, which executes a process 212 in accordance with its specific programming 114 in order to classify the user and arrive at a bid decision.
- the DSP server 102 transmits 214 the bid to the ad exchange server 122.
- the ad exchange server 122 receives all bids transmitted from DSP servers, including server 102, and selects a winning bid. It then retrieves ad content corresponding with the winning bid from its database 124, and transmits 216 the ad content to the user device 126 for rendering within the corresponding ad slot, e.g. 148 or 150.
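The selection of a winning bid by the ad exchange can be sketched as below. The source does not state the auction rule; this sketch assumes the highest price wins, and the dictionary keys (`dsp`, `price`, `ad_id`) and sample values are hypothetical.

```python
def select_winning_bid(bid_responses):
    """Pick the winning bid among DSP responses.

    Assumption: highest price wins. Responses whose price is None
    represent negative bid decisions and are ignored.
    """
    valid = [b for b in bid_responses if b["price"] is not None]
    if not valid:
        return None  # no DSP chose to bid for this impression
    return max(valid, key=lambda b: b["price"])


# Illustrative responses from the three DSP servers of FIG. 1.
responses = [
    {"dsp": "DSP 102", "price": 3.0, "ad_id": "ad-travel-1"},
    {"dsp": "DSP 118", "price": 2.4, "ad_id": "ad-generic-7"},
    {"dsp": "DSP 120", "price": None, "ad_id": None},
]
winner = select_winning_bid(responses)
```

The exchange would then serve the ad content identified by `winner["ad_id"]` into the ad slot.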
- This decision must be made with limited user information, and in view of the fact that a bad decision may have significant consequences for the advertiser. For example, if the DSP server wrongly determines that the user is a desirable target for a particular ad (i.e. computes a 'false positive'), it may place a relatively high winning bid and incur a real cost with little or no prospect of any return. Conversely, if the DSP server wrongly determines that the user is not a desirable target for the ad (i.e. computes a 'false negative'), it may place no bid, or a low losing bid, and cause the advertiser to miss an opportunity to obtain an impression with a real prospect of a return.
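The trade-off between false positives and false negatives can be made concrete by counting both kinds of error at a given decision threshold and weighting them by cost. The labels, probabilities, and cost figures below are invented for illustration and are not taken from the source.

```python
def error_costs(labels, probabilities, threshold, fp_cost, fn_cost):
    """Count false positives/negatives at a threshold and total their cost.

    fp_cost: assumed cost of paying for an impression shown to a non-target.
    fn_cost: assumed value lost by not bidding on a genuine target.
    """
    pairs = list(zip(labels, probabilities))
    false_pos = sum(1 for y, p in pairs if p >= threshold and y == 0)
    false_neg = sum(1 for y, p in pairs if p < threshold and y == 1)
    return false_pos, false_neg, false_pos * fp_cost + false_neg * fn_cost


# Illustrative data only (1 = desirable target, 0 = not a target).
labels = [1, 0, 1, 0, 1]
probabilities = [0.9, 0.6, 0.4, 0.2, 0.7]

at_050 = error_costs(labels, probabilities, 0.50, fp_cost=2.0, fn_cost=1.0)
at_065 = error_costs(labels, probabilities, 0.65, fp_cost=2.0, fn_cost=1.0)
```

With these made-up costs, the higher threshold removes the false positive and so lowers the total cost; with a different cost ratio the lower threshold could be preferable.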
- offline data, such as the contents of a client database containing detailed records of clients and travel bookings, may be used to classify travellers according to one or more market segments or categories.
- suitable categories or segments may include 'frequent traveller', 'business traveller', 'luxury traveller', 'budget traveller', and 'mid-range traveller'. Segments, and appropriate characteristics, may be determined according to an understanding of the market for travel services.
- an individual traveller may be tagged as a 'frequent traveller' based upon a number of trips taken over a predetermined period, such as a calendar year, as a 'business traveller' based upon a number or ratio of business trips taken, or as a 'luxury traveller' based upon an average cost of each trip taken.
- FIG. 3 is a block diagram illustrating a system 300 for offline training of a machine learning model.
- the system 300 includes one or more high-performance computing systems 302, preferably comprising hardware and/or software that is optimised for efficient execution of one or more machine learning models.
- Each computing system 302 comprises a central processor 304 interfaced, e.g.
- the computing system may also include one or more GPUs (not shown), in view of the fact that certain machine learning models, such as neural network and deep learning models, are known to be efficiently implemented using highly parallel, vectorized, algorithms for which GPUs are particularly well-suited.
- the volatile storage 308 contains program instructions and transient data relating to the operation of the computing system 302.
- the computing system storage 308, 310 may contain program and data content relevant to its normal operation, which may include operating system programs and data (e.g. associated with a Windows, MacOS, Linux, or other operating system), as well as other executable application and/or system software generally unrelated to the present invention.
- the storage 308 also includes program instructions 324 which, when executed by the processor 304, implement an offline training process for a machine learning model. In particular, travel booking records and associated categories may be retrieved from an offline database server 316, and employed for training of the machine learning model.
- FIG. 4 is a flowchart 400 illustrating a method of offline training embodying the invention, such as may be implemented by the computing system 302.
- at step 402, travel booking records and associated categories (which may alternatively be called 'tags', 'labels', 'classes', or
- a set of feature vectors is computed using the contents of the travel booking records. This is an important step in the method, which has the effect of linking the detailed offline data corresponding with individually-distinguishable travellers available in the database 316 with the more limited unidentified user information that is available in the online context, i.e. as transmitted 210 to the DSP 102.
- a feature is an item of information (e.g. a numerical, categorical, or Boolean value) that can be derived from both the detailed offline data and the more-limited online data.
- online data relevant to travel booking services and captured from an unidentified user's online activities relating to an actual or potential trip may include origin of travel, destination of travel, date of departure, date of arrival, and duration of trip.
- a feature vector is a set of the features derivable from both offline and online data that are collectively used for training of the machine learning model, and for subsequent online classification by the DSP server 102.
- Feature design/selection is an important step in the development of effective machine learning systems, and examples of feature vectors developed in accordance with embodiments of the invention are described further below, with reference to Figures 6 and 7.
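The linkage role of the feature vector can be illustrated with a minimal sketch in which the same function is applied to an offline booking record and to an online event. The field names (`return_date`, `duration_days`), the helper names, and the particular derived features (weekday, month, Saturday-night stay) are hypothetical choices made for illustration only; they are not the feature set of the described embodiments.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TripInfo:
    """Trip attributes available both offline and online (names are hypothetical)."""
    origin: str
    destination: str
    departure_date: date
    duration_days: int


def from_offline_record(record: dict) -> TripInfo:
    """Map a detailed booking record to TripInfo; the traveller identity is not used."""
    duration = (record["return_date"] - record["departure_date"]).days
    return TripInfo(record["origin"], record["destination"],
                    record["departure_date"], duration)


def from_online_event(event: dict) -> TripInfo:
    """Map the limited information about an unidentified user to TripInfo."""
    return TripInfo(event["origin"], event["destination"],
                    event["departure_date"], event["duration_days"])


def feature_vector(trip: TripInfo) -> list:
    """Derive numerical, categorical and Boolean features from a trip."""
    nights = (trip.departure_date + timedelta(days=i)
              for i in range(trip.duration_days))
    return [
        trip.origin,                            # categorical
        trip.destination,                       # categorical
        trip.departure_date.weekday(),          # numerical, 0 = Monday
        trip.departure_date.month,              # numerical
        trip.duration_days,                     # numerical
        any(d.weekday() == 5 for d in nights),  # Boolean: includes a Saturday night
    ]
```

Because both mapping functions produce the same `TripInfo` structure, a model trained on vectors computed from offline records can be applied unchanged to vectors computed from online events.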
- an untrained machine learning model is initialised. This step involves creation and initialisation of data structures comprising the model, as well as the setting of relevant parameters/hyperparameters for the training process. It may also involve model selection, and in some embodiments the model may in fact combine multiple models (ensemble learning).
- the computing system 302 executes one or more training procedures in accordance with the selected one or more machine learning algorithms. Training involves inputting at least a portion of the computed feature vectors and corresponding tags as a training set, and applying a training procedure adapted to minimise an objective function which reflects an accuracy of the trained model in classifying the feature vectors according to the known tags.
- the trained model may be tested using a test set, and/or cross-validation set, which may, for example, comprise a portion of the computed feature vectors and corresponding tags held back from the training step 408 for this purpose.
- the results of the test step 410 may be evaluated to determine whether they satisfy a suitable criterion of quality (examples of which are described below with reference to Figures 6 and 7). If not, then at step 414 the model parameters/hyperparameters may be updated and the model reinitialised for retraining at step 408. Alternatively, if the model is deemed to be of sufficient quality, a representation of the trained model is saved at step 416. This representation is suitable to be loaded and executed by the DSP server, as described below with reference to Figure 5.
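The initialise, train, test, evaluate and save sequence of steps 406 to 416 can be sketched as a simple loop. The sketch below deliberately substitutes a small logistic regression trained by stochastic gradient descent for the model of the embodiments, so that it runs with the Python standard library alone. The accuracy criterion, the 80/20 split, the epoch-multiplying update rule and all function names are assumptions made for illustration.

```python
import json
import math
import random


def predict(weights, x):
    """Return a confidence in [0, 1]; the last weight is the bias term."""
    z = weights[-1] + sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs, lr):
    """Steps 406/408: initialise the weights, then minimise log-loss by SGD."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = predict(weights, x) - y
            for i, xi in enumerate(x):
                weights[i] -= lr * error * xi
            weights[-1] -= lr * error
    return weights


def accuracy(weights, rows, labels):
    """Step 410: fraction of held-back examples classified correctly."""
    hits = sum((predict(weights, x) >= 0.5) == bool(y)
               for x, y in zip(rows, labels))
    return hits / len(rows)


def fit_until_good(rows, labels, min_accuracy=0.9, max_rounds=5, seed=0):
    """Retrain with updated hyperparameters until the quality criterion is met."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(0.8 * len(order))  # hold back 20% as a test set
    train_idx, test_idx = order[:cut], order[cut:]
    epochs, lr = 1, 0.1
    for _ in range(max_rounds):
        weights = train([rows[i] for i in train_idx],
                        [labels[i] for i in train_idx], epochs, lr)
        score = accuracy(weights, [rows[i] for i in test_idx],
                         [labels[i] for i in test_idx])
        if score >= min_accuracy:                      # step 412
            return json.dumps({"weights": weights}), score  # step 416
        epochs *= 4                                    # step 414 (assumed rule)
    raise RuntimeError("quality criterion not met")
```

The saved representation in this sketch is a JSON string holding only the learned weights, which mirrors the point that the stored model need not contain any of the training records.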
- an extreme Gradient Boosting (XGBoost) machine learning model is employed, originally developed by Tianqi Chen and Carlos Guestrin at the University of Washington.
- the XGBoost system is highly scalable, is widely used and tested, and an efficient implementation in C/C++ is available as an open source package, with bindings to other languages used in technical computing such as Python, R and Julia. Results from this embodiment are presented below with reference to Figures 6 and 7.
- FIG. 5 is a flowchart 500 of a method of determining a bid decision by DSP server 102.
- site and unidentified user information is received, i.e. via transmission 210 from the ad exchange server 122.
- This information is used at step 504 to compute a feature vector, which is input to the machine learning model executed at step 506.
- this model execution is based on the representation saved at step 416 of the process 400.
- the output of the model is an estimate of the classification of the user based on the calculated feature vector, which may be, in the case of the XGBoost algorithm for example, a generated numerical value representing a level of confidence in the estimate of whether the unidentified user is a member or non-member of the category for which the model was trained.
- the value may be an estimate of a probability that the unidentified online user is a member of the category.
- a bid decision is made based upon the estimate.
- the decision may include determining whether or not to bid at all, and/or a
- a threshold may be applied to the generated value, such that if the value is below the threshold then no bid is made.
- a bid amount may be determined based upon the magnitude of the generated value, such that a higher price is bid if the model indicates a higher confidence in the classification of the unidentified user.
- the bid information is transmitted 214 back to the ad exchange server 122 at step 512.
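The two decision rules described above (a threshold below which no bid is made, and a bid amount that increases with the generated value) can be combined in a minimal sketch. The linear scaling, the default threshold and the bid limits are assumed values chosen for illustration, and the function name is hypothetical.

```python
from typing import Optional


def bid_decision(confidence: float, threshold: float = 0.5,
                 base_bid: float = 1.0, max_bid: float = 5.0) -> Optional[float]:
    """Return a bid amount, or None when no bid should be made.

    confidence is the model output in [0, 1]. The linear mapping from
    confidence to price is an assumption, not taken from the source.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < threshold:
        return None  # below the threshold: do not bid at all
    if threshold >= 1.0:
        return max_bid
    # Scale linearly from base_bid at the threshold to max_bid at 1.0.
    scale = (confidence - threshold) / (1.0 - threshold)
    return base_bid + scale * (max_bid - base_bid)
```

With these defaults a confidence of 0.2 yields no bid, while confidences of 0.5, 0.75 and 1.0 yield bids of 1.0, 3.0 and 5.0 respectively.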
- the use of a machine learning model as described above has a number of advantages, and addresses particular problems present in prior art approaches. Firstly, it overcomes limitations with linking of offline and online data. In particular, the machine learning model itself, which is trained on offline data and subsequently executed on online data, effectively becomes the means of linkage. Secondly, it is not relevant that the unidentified online users may not correspond with any of the customers/clients having records in the offline database. Accordingly, 100% of online users can be classified by the model, so long as the minimum information required to compute the feature vectors is available. Thirdly, the method and system avoid privacy constraints, because the stored representation of the trained machine learning model comprises data structures that contain no individually-identifiable personal data of any
- while the training process may be highly computationally intensive, requiring high-performance computing resources and extended time periods, the execution of the resulting trained model on a single feature vector can be extremely fast, easily satisfying the requirement to compute a decision in 30 milliseconds or less.
- a commercial database was employed for offline training, containing cleaned records in which individually-distinguishable travellers were reconciled with a total of 1,328,694 trips.
- a 'frequent traveller' was defined as a person who took five or more trips in any 12 month period. Using this definition, 5.3% of trips in the data set were automatically tagged as 'frequent traveller' trips.
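The automatic tagging rule can be sketched with a sliding window over each traveller's sorted trip dates. The sketch makes several assumptions that the source does not state: a 12 month period is approximated as 365 days, every trip of a traveller who meets the definition is tagged, and trips are represented as hypothetical `(traveller_id, departure_date)` pairs.

```python
from collections import defaultdict
from datetime import date, timedelta


def is_frequent(trip_dates, min_trips=5, window=timedelta(days=365)):
    """True if any min_trips consecutive trips fall within one window."""
    dates = sorted(trip_dates)
    for i in range(len(dates) - min_trips + 1):
        # The i-th trip and the (i + min_trips - 1)-th trip bound a group
        # of min_trips trips; check that the group spans less than the window.
        if dates[i + min_trips - 1] - dates[i] < window:
            return True
    return False


def tag_trips(trips):
    """Return one 0/1 'frequent traveller' tag per (traveller_id, date) trip."""
    by_traveller = defaultdict(list)
    for traveller_id, departure in trips:
        by_traveller[traveller_id].append(departure)
    frequent = {t for t, dates in by_traveller.items() if is_frequent(dates)}
    return [int(traveller_id in frequent) for traveller_id, _ in trips]
```

The traveller identifier is used only to group trips during tagging; it does not need to appear in the feature vectors on which the model is trained.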
- An XGBoost model with 489 trees was trained using feature vectors comprising the above feature set, computed using the tagged trips.
- a resulting ROC curve 600 is shown in Figure 6, wherein the horizontal axis 602 represents false positive rate (FPR), the vertical axis 604 represents true positive rate (TPR), and the ROC 606 is generated by sweeping the threshold between 0.0 and 1.0 at which the model output is determined to indicate a 'frequent traveller'.
- the AUROC 608 is 0.8, which compares with the value of 0.5 that would be obtained by making purely random decisions, and is regarded as good performance of the machine learning classifier.
- the TPR correctly-classified frequent travellers
- FPR wrongly-classified non-frequent travellers
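The construction of the ROC curve by sweeping the decision threshold between 0.0 and 1.0, and the computation of the area under it, can be sketched as follows. The number of threshold steps and the function names are assumptions, and the trapezoidal rule is one common way of estimating the area rather than a method confirmed by the source.

```python
def roc_points(scores, labels, steps=100):
    """Sweep the threshold from 0.0 to 1.0 and collect (FPR, TPR) points."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = {(0.0, 0.0)}  # the curve always starts at the origin
    for k in range(steps + 1):
        threshold = k / steps
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.add((fp / negatives, tp / positives))
    return sorted(points)


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A classifier that assigns the same score to every input produces only the points (0, 0) and (1, 1), giving an area of 0.5, which is the baseline for purely random decisions mentioned above.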
- the model was configured to produce an output estimate representing a level of confidence of whether a trip corresponding with an input feature vector is associated with a 'luxury traveller', on a scale of 0.0 to 1.0.
- the AUROC for this binary classifier was found to be 0.83.
- embodiments of the present invention provide systems and methods employing machine learning models to classify unidentified online users, using limited information, into different traveller categories using training data derived from offline databases containing records relating to individually-distinguishable travellers.
- the machine learning models effectively provide a 'smart' linkage between rich offline data and limited online data.
- the online users need not have been previously encountered by the system, and classification can be performed for any user so long as the minimum information required to compute the model feature vectors is available.
- the system protects privacy, in that no individually-identifiable personal data of any customer/client in the offline database is reflected in the deployed machine learning models.
- classification on newly-observed online users can be extremely fast, e.g. 30 milliseconds or less.
- models with good predictive power can be developed. Predictions generated by the models can therefore be used with confidence for high-speed, real-time, online decision-making, such as in bidding for impressions within a digital advertising marketplace facilitated by an ad exchange server.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/704,428 US11120480B2 (en) | 2017-09-14 | 2017-09-14 | Systems and methods for real-time online traveler segmentation using machine learning |
FR1758517A FR3071087A1 (en) | 2017-09-14 | 2017-09-14 | A METHOD AND SYSTEM FOR REAL-TIME ONLINE TRAVELER SEGMENTATION USING AUTOMATIC APPRENTICESHIP |
PCT/EP2018/073838 WO2019052868A1 (en) | 2017-09-14 | 2018-09-05 | A method and system for real-time online traveller segmentation using machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3682401A1 true EP3682401A1 (en) | 2020-07-22 |
Family
ID=63490479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP18765117.9A Pending EP3682401A1 (en) | 2017-09-14 | 2018-09-05 | A method and system for real-time online traveller segmentation using machine learning |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3682401A1 (en) |
CN (1) | CN111095331B (en) |
WO (1) | WO2019052868A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110198309A (en) * | 2019-05-14 | 2019-09-03 | 北京墨云科技有限公司 | A kind of Web server recognition methods, device, terminal and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7698422B2 (en) * | 2007-09-10 | 2010-04-13 | Specific Media, Inc. | System and method of determining user demographic profiles of anonymous users |
US8515937B1 (en) * | 2008-06-30 | 2013-08-20 | Alexa Internet | Automated identification and assessment of keywords capable of driving traffic to particular sites |
US20100312586A1 (en) * | 2009-06-03 | 2010-12-09 | Drefs Martin J | Generation of Travel-Related Offerings |
US8626697B1 (en) * | 2010-03-01 | 2014-01-07 | magnify360, Inc. | Website user profiling using anonymously collected data |
US8943079B2 (en) * | 2012-02-01 | 2015-01-27 | Telefonaktiebolaget L M Ericsson (Publ) | Apparatus and methods for anonymizing a data set |
JP6440732B2 (en) * | 2013-11-27 | 2018-12-19 | 株式会社Nttドコモ | Automatic task classification based on machine learning |
-
2018
- 2018-09-05 EP EP18765117.9A patent/EP3682401A1/en active Pending
- 2018-09-05 CN CN201880056990.7A patent/CN111095331B/en active Active
- 2018-09-05 WO PCT/EP2018/073838 patent/WO2019052868A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2019052868A1 (en) | 2019-03-21 |
CN111095331B (en) | 2023-09-22 |
CN111095331A (en) | 2020-05-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11120480B2 (en) | Systems and methods for real-time online traveler segmentation using machine learning | |
US10943184B2 (en) | Machine learning methods and systems for predicting online user interactions | |
US20190213670A1 (en) | Method and system for electronic advertising | |
CA2751646C (en) | Determining conversion probability using session metrics | |
US20080004990A1 (en) | Virtual spot market for advertisements | |
US20080004948A1 (en) | Auctioning for video and audio advertising | |
US20140236738A1 (en) | Method and system for placement and pricing of internet-based advertisements or services | |
US20190080363A1 (en) | Methods and systems for intelligent adaptive bidding in an automated online exchange network | |
US20120059707A1 (en) | Methods and apparatus to cluster user data | |
US20110264519A1 (en) | Social behavioral targeting of advertisements in a social networking environment | |
CN111095330B (en) | Machine learning method and system for predicting online user interactions | |
US20150278877A1 (en) | User Engagement-Based Contextually-Dependent Automated Reserve Price for Non-Guaranteed Delivery Advertising Auction | |
KR20080043777A (en) | Automatically generating content for presenting in a preview pane for ads | |
WO2012088596A1 (en) | System and method for real-time search re-targeting | |
WO2019052870A1 (en) | A method and system for intelligent adaptive bidding in an automated online exchange network | |
US20160275569A1 (en) | Method and system for advertisement coordination | |
JP6320258B2 (en) | Extraction apparatus, extraction method, and extraction program | |
JP2018088282A (en) | Extracting apparatus, extracting method, and extracting program | |
US20160267551A1 (en) | Method and system for advertisement coordination | |
CN111095331B (en) | Method and system for real-time online traveler subdivision using machine learning | |
US11741505B2 (en) | System and method for predicting an anticipated transaction | |
US20160267531A1 (en) | Method and system for advertisement coordination | |
US20160275568A1 (en) | Method and system for advertisement coordination | |
FR3071087A1 (en) | A METHOD AND SYSTEM FOR REAL-TIME ONLINE TRAVELER SEGMENTATION USING AUTOMATIC APPRENTICESHIP | |
AU2021107385A4 (en) | System, Program, and Method for Presenting and Broadcasting Content |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20200114 |
| | AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | AX | Request for extension of the european patent | Extension state: BA ME |
| | RIN1 | Information on inventor provided before grant (corrected) | Inventor name: RENAUDIE, DAVID Inventor name: LHERITIER, ALIX Inventor name: MOTTINI D'OLIVEIRA, ALEJANDRO RICARDO Inventor name: ACUNA AGOST, RODRIGO |
| | DAV | Request for validation of the european patent (deleted) | |
| | DAX | Request for extension of the european patent (deleted) | |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
| | 17Q | First examination report despatched | Effective date: 20210920 |