CN111104614A

CN111104614A - Method for generating recall information for tourist destination recommendation system

Info

Publication number: CN111104614A
Application number: CN201911266312.6A
Authority: CN
Inventors: 李明; 江文斌; 李健
Original assignee: Shanghai Zhilv Information Technology Co Ltd
Current assignee: Shanghai Zhilv Information Technology Co Ltd
Priority date: 2019-12-11
Filing date: 2019-12-11
Publication date: 2020-05-05

Abstract

The invention discloses a method for generating recall information for a tourist destination recommendation system, which comprises the following steps: acquiring user history information and destination information, constructing user characteristics according to the user history information, and constructing destination characteristics according to the destination information; constructing training samples according to the user characteristics and the destination characteristics; training an XGBoost model according to the training samples to obtain a ranking model; and generating recall information according to the ranking model. The invention improves the accuracy of the generated recall information.

Description

Method for generating recall information for tourist destination recommendation system

Technical Field

The invention belongs to the technical field of travel information recommendation, and particularly relates to a method for generating recall information for a tourist destination recommendation system.

Background

With the development of information technology and the internet, people have gradually moved from an era of information scarcity to an era of information overload. It is difficult for information consumers to find what interests them among a large amount of information (items), and it is equally difficult for information producers to make the information they produce stand out and attract attention. A recommendation system builds a bridge between information producers and consumers: it helps users cope with information overload and helps merchants find their target users. A recommendation system needs to mine user behavior and discover users' personalized needs, so that goods are accurately recommended to the users who need them.

Users who need to plan a trip spend a great deal of time and effort evaluating the destination information they collect. A tourist destination recommendation system learns a user's preferences by analyzing user attributes and behavior, and finds destinations that match the user's interests. In practice, tourist destination recommendation is more complex than recommendation in other fields. Because users rate only a small number of tourist destinations, the data are sparser. Moreover, tourist destination recommendation is easily affected by factors such as season, transportation, and cost, so it is difficult to build a recommendation model that integrates the many valuable factors.

A basic recommendation framework can be divided into two main parts: recall and ranking. In the recall stage, a small candidate set that the user may be interested in is retrieved from the full item library, which amounts to a coarse ranking. In the ranking stage, the candidates obtained in the recall stage are ranked precisely and recommended to the user.

In the prior art, the recall accuracy is low.

Disclosure of Invention

The invention aims to overcome the defect of low recall accuracy in the travel information recommendation process in the prior art, and provides a recall information generation method for a tourist destination recommendation system.

The invention solves the technical problems through the following technical scheme:

the invention provides a method for generating recall information for a tourist destination recommendation system, which comprises the following steps:

acquiring user history information and destination information, constructing user characteristics according to the user history information, and constructing destination characteristics according to the destination information;

constructing a training sample according to the user characteristic and the destination characteristic;

training an XGBoost model according to the training samples to obtain a ranking model;

generating recall information according to the ranking model.

Preferably, the user characteristics include user base characteristics including at least one of age, gender, and price sensitivity.

Preferably, the user characteristics further include user behavior characteristics including at least one of the average order amount in the past year and the number of days on which travel products were browsed in the past year.

Preferably, the destination characteristics comprise destination label characteristics comprising at least one of forest, ski, ornamental, building, history.

Preferably, the destination characteristics further include destination base characteristics including at least one of whether the destination is overseas and the recommended number of days to visit.

Preferably, the step of constructing training samples according to the user characteristics and the destination characteristics further comprises:

and carrying out data cleaning on the user characteristics and the destination characteristics, and constructing a training sample according to the cleaned user characteristics and the destination characteristics.

Preferably, the data cleaning includes at least one of outlier detection, constant variable culling, feature discretization, and categorical variable processing.

Preferably, the loss function of the XGBoost model is defined as:

$$L = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^{2}$$

wherein $\sum_{i} l(y_i, \hat{y}_i)$ is a conventional loss function, $f_k$ is the k-th tree, T represents the number of leaf nodes, γ is a first parameter, $\Omega(f)$ is a regularization term, ω represents the scores of the leaf nodes, and λ is a second parameter.

Preferably, the XGBoost model is based on a PairWise ranking algorithm.

The beneficial effect of the invention is as follows: the invention improves the accuracy of the generated recall information.

Drawings

FIG. 1 is a flow chart of a method for generating recall information for a travel destination recommendation system in accordance with a preferred embodiment of the present invention.

Detailed Description

The present invention is further illustrated by the following preferred embodiments, but is not intended to be limited thereby.

The present embodiment provides a generation method of recall information for a travel destination recommendation system. Referring to fig. 1, the method for generating recall information for a travel destination recommendation system includes the steps of:

Step S101, acquiring user history information and destination information, constructing user characteristics according to the user history information, and constructing destination characteristics according to the destination information.

Step S102, constructing training samples according to the user characteristics and the destination characteristics.

Step S103, training an XGBoost model according to the training samples to obtain a ranking model.

Step S104, generating recall information according to the ranking model.

In a specific implementation, in step S101, user history information and destination information are acquired, user characteristics are constructed according to the user history information, and destination characteristics are constructed according to the destination information. The user characteristics include user base characteristics including at least one of age, gender, and price sensitivity. The user characteristics further include user behavior characteristics including at least one of the average order amount in the past year and the number of days on which travel products were browsed in the past year. The destination characteristics include destination label characteristics including at least one of forest, ski, ornamental, building, and history. The destination characteristics further include destination base characteristics including at least one of whether the destination is overseas and the recommended number of days to visit.

In step S102, data cleaning is performed on the user characteristics and the destination characteristics, and training samples are constructed from the cleaned characteristics. The data cleaning comprises at least one of outlier detection, constant variable elimination, feature discretization, and categorical variable processing.

Outlier detection: sample points that deviate from the overall distribution of the samples are deleted.

Constant variable elimination: features whose standard deviation is close to zero are nearly constant; they are regarded as invalid features during model training and need to be removed.

Feature discretization: continuous data are segmented into discrete intervals. The segmentation may use equal-width division, equal-frequency division, clustering-based division, or manual division. For example, the feature "age" is divided into [0-17], [18-23], [24-35], [36-49], and [50-100] based on prior knowledge of age groups and of the age distribution of the APP's users.

Categorical variable processing: this mainly comprises ordinal (serial-number) encoding and one-hot encoding. For example, after discretization the feature "age" becomes a categorical variable whose categories have a size relationship; after ordinal encoding, the five categories take the values 0, 1, 2, 3, and 4 respectively. For categorical variables without a size relationship, one-hot encoding is used. For example, the feature "gender" takes the values "male" and "female"; after one-hot encoding, "male" corresponds to (1,0) and "female" corresponds to (0,1).
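The four cleaning operations can be sketched in a few lines of Python. This is a minimal illustration, not the embodiment's implementation: the z-score rule and its threshold, the tolerance `eps`, the function names, and the sample field names (`avg_order_amount`, `is_member`) are all assumptions. Only the age intervals and the (1,0)/(0,1) gender encoding come from the text.

```python
from bisect import bisect_left
from statistics import mean, pstdev

# Upper bounds of the first four age intervals [0-17], [18-23], [24-35], [36-49];
# any larger age falls into the fifth interval [50-100].
AGE_UPPER_BOUNDS = [17, 23, 35, 49]


def drop_outliers(rows, key, z_max=3.0):
    """Delete rows whose `key` value lies more than z_max standard deviations
    from the mean (the z-score rule is an assumption, not from the text)."""
    values = [row[key] for row in rows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(rows)
    return [row for row in rows if abs(row[key] - mu) / sigma <= z_max]


def constant_features(rows, keys, eps=1e-8):
    """Return the features in `keys` whose standard deviation is close to zero."""
    return [k for k in keys if pstdev([row[k] for row in rows]) <= eps]


def age_to_code(age):
    """Discretize an age and return its ordinal code 0..4."""
    return bisect_left(AGE_UPPER_BOUNDS, age)


def one_hot(value, categories):
    """One-hot encode a categorical value that has no size relationship."""
    return tuple(1 if value == c else 0 for c in categories)


# Hypothetical data: nine ordinary rows and one extreme order amount.
rows = [{"age": 30, "is_member": 1, "avg_order_amount": 100.0} for _ in range(9)]
rows.append({"age": 30, "is_member": 1, "avg_order_amount": 10000.0})
cleaned = drop_outliers(rows, "avg_order_amount", z_max=2.5)
to_remove = constant_features(cleaned, ["age", "is_member"])
```

Ordinal codes keep the order of the age intervals, which a tree model can exploit with a single threshold split, whereas one-hot encoding avoids implying an order between "male" and "female".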

Some of the discrete features are then selected as grouping features. The main grouping features for users include age, price sensitivity, and the like. The main grouping features for destinations include country, theme label, and the like. Users are then aggregated by grouping-feature value: users whose grouping features are all equal are placed in the same group. For each of the other features, the mean over the users in a group is taken as the group's feature. Each user group has a unique group identifier, user_class_id. Destinations are grouped in the same way, and each destination group has a unique group identifier, poi_class_id.
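The grouping step can be sketched as follows. The function name `build_groups`, the field names, and the choice to number groups in sorted key order are assumptions made for illustration; the text only requires that items with identical grouping features share a group and that the remaining features are averaged within it.

```python
from collections import defaultdict


def build_groups(items, grouping_keys, mean_keys):
    """Group items whose grouping features are all equal.

    Returns {class_id: group_features}, where the grouping features are copied
    from the group key and every feature in `mean_keys` is the group mean.
    """
    buckets = defaultdict(list)
    for item in items:
        buckets[tuple(item[k] for k in grouping_keys)].append(item)

    groups = {}
    # Assumption: class ids are assigned in sorted order of the group key.
    for class_id, key in enumerate(sorted(buckets)):
        members = buckets[key]
        features = dict(zip(grouping_keys, key))
        for k in mean_keys:
            features[k] = sum(m[k] for m in members) / len(members)
        groups[class_id] = features
    return groups


# Hypothetical users: age_code and price_sensitivity are the grouping features.
users = [
    {"age_code": 1, "price_sensitivity": 0, "avg_order_amount": 1000.0},
    {"age_code": 1, "price_sensitivity": 0, "avg_order_amount": 3000.0},
    {"age_code": 2, "price_sensitivity": 1, "avg_order_amount": 500.0},
]
user_groups = build_groups(users, ["age_code", "price_sensitivity"], ["avg_order_amount"])
```

The same function could be applied to destinations, with country and theme label as grouping keys, to obtain the poi_class_id groups.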

A training sample is constructed mainly by labeling the strength of a user group's willingness to travel to a destination group, which serves as the label (willingness value) of the sample. The training samples are drawn from one day of data for users who have a search record on the travel APP. The labeling strategy is as follows: the number of click-searches and the number of orders placed by the user group for the destination group are counted and added with weights of 3:7 to obtain a willingness score. The willingness score is then discretized into 5 levels, which serve as the travel willingness value of the user group for the destination group. The structure of one sample is shown in the following table:

user_class_id | poi_class_id | features | label
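The labeling strategy can be sketched as below. The 3:7 weighting and the five levels come from the text; the equal-width binning over the observed score range, the function names, and the example counts are assumptions, since the text does not state how the score is discretized.

```python
def willingness_score(n_search, n_order):
    """Combine search and order counts with weights 3:7."""
    return (3 * n_search + 7 * n_order) / 10


def to_levels(scores, n_levels=5):
    """Discretize scores into levels 0..n_levels-1 using equal-width bins
    over the observed range (the binning rule is an assumption)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0] * len(scores)
    width = (hi - lo) / n_levels
    return [min(int((s - lo) / width), n_levels - 1) for s in scores]


def build_samples(counts, features):
    """counts: {(user_class_id, poi_class_id): (n_search, n_order)}
    features: {(user_class_id, poi_class_id): feature list}"""
    keys = list(counts)
    labels = to_levels([willingness_score(*counts[k]) for k in keys])
    return [
        {"user_class_id": u, "poi_class_id": p, "features": features[(u, p)], "label": y}
        for (u, p), y in zip(keys, labels)
    ]


# Hypothetical counts for three (user group, destination group) pairs.
counts = {(0, 0): (0, 0), (0, 1): (10, 0), (1, 0): (10, 10)}
features = {key: [] for key in counts}  # feature vectors omitted in this sketch
samples = build_samples(counts, features)
```

Because orders carry more than twice the weight of searches, a pair with a few orders outranks a pair with many searches and no orders.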

The test samples are taken from one day of data. The proportion of recalled destinations that hit the destinations actually ordered is used as the evaluation metric.
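One plausible reading of this metric is the fraction of orders whose destination appears in the recall list produced for the same user. The sketch below follows that reading; the function name and the data layout are assumptions.

```python
def hit_rate(recalled, ordered):
    """recalled: {user_id: list of recalled destinations}
    ordered:  {user_id: destination actually ordered}
    Returns the fraction of orders whose destination was recalled."""
    if not ordered:
        return 0.0
    hits = sum(1 for user, dest in ordered.items() if dest in recalled.get(user, ()))
    return hits / len(ordered)


# Hypothetical one-day test data.
recalled = {"u1": ["A", "B"], "u2": ["C"], "u3": ["A"]}
ordered = {"u1": "B", "u2": "D", "u3": "A", "u4": "A"}
rate = hit_rate(recalled, ordered)
```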

The ranking algorithm used by the recall method of the tourist destination recommendation system is a PairWise-based XGBoost model. XGBoost is a boosting algorithm whose idea is to combine many weak classifiers into a strong classifier. XGBoost is a boosted tree model: trees are added one after another, each grown by repeated feature splitting, and each newly added tree in effect learns a new function that fits the residual of the previous round's prediction. After training, K trees are obtained. In each tree a sample falls into one leaf node, each leaf node has a score, and the final predicted value of the sample is the sum of the scores of the leaf nodes it falls into across all trees. The tree model used is the CART (Classification and Regression Tree) regression tree, a binary tree that partitions the sample space by repeatedly splitting on features.
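The prediction rule described above, in which a sample's value is the sum of the leaf scores it reaches in each of the K trees, can be illustrated with a toy tree representation. The nested-dictionary layout and the feature names are assumptions made for the sketch and are not how the XGBoost library stores trees.

```python
def leaf_score(tree, x):
    """Walk a binary tree until a leaf is reached and return its score."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def predict(trees, x):
    """Final prediction: the sum of the leaf scores over all trees."""
    return sum(leaf_score(tree, x) for tree in trees)


# Two hypothetical trees (K = 2) over hypothetical features.
trees = [
    {"feature": "age_code", "threshold": 2,
     "left": {"leaf": 0.5}, "right": {"leaf": -0.5}},
    {"feature": "is_overseas", "threshold": 1,
     "left": {"leaf": 0.25}, "right": {"leaf": 1.0}},
]
score = predict(trees, {"age_code": 1, "is_overseas": 0})
```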

The XGBoost loss function is defined as:

$$L = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^{2}$$

wherein $\sum_{i} l(y_i, \hat{y}_i)$ is a conventional loss function and $f_k$ is the k-th CART tree; T represents the number of leaf nodes, and γ is a first parameter used to control the number of leaf nodes of the CART trees;

$\Omega(f)$ is the regularization term, which limits the complexity of the model and reduces the risk of overfitting, wherein ω represents the scores of the leaf nodes and λ is a second parameter used to control the leaf-node scores of the CART trees.

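XGBoost adopts an additive model, so the loss function at the t-th iteration, with conventional loss l and regularization term Ω, can be written as:

$$L^{(t)} = \sum_{i} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

In XGBoost, the gradients are obtained by a second-order Taylor expansion of the loss function:

$$L^{(t)} \approx \sum_{i} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)$$

where $g_i$ and $h_i$ are the first and second derivatives of $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ with respect to $\hat{y}_i^{(t-1)}$.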
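To make the roles of γ and λ concrete, the sketch below computes the first- and second-order gradient statistics and the resulting leaf score and split gain. The closed forms used here (leaf score −G/(H+λ), gain with a −γ penalty) are the standard XGBoost results and are not stated in this document. The squared-error loss is chosen only to keep the arithmetic simple; it is not the PairWise loss of the embodiment, and all function names are assumptions.

```python
def grad_hess_squared_error(y, y_hat):
    """For l = 0.5 * (y_hat - y)^2: g_i = y_hat_i - y_i and h_i = 1."""
    g = [p - t for t, p in zip(y, y_hat)]
    h = [1.0] * len(y)
    return g, h


def leaf_weight(g, h, lam):
    """Score of a leaf that minimizes the second-order objective: -G / (H + lambda)."""
    return -sum(g) / (sum(h) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Reduction of the objective obtained by splitting one leaf into two.
    gamma is the price paid for the extra leaf (T grows by 1)."""
    def quality(g, h):
        return sum(g) ** 2 / (sum(h) + lam)
    parent = quality(g_left + g_right, h_left + h_right)
    return 0.5 * (quality(g_left, h_left) + quality(g_right, h_right) - parent) - gamma


# Hypothetical targets, with every current prediction equal to 0.
g, h = grad_hess_squared_error([1.0, 1.0, 3.0, 3.0], [0.0, 0.0, 0.0, 0.0])
w_single_leaf = leaf_weight(g, h, lam=0.0)
gain = split_gain(g[:2], h[:2], g[2:], h[2:], lam=0.0, gamma=0.5)
```

A larger λ shrinks every leaf score toward zero, and a larger γ makes more candidate splits unprofitable, which results in trees with fewer leaf nodes.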

Ranking algorithms can be broadly classified into three categories: PointWise, PairWise, and ListWise. PointWise considers only the absolute relevance of a single query result to a given query, without considering the relevance of the other query results to that query. PairWise considers the relative relevance between any two query results of differing relevance. ListWise considers the ordering of the entire query result set for a given query and directly optimizes the result ordering output by the model so that it is as close as possible to the true ordering. After weighing model accuracy against complexity, the method for generating recall information for the tourist destination recommendation system selects the PairWise-based XGBoost model to generate the ranking model.
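The PairWise idea can be illustrated by building preference pairs from the labeled samples: within one user group (playing the role of the query), every two destination groups with different willingness labels yield one ordered pair. The pair construction and the logistic pair loss below are a generic sketch with assumed names, not the embodiment's code. In the open-source XGBoost library, a PairWise ranking objective of this kind is selected with `objective="rank:pairwise"`.

```python
import math
from itertools import combinations


def make_pairs(samples):
    """samples: list of (user_class_id, poi_class_id, label).
    Returns (user_class_id, preferred_poi, other_poi) for every two
    destination groups of the same user group with different labels."""
    by_user = {}
    for user, poi, label in samples:
        by_user.setdefault(user, []).append((poi, label))

    pairs = []
    for user, items in by_user.items():
        for (p1, y1), (p2, y2) in combinations(items, 2):
            if y1 > y2:
                pairs.append((user, p1, p2))
            elif y2 > y1:
                pairs.append((user, p2, p1))
    return pairs


def pairwise_logistic_loss(score_preferred, score_other):
    """Small when the preferred item is scored well above the other one."""
    return math.log1p(math.exp(-(score_preferred - score_other)))


# Hypothetical labeled samples: user group 0 has three destination groups.
samples = [(0, "A", 4), (0, "B", 1), (0, "C", 1), (1, "A", 2)]
pairs = make_pairs(samples)
```

Pairs with equal labels are skipped because they carry no ordering information, and a user group with a single labeled destination group contributes no pairs.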

After the ranking model is generated, recall information is generated using the ranking model.

With the method of this embodiment, for each user entering the tourist destination recommendation system, the user's characteristics are obtained from the user's history information, the user group to which the user belongs is determined from those characteristics, the corresponding sequence of destination groups is obtained through the ranking model, and the destination groups at the top of the sequence are taken as the recall result. The recall results are then ranked to obtain the final destination recommendation result. The method for generating recall information for the tourist destination recommendation system improves the accuracy of the generated recall information.
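The serving flow described above can be sketched as follows. The function name, the `top_n` cut-off, and the lookup table standing in for the trained ranking model are assumptions made for illustration.

```python
def recall_for_user_group(user_class_id, poi_class_ids, score_fn, top_n=2):
    """Rank all destination groups for one user group and keep the top ones.

    score_fn(user_class_id, poi_class_id) stands in for the ranking model.
    """
    ranked = sorted(
        poi_class_ids,
        key=lambda poi: score_fn(user_class_id, poi),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical model scores for user group 0 and three destination groups.
scores = {(0, 10): 0.2, (0, 11): 1.5, (0, 12): 0.9}
recalled = recall_for_user_group(0, [10, 11, 12], lambda u, p: scores[(u, p)])
```

Because scoring is done per user group rather than per user, the ranked sequences could be precomputed for every group and looked up when a user arrives.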

While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims

1. A method for generating recall information for a travel destination recommendation system, comprising the steps of:

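acquiring user history information and destination information, constructing user characteristics according to the user history information, and constructing destination characteristics according to the destination information;

constructing training samples according to the user characteristics and the destination characteristics;

training an XGBoost model according to the training samples to obtain a ranking model;

and generating recall information according to the ranking model.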

2. The method of generating recall information for a travel destination recommendation system of claim 1 wherein the user characteristics comprise user base characteristics comprising at least one of age, gender, and price sensitivity.

3. The method of generating recall information for a travel destination recommendation system of claim 2 wherein the user characteristics further comprise user behavior characteristics comprising at least one of the average order amount in the past year and the number of days on which travel products were browsed in the past year.

4. The method of generating recall information for a travel destination recommendation system of claim 1 wherein the destination characteristics comprise destination label characteristics comprising at least one of forest, ski, ornamental, architectural, historical.

5. The method of generating recall information for a travel destination recommendation system of claim 4 wherein the destination characteristics further comprise destination base characteristics including at least one of whether the destination is overseas and the recommended number of days to visit.

6. The method of generating recall information for a travel destination recommendation system of claim 1 wherein the step of constructing a training sample based on the user characteristics and the destination characteristics further comprises:

and performing data cleaning on the user characteristics and the destination characteristics, and constructing a training sample according to the cleaned user characteristics and the destination characteristics.

7. The method of generating recall information for a travel destination recommendation system of claim 6 wherein the data cleansing comprises at least one of outlier detection, constant variable culling, feature discretization, and categorical variable processing.

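8. The method of generating recall information for a travel destination recommendation system of claim 1 wherein the loss function of the XGBoost model is defined as:

$$L = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^{2}$$

wherein $\sum_{i} l(y_i, \hat{y}_i)$ is a conventional loss function, $f_k$ is the k-th tree, T represents the number of leaf nodes, γ is a first parameter, $\Omega(f)$ is a regularization term, ω represents the scores of the leaf nodes, and λ is a second parameter.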

9. The method of generating recall information for a travel destination recommendation system of claim 8 wherein the XGBoost model is based on a PairWise ranking algorithm.