CN111104614A - Method for generating recall information for tourist destination recommendation system - Google Patents
- Publication number
- CN111104614A
- Authority
- CN
- China
- Prior art keywords
- destination
- user
- information
- recommendation system
- recall information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for generating recall information for a tourist destination recommendation system, comprising the following steps: acquiring user history information and destination information, constructing user features from the user history information, and constructing destination features from the destination information; constructing training samples from the user features and the destination features; training an XGBoost model on the training samples to obtain a ranking model; and generating recall information according to the ranking model. The invention improves the accuracy of the generated recall information.
Description
Technical Field
The invention belongs to the technical field of travel information recommendation, and particularly relates to a method for generating recall information for a tourist destination recommendation system.
Background
With the development of information technology and the internet, people have gradually moved from an era of information scarcity to an era of information overload. Consumers struggle to find the information (items) that interests them in a huge volume of information, while information producers struggle to make the information they produce stand out and get noticed. A recommendation system builds a bridge between information producers and consumers: it helps users cope with information overload and helps merchants find their target users. A recommendation system needs to mine user behavior and discover users' personalized needs so that items are accurately recommended to the users who need them.
Users planning a trip spend a great deal of time and effort evaluating the destination information they collect. A tourist destination recommendation system learns a user's preferences by analyzing user attributes and behavior, and finds destinations that match the user's points of interest. In practice, tourist destination recommendation is more complex than recommendation in other fields. Because users rate few tourist destinations, the data are sparser. Moreover, tourist destination recommendation is easily affected by factors such as season, transportation, and cost, so it is difficult to build a tourist destination recommendation model that integrates the many relevant factors.
A basic recommendation framework can be divided into a recall stage and a ranking stage. In the recall stage, a small candidate set of items the user may be interested in is retrieved from the full item library, which amounts to coarse ranking. In the ranking stage, the candidates from the recall stage are ranked precisely and recommended to the user.
In the prior art, the recall accuracy is low.
Disclosure of Invention
The technical problem to be solved by the invention is the low recall accuracy of travel information recommendation in the prior art; to overcome it, the invention provides a method for generating recall information for a tourist destination recommendation system.
The invention solves this technical problem through the following technical solution:
The invention provides a method for generating recall information for a tourist destination recommendation system, comprising the following steps:
acquiring user history information and destination information, constructing user features from the user history information, and constructing destination features from the destination information;
constructing training samples from the user features and the destination features;
training an XGBoost model on the training samples to obtain a ranking model;
generating recall information according to the ranking model.
Preferably, the user features include basic user features, which include at least one of age, gender, and price sensitivity.
Preferably, the user features further include user behavior features, which include at least one of the average order amount over the past year and the number of days on which the user browsed travel products over the past year.
Preferably, the destination features include destination tag features, which include at least one of forest, skiing, ornamental, architecture, and history.
Preferably, the destination features further include basic destination features, which include at least one of whether the destination is overseas and the recommended number of days to visit.
Preferably, the step of constructing training samples from the user features and the destination features further comprises:
performing data cleaning on the user features and the destination features, and constructing the training samples from the cleaned user features and destination features.
Preferably, the data cleaning includes at least one of outlier detection, constant variable removal, feature discretization, and categorical variable processing.
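Preferably, the loss function of the XGBoost model is defined as:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
where $l(y_i, \hat{y}_i)$ is a conventional loss function, T represents the number of leaf nodes, γ is a first parameter, $\Omega(f)$ is the regularization term, ω represents the leaf-node scores, and λ is a second parameter.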
Preferably, the XGBoost model is based on a pairwise ranking algorithm.
The positive effect of the invention is that it improves the accuracy of the generated recall information.
Drawings
FIG. 1 is a flow chart of a method for generating recall information for a travel destination recommendation system in accordance with a preferred embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following preferred embodiments, but is not intended to be limited thereby.
The present embodiment provides a method for generating recall information for a tourist destination recommendation system. Referring to FIG. 1, the method includes the following steps:
Step S101: acquire user history information and destination information, construct user features from the user history information, and construct destination features from the destination information.
Step S102: construct training samples from the user features and the destination features.
Step S103: train an XGBoost model on the training samples to obtain a ranking model.
Step S104: generate recall information according to the ranking model.
In a specific implementation, step S101 acquires user history information and destination information, constructs user features from the user history information, and constructs destination features from the destination information. The user features include basic user features, which include at least one of age, gender, and price sensitivity. The user features further include user behavior features, which include at least one of the average order amount over the past year and the number of days on which the user browsed travel products over the past year. The destination features include destination tag features, which include at least one of forest, skiing, ornamental, architecture, and history. The destination features further include basic destination features, which include at least one of whether the destination is overseas and the recommended number of days to visit.
In step S102, data cleaning is performed on the user features and the destination features, and training samples are constructed from the cleaned features. The data cleaning includes at least one of outlier detection, constant variable removal, feature discretization, and categorical variable processing. Outlier detection: sample points that deviate from the overall distribution of the samples are deleted. Constant variable removal: features whose standard deviation is close to zero are nearly constant, are treated as invalid features during model training, and need to be removed. Feature discretization: continuous data are segmented into discrete intervals. The segmentation can be equal-width, equal-frequency, clustering-based, or manual. For example, based on prior knowledge about age and the age distribution of the app's users, the feature "age" is divided into [0-17], [18-23], [24-35], [36-49], and [50-100]. Categorical variable processing: this mainly comprises ordinal encoding and one-hot encoding. For example, after discretization the feature "age" becomes a categorical variable; after ordinal encoding, the five categories take the values 0, 1, 2, 3, and 4 respectively. For categorical variables with no ordering, one-hot encoding is used. For example, the feature "gender" takes the values "male" and "female"; after one-hot encoding, "male" corresponds to (1, 0) and "female" corresponds to (0, 1).
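The discretization and encoding steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the age intervals and the (1, 0)/(0, 1) gender encoding come from the text, while the function names and the `eps` threshold for "nearly constant" are assumptions made for the example. Outlier detection is left out because the text does not say which detection rule is used.

```python
from statistics import pstdev

# Age intervals as given in the text; all names below are illustrative.
AGE_BINS = [(0, 17), (18, 23), (24, 35), (36, 49), (50, 100)]


def age_to_ordinal(age):
    """Discretize an integer age and return its ordinal code 0..4."""
    for code, (low, high) in enumerate(AGE_BINS):
        if low <= age <= high:
            return code
    raise ValueError(f"age {age} is outside the supported range")


def one_hot(value, categories):
    """One-hot encode a categorical value that has no natural ordering."""
    return tuple(1 if value == c else 0 for c in categories)


def drop_constant_features(rows, eps=1e-9):
    """Drop numeric columns whose standard deviation is (almost) zero.

    eps is an assumed threshold; the text only says "close to" constant.
    """
    keep = [k for k in rows[0] if pstdev([r[k] for r in rows]) > eps]
    return [{k: r[k] for k in keep} for r in rows]


rows = [{"avg_order_amount": 1.0, "always_five": 5.0},
        {"avg_order_amount": 2.0, "always_five": 5.0}]
cleaned = drop_constant_features(rows)  # "always_five" is removed
```

Ordinal codes keep the ordering of the age intervals, which is why they suit "age" but not "gender".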
Next, some of the discrete features are selected as grouping features. The main grouping features for users include age, price sensitivity, and so on. The main grouping features for destinations include country, theme tag, and so on. Users are then aggregated by grouping-feature values: users whose grouping features are all equal are placed in the same group. For the other features, the mean over the users in a group is taken as the group's feature value. Each user group has a unique group identifier, user_class_id. Destinations are grouped in the same way, and each destination group has a unique group identifier, poi_class_id.
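A minimal sketch of this grouping step is given below, assuming each user is a dictionary with an `id` field. The record layout, the `build_groups` helper, and the way group identifiers are numbered are all hypothetical; only the rule (equal grouping features give the same group, other features are averaged) comes from the text.

```python
from collections import defaultdict
from statistics import mean


def build_groups(records, group_keys, id_prefix):
    """Group records whose grouping features are all equal.

    Returns (assignment, group_features):
      assignment     maps each record's "id" to its group identifier
      group_features maps each group identifier to the grouping values plus
                     the mean of every other feature over the group members
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[tuple(rec[k] for k in group_keys)].append(rec)

    assignment, group_features = {}, {}
    for index, key in enumerate(sorted(buckets)):
        group_id = f"{id_prefix}_{index}"  # hypothetical numbering scheme
        members = buckets[key]
        other = [k for k in members[0] if k != "id" and k not in group_keys]
        feats = dict(zip(group_keys, key))
        feats.update({k: mean([m[k] for m in members]) for k in other})
        group_features[group_id] = feats
        for m in members:
            assignment[m["id"]] = group_id
    return assignment, group_features


users = [
    {"id": "u1", "age_bin": 2, "price_sensitivity": 1, "avg_order_amount": 3000.0},
    {"id": "u2", "age_bin": 2, "price_sensitivity": 1, "avg_order_amount": 5000.0},
    {"id": "u3", "age_bin": 4, "price_sensitivity": 0, "avg_order_amount": 8000.0},
]
user_assignment, user_group_features = build_groups(
    users, ["age_bin", "price_sensitivity"], "user_class"
)
```

The same helper could be applied to destination records with grouping keys such as country and theme tag to produce the destination groups.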
Training samples are constructed mainly by labeling each sample with how strongly a user group wishes to travel to a destination group (the willingness value). The training samples are drawn from one day of data for users who have search records in the travel app. The labeling strategy is as follows: count the number of search clicks and the number of orders that the user group made on the destination group, and add the two counts with weights of 3:7 to obtain a willingness score. The willingness score is then discretized into 5 levels, which serve as the travel willingness value of the user group for the destination group. The structure of one sample is shown in the following table (where features denotes the feature values):
user_class_id | poi_class_id | features | label |
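The labeling strategy can be sketched as below. The 3:7 weighting and the five levels come from the text; the use of integer weights 3 and 7, the equal-width binning, and the function names are assumptions, since the text does not state how the score is cut into levels.

```python
def willingness_score(search_count, order_count, w_search=3, w_order=7):
    """Weighted sum of search clicks and orders with weights 3:7."""
    return w_search * search_count + w_order * order_count


def discretize_scores(scores, levels=5):
    """Map raw scores to labels 0..levels-1 using equal-width bins.

    Equal-width binning is an assumption made for this sketch.
    """
    low, high = min(scores), max(scores)
    if high == low:
        return [0] * len(scores)
    width = (high - low) / levels
    return [min(int((s - low) / width), levels - 1) for s in scores]


# (search clicks, orders) of one user group on four destination groups
counts = [(0, 0), (10, 0), (5, 5), (10, 10)]
scores = [willingness_score(s, o) for s, o in counts]  # [0, 30, 50, 100]
labels = discretize_scores(scores)                     # [0, 1, 2, 4]
```

Because orders carry more than twice the weight of searches, a destination group that is ordered often receives a higher label than one that is only searched for.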
The test samples are taken from one day of data. The evaluation metric is the proportion of recalled destinations that hit the destinations users actually ordered.
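One way to compute such a hit ratio is sketched below. It reads the metric as the fraction of orders whose destination appears among the destinations recalled for the same user; that reading, the data layout, and the function name are assumptions, because the text does not define the metric precisely.

```python
def recall_hit_rate(recalled, orders):
    """Fraction of orders whose destination was among the user's recalled set.

    recalled: dict mapping user id -> list of recalled destination ids
    orders:   list of (user id, ordered destination id) pairs
    """
    if not orders:
        return 0.0
    hits = sum(1 for user, dest in orders if dest in recalled.get(user, ()))
    return hits / len(orders)


recalled = {"u1": ["paris", "tokyo"], "u2": ["rome"]}
orders = [("u1", "tokyo"), ("u2", "oslo")]
rate = recall_hit_rate(recalled, orders)  # one of the two orders is hit
```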
The ranking algorithm used by the recall method of the tourist destination recommendation system is a pairwise-based XGBoost model. XGBoost is a boosting algorithm; the idea is to combine many weak classifiers into a strong classifier. XGBoost is a boosted tree model: trees are added one at a time, and each tree is grown by repeatedly splitting on features. Each newly added tree in effect learns a new function that fits the residual of the previous round's prediction. Training yields K trees. In each tree a sample falls into one leaf node, each leaf node has a score, and the final prediction for the sample is the sum of the scores of the leaves it falls into across all trees. The tree model used is the CART (Classification and Regression Tree) regression tree. A CART regression tree is a binary tree that partitions the sample space by repeatedly splitting on features.
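The "sum of leaf scores" prediction rule can be illustrated with a toy ensemble. The tree encoding (nested tuples), the feature names, and the leaf values below are invented for the example and do not come from a trained model or from the XGBoost library.

```python
# A tree is either a leaf score (float) or a binary split:
# (feature_name, threshold, left_subtree, right_subtree).
# Samples with feature value < threshold go left, otherwise right.

def leaf_score(tree, sample):
    """Walk one binary tree and return the score of the leaf the sample reaches."""
    while not isinstance(tree, float):
        feature, threshold, left, right = tree
        tree = left if sample[feature] < threshold else right
    return tree


def predict(trees, sample):
    """Final prediction: the sum of the leaf scores over all K trees."""
    return sum(leaf_score(t, sample) for t in trees)


trees = [
    ("age_bin", 2, -0.5, ("is_overseas", 0.5, 0.25, 0.5)),
    ("recommended_days", 4, 0.25, -0.25),
]
sample = {"age_bin": 3, "is_overseas": 1, "recommended_days": 3}
prediction = predict(trees, sample)  # 0.5 from the first tree + 0.25 from the second
```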
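The XGBoost loss function is defined as:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
where $l(y_i, \hat{y}_i)$ is the conventional loss function between the label $y_i$ and the prediction $\hat{y}_i$, $f_k$ is the k-th tree, T represents the number of leaf nodes, and γ is a first parameter, a coefficient used to control the number of leaf nodes of the CART trees;
$\Omega(f)$ is the regularization term, which limits the complexity of the model and reduces the risk of overfitting, where ω represents the leaf-node scores and λ is a second parameter, a coefficient used to control the leaf-node scores of the CART trees.
XGBoost uses an additive model, so the loss function at round t can be written as:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \text{constant}$$
In XGBoost, the gradient information is obtained from a second-order Taylor expansion of the loss function:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + \text{constant}$$
where $g_i = \partial_{\hat{y}_i^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ and $h_i = \partial^2_{\hat{y}_i^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ are the first and second derivatives of the loss with respect to the previous round's prediction.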
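As a small numeric illustration, the sketch below evaluates a regularization term of the standard XGBoost form, gamma * T + 0.5 * lambda * sum(w^2), and checks the second-order Taylor expansion on a squared-error loss, for which the expansion is exact. The choice of squared error, the function names, and the numbers are assumptions made for the example; for other loss functions the expansion is only an approximation.

```python
def regularization(leaf_scores, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum(w^2), with T = number of leaves."""
    return gamma * len(leaf_scores) + 0.5 * lam * sum(w * w for w in leaf_scores)


def squared_loss(y, y_hat):
    """Assumed example loss l(y, y_hat) = (y - y_hat)^2."""
    return (y - y_hat) ** 2


def taylor_approx(y, y_prev, f):
    """Second-order expansion of the loss around the previous prediction y_prev.

    g and h are the first and second derivatives of the squared loss
    with respect to the prediction, evaluated at y_prev.
    """
    g = 2.0 * (y_prev - y)
    h = 2.0
    return squared_loss(y, y_prev) + g * f + 0.5 * h * f * f


omega = regularization([0.5, -0.5, 1.0], gamma=1.0, lam=2.0)  # 3 leaves
approx = taylor_approx(y=1.0, y_prev=0.5, f=0.25)
exact = squared_loss(1.0, 0.5 + 0.25)
```

A larger gamma makes every additional leaf more expensive, and a larger lambda shrinks the leaf scores, which is how the two parameters limit model complexity.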
Ranking algorithms fall into three broad categories: pointwise, pairwise, and listwise. Pointwise: only the absolute relevance of a single query result to the given query is considered; the relevance of the other query results to the query is ignored. Pairwise: the relative relevance of any two query results with different relevance is considered. Listwise: the ordering of the whole result set for the given query is considered directly, and the result ordering output by the model is optimized directly so that it is as close as possible to the true ordering. After weighing model accuracy against complexity, the method for generating recall information for the tourist destination recommendation system uses a pairwise-based XGBoost model to produce the ranking model.
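The pairwise idea can be made concrete with a short sketch: pairs are formed from items with different labels, and a loss penalizes pairs that the model scores in the wrong order. The logistic pairwise loss used here is one common choice and is an assumption; the text does not say which pairwise loss is used. The XGBoost library offers a pairwise ranking objective named `rank:pairwise`, but the sketch does not use the library.

```python
import math
from itertools import combinations


def make_pairs(labels):
    """Return index pairs (i, j) where item i has a higher label than item j."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] > labels[j]:
            pairs.append((i, j))
        elif labels[j] > labels[i]:
            pairs.append((j, i))
    return pairs


def pairwise_logistic_loss(scores, labels):
    """Mean of log(1 + exp(-(s_i - s_j))) over all pairs with label_i > label_j."""
    pairs = make_pairs(labels)
    if not pairs:
        return 0.0
    total = sum(math.log1p(math.exp(-(scores[i] - scores[j]))) for i, j in pairs)
    return total / len(pairs)


# Willingness labels of one user group for three destination groups
labels = [4, 0, 2]
good = pairwise_logistic_loss([2.0, 0.0, 1.0], labels)  # scores agree with labels
bad = pairwise_logistic_loss([0.0, 2.0, 1.0], labels)   # scores contradict labels
```

Only the relative order within each pair matters, so the loss is unchanged if every score is shifted by the same constant.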
After the ranking model is generated, recall information is generated using the ranking model.
With the method of this embodiment, for each user entering the tourist destination recommendation system, the user's features are obtained from the user's history information, the user group to which the user belongs is determined from those features, the corresponding ranked sequence of destination groups is obtained from the ranking model, and the top-ranked destination groups in the sequence are taken as the recall result. The recall results are then ranked to obtain the final destination recommendation. The method for generating recall information for the tourist destination recommendation system improves the accuracy of the generated recall information.
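The online recall flow described above can be sketched as follows. The lookup table from grouping-feature values to `user_class_id`, the `toy_score` function standing in for the trained ranking model, and the `top_k` cut-off are all hypothetical; a real system would call the trained model instead of a score table.

```python
def recall_destination_groups(user, group_keys, user_groups, poi_group_ids,
                              score, top_k=2):
    """Map a user to a user group, rank destination groups, keep the top k."""
    key = tuple(user[k] for k in group_keys)
    user_class_id = user_groups[key]
    ranked = sorted(poi_group_ids,
                    key=lambda poi: score(user_class_id, poi),
                    reverse=True)
    return user_class_id, ranked[:top_k]


# Hypothetical lookup from grouping-feature values to group identifiers
user_groups = {(2, 1): "user_class_0", (4, 0): "user_class_1"}

# Stand-in for the ranking model: a fixed score per (user group, destination group)
toy_scores = {
    ("user_class_0", "poi_class_0"): 0.2,
    ("user_class_0", "poi_class_1"): 0.9,
    ("user_class_0", "poi_class_2"): 0.5,
}


def toy_score(user_class_id, poi_class_id):
    return toy_scores[(user_class_id, poi_class_id)]


result = recall_destination_groups(
    {"age_bin": 2, "price_sensitivity": 1},
    ["age_bin", "price_sensitivity"],
    user_groups,
    ["poi_class_0", "poi_class_1", "poi_class_2"],
    toy_score,
)
```

Scoring at the group level keeps the number of model evaluations per request equal to the number of destination groups rather than the number of individual destinations.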
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.
Claims (9)
1. A method for generating recall information for a travel destination recommendation system, comprising the steps of:
acquiring user history information and destination information, constructing user features from the user history information, and constructing destination features from the destination information;
constructing training samples from the user features and the destination features;
training an XGBoost model on the training samples to obtain a ranking model;
and generating recall information according to the ranking model.
2. The method of generating recall information for a travel destination recommendation system of claim 1, wherein the user features comprise basic user features comprising at least one of age, gender, and price sensitivity.
3. The method of generating recall information for a travel destination recommendation system of claim 2, wherein the user features further comprise user behavior features comprising at least one of the average order amount over the past year and the number of days on which the user browsed travel products over the past year.
4. The method of generating recall information for a travel destination recommendation system of claim 1, wherein the destination features comprise destination tag features comprising at least one of forest, skiing, ornamental, architecture, and history.
5. The method of generating recall information for a travel destination recommendation system of claim 4, wherein the destination features further comprise basic destination features comprising at least one of whether the destination is overseas and the recommended number of days to visit.
6. The method of generating recall information for a travel destination recommendation system of claim 1, wherein the step of constructing training samples from the user features and the destination features further comprises:
performing data cleaning on the user features and the destination features, and constructing the training samples from the cleaned user features and destination features.
7. The method of generating recall information for a travel destination recommendation system of claim 6, wherein the data cleaning comprises at least one of outlier detection, constant variable removal, feature discretization, and categorical variable processing.
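8. The method of generating recall information for a travel destination recommendation system of claim 1, wherein the loss function of the XGBoost model is defined as:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
where $l(y_i, \hat{y}_i)$ is a conventional loss function, T represents the number of leaf nodes, γ is a first parameter, $\Omega(f)$ is the regularization term, ω represents the leaf-node scores, and λ is a second parameter.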
9. The method of generating recall information for a travel destination recommendation system of claim 8, wherein the XGBoost model is based on a pairwise ranking algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911266312.6A CN111104614A (en) | 2019-12-11 | 2019-12-11 | Method for generating recall information for tourist destination recommendation system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911266312.6A CN111104614A (en) | 2019-12-11 | 2019-12-11 | Method for generating recall information for tourist destination recommendation system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111104614A (en) | 2020-05-05 |
Family
ID=70421718
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911266312.6A Pending CN111104614A (en) | 2019-12-11 | 2019-12-11 | Method for generating recall information for tourist destination recommendation system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111104614A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298595A (en) * | 2020-07-30 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Method and device for providing data object information and electronic equipment |
CN113360788A (en) * | 2021-05-07 | 2021-09-07 | 深圳依时货拉拉科技有限公司 | Address recommendation method, device, equipment and storage medium |
CN115062184A (en) * | 2022-06-29 | 2022-09-16 | 四川长虹电器股份有限公司 | Film sequencing method in voice recall scene |
CN115062184B (en) * | 2022-06-29 | 2024-05-28 | 四川长虹电器股份有限公司 | Film ordering method under voice recall scene |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108537373A (en) * | 2018-03-27 | 2018-09-14 | 黄晓鸣 | Travel information recommendation method and apparatus |
CN108536650A (en) * | 2018-04-03 | 2018-09-14 | 北京京东尚科信息技术有限公司 | Method and apparatus for generating a gradient boosting tree model |
CN110084630A (en) * | 2019-03-05 | 2019-08-02 | 浙江工业大学之江学院 | Method for predicting users' travel intention and trip type based on gradient boosting decision trees |
Non-Patent Citations (4)
Title |
---|
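Zhou Ting et al.: "Transient stability assessment method for power systems based on improved LightGBM", Power System Technology (《电网技术》) * |
Song Guoqin et al.: "Establishment and application of a MOOC truancy index based on XGBoost feature selection", Journal of University of Electronic Science and Technology of China (《电子科技大学学报》) * |
Xu Hui et al.: "Improved K-means algorithm based on discretization of initial mean points", Journal of University of Science and Technology Liaoning (《辽宁科技大学学报》) * |
Huang Jinchao et al.: "Personalized recommendation algorithm based on preference-degree feature construction", Journal of Shanghai Jiao Tong University (《上海交通大学学报》) * |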
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107944986B (en) | Method, system and equipment for recommending O2O commodities | |
CN107657267B (en) | Product potential user mining method and device | |
CN110288484B (en) | Insurance classification user recommendation method and system based on big data platform | |
CN102609523A (en) | Collaborative filtering recommendation algorithm based on article sorting and user sorting | |
CN105069470A (en) | Classification model training method and device | |
CN107918657A (en) | The matching process and device of a kind of data source | |
CN112598438A (en) | Outdoor advertisement recommendation system and method based on large-scale user portrait | |
CN110737805B (en) | Method and device for processing graph model data and terminal equipment | |
CN115934990B (en) | Remote sensing image recommendation method based on content understanding | |
CN112085525A (en) | User network purchasing behavior prediction research method based on hybrid model | |
CN113159881B (en) | Data clustering and B2B platform customer preference obtaining method and system | |
CN111582538A (en) | Community value prediction method and system based on graph neural network | |
CN111523055A (en) | Collaborative recommendation method and system based on agricultural product characteristic attribute comment tendency | |
CN116304299A (en) | Personalized recommendation method integrating user interest evolution and gradient promotion algorithm | |
CN111104614A (en) | Method for generating recall information for tourist destination recommendation system | |
Herdiyeni et al. | Chilli quality classification using deep learning | |
CN111429161B (en) | Feature extraction method, feature extraction device, storage medium and electronic equipment | |
CN114861050A (en) | Feature fusion recommendation method and system based on neural network | |
CN111815413A (en) | Big data commodity prediction system and method based on hot event | |
CN110674964A (en) | Search prediction system and method based on agricultural traceability information | |
CN111078859B (en) | Author recommendation method based on reference times | |
CN111723302A (en) | Recommendation method based on collaborative dual-model deep representation learning | |
CN113837266B (en) | Software defect prediction method based on feature extraction and Stacking ensemble learning | |
CN111435514A (en) | Feature calculation method and device, sorting method and device, and storage medium | |
CN115809376A (en) | Intelligent recommendation method based on big teaching data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200505 |