CN111104614A - Method for generating recall information for tourist destination recommendation system - Google Patents

Info

Publication number
CN111104614A
Authority
CN
China
Prior art keywords
destination
user
information
recommendation system
recall information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911266312.6A
Other languages
Chinese (zh)
Inventor
李明
江文斌
李健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhilv Information Technology Co Ltd
Original Assignee
Shanghai Zhilv Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhilv Information Technology Co Ltd filed Critical Shanghai Zhilv Information Technology Co Ltd
Priority to CN201911266312.6A priority Critical patent/CN111104614A/en
Publication of CN111104614A publication Critical patent/CN111104614A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for generating recall information for a tourist destination recommendation system, comprising the following steps: acquiring user history information and destination information, constructing user characteristics from the user history information, and constructing destination characteristics from the destination information; constructing training samples from the user characteristics and the destination characteristics; training an XGBoost model on the training samples to obtain a ranking model; and generating recall information according to the ranking model. The invention improves the accuracy of the generated recall information.

Description

Method for generating recall information for tourist destination recommendation system
Technical Field
The invention belongs to the technical field of travel information recommendation, and particularly relates to a method for generating recall information for a tourist destination recommendation system.
Background
With the development of information technology and the internet, people have gradually moved from an era of information scarcity to an era of information overload. Consumers find it hard to locate the information (items) that interests them within a large volume of information, and information producers find it hard to make what they produce stand out and attract attention. A recommendation system builds a bridge between information producers and consumers: it helps users cope with information overload and helps merchants find their target users. To do so, the recommendation system must mine user behavior and discover each user's personalized needs, so that items are accurately recommended to the users who need them.
A user who needs to arrange a travel plan spends a lot of time and effort evaluating the destination information collected. A tourist destination recommendation system learns the user's preferences by analyzing user attributes and behavior, and finds destinations that match the user's points of interest. In practice, tourist destination recommendation is more complex than recommendation in other fields. Because users rate few tourist destinations, the data are sparser. Moreover, tourist destination recommendation is easily affected by factors such as season, traffic, and cost, so it is difficult to build a recommendation model that integrates the many relevant factors.
A basic recommendation framework can be divided into two parts: recall and ranking. The recall stage retrieves from the full item library a small candidate set that the user may be interested in, which is equivalent to coarse ranking; the ranking stage then ranks the candidates obtained in the recall stage precisely and recommends them to the user.
In the prior art, recall accuracy is low.
Disclosure of Invention
The invention aims to overcome the low recall accuracy of travel information recommendation in the prior art, and provides a method for generating recall information for a tourist destination recommendation system.
The invention solves the above technical problem through the following technical solution:
The invention provides a method for generating recall information for a tourist destination recommendation system, comprising the following steps:
acquiring user history information and destination information, constructing user characteristics from the user history information, and constructing destination characteristics from the destination information;
constructing training samples from the user characteristics and the destination characteristics;
training an XGBoost model on the training samples to obtain a ranking model;
generating recall information according to the ranking model.
Preferably, the user characteristics include user base characteristics, which include at least one of age, gender, and price sensitivity.
Preferably, the user characteristics further include user behavior characteristics, which include at least one of the average order amount over the past year and the number of days in the past year on which travel products were browsed.
Preferably, the destination characteristics include destination label characteristics, which include at least one of forest, skiing, sightseeing, architecture, and history.
Preferably, the destination characteristics further include destination base characteristics, which include at least one of whether the destination is overseas and the recommended number of days to spend there.
Preferably, the step of constructing training samples from the user characteristics and the destination characteristics further comprises:
performing data cleaning on the user characteristics and the destination characteristics, and constructing the training samples from the cleaned user characteristics and destination characteristics.
Preferably, the data cleaning includes at least one of outlier detection, constant variable removal, feature discretization, and categorical variable processing.
Preferably, the loss function of the XGBoost model is defined as:
$$L = \sum_{i} l(y_i, \hat{y}_i) + \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
where $\sum_{i} l(y_i, \hat{y}_i)$ is the conventional loss function, $T$ represents the number of leaf nodes, $\gamma$ is a first parameter, $\frac{1}{2} \lambda \lVert \omega \rVert^2$ is the regularization term, $\omega$ represents the scores of the leaf nodes, and $\lambda$ is a second parameter.
Preferably, the XGBoost model is based on a PairWise ranking algorithm.
The beneficial effect of the invention is that it improves the accuracy of the generated recall information.
Drawings
FIG. 1 is a flow chart of a method for generating recall information for a travel destination recommendation system in accordance with a preferred embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following preferred embodiments, but is not intended to be limited thereby.
This embodiment provides a method for generating recall information for a travel destination recommendation system. Referring to FIG. 1, the method includes the following steps:
Step S101: acquire user history information and destination information, construct user characteristics from the user history information, and construct destination characteristics from the destination information.
Step S102: construct training samples from the user characteristics and the destination characteristics.
Step S103: train an XGBoost model on the training samples to obtain a ranking model.
Step S104: generate recall information according to the ranking model.
In a specific implementation, in step S101, user history information and destination information are acquired, user characteristics are constructed from the user history information, and destination characteristics are constructed from the destination information. The user characteristics include user base characteristics, which include at least one of age, gender, and price sensitivity. The user characteristics further include user behavior characteristics, which include at least one of the average order amount over the past year and the number of days in the past year on which travel products were browsed. The destination characteristics include destination label characteristics, which include at least one of forest, skiing, sightseeing, architecture, and history. The destination characteristics further include destination base characteristics, which include at least one of whether the destination is overseas and the recommended number of days to spend there.
In step S102, data cleaning is performed on the user characteristics and the destination characteristics, and training samples are constructed from the cleaned characteristics. The data cleaning comprises at least one of outlier detection, constant variable removal, feature discretization, and categorical variable processing.
Outlier detection: sample points that deviate from the overall distribution of the samples are deleted.
Constant variable removal: features whose standard deviation is close to zero are nearly constant; they are treated as invalid features during model training and need to be removed.
Feature discretization: continuous data are segmented into discrete intervals. The segmentation may use equal-width division, equal-frequency division, clustering-based division, or manual division. For example, the feature "age" is partitioned into [0-17], [18-23], [24-35], [36-49], [50-100] based on prior knowledge of age groups and the age distribution of the travel APP's users.
Categorical variable processing: this mainly comprises ordinal (serial-number) encoding and one-hot encoding. For example, after discretization the feature "age" becomes a categorical variable, and after ordinal encoding the five categories take the values 0, 1, 2, 3 and 4 respectively. Categorical variables with no ordering relationship are one-hot encoded. For example, the feature "gender" takes the values "male" and "female"; after one-hot encoding, "male" corresponds to (1,0) and "female" corresponds to (0,1).
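The cleaning steps described for step S102 can be illustrated with a short sketch. It uses the age bands and the gender encoding quoted in the text; the function names, the `eps` tolerance, and the use of the population standard deviation are assumptions made for illustration only.

```python
from bisect import bisect_left
from statistics import pstdev

# Inclusive upper bounds of the age bands quoted in the text:
# [0-17], [18-23], [24-35], [36-49], [50-100]
AGE_UPPER_BOUNDS = [17, 23, 35, 49, 100]

# The order of this list fixes the layout of the one-hot vector.
GENDERS = ["male", "female"]


def age_to_ordinal(age):
    """Discretize an age into its band and return the band index 0..4."""
    if not 0 <= age <= 100:
        raise ValueError(f"age out of range: {age}")
    # Index of the first band whose upper bound is >= age.
    return bisect_left(AGE_UPPER_BOUNDS, age)


def one_hot(value, categories):
    """One-hot encode a categorical value with no ordering relationship."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return tuple(1 if value == c else 0 for c in categories)


def is_near_constant(values, eps=1e-9):
    """Flag a feature column whose standard deviation is close to zero.

    eps is a hypothetical tolerance; the source gives no threshold.
    """
    return pstdev(values) <= eps


print(age_to_ordinal(30))          # band [24-35]
print(one_hot("female", GENDERS))  # second position set
```

A column flagged by `is_near_constant` would be dropped before the training samples are built.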
Some of the discrete features are then selected as grouping features. The main grouping features for users include age, price sensitivity, etc. The main grouping features for destinations include country, theme label, etc. Users are then aggregated by grouping feature values: users whose grouping features are all equal are placed in the same group. For every other feature, the mean over the users in a group is taken as the group's feature. Each user group has a unique group identifier, user_class_id. Destinations are grouped in the same way, and each destination group has a unique group identifier, poi_class_id.
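A minimal sketch of the grouping step follows. It assumes each user is a dictionary of already-encoded features; the feature names (`age_band`, `price_sensitivity`, `avg_order_amount`) and the choice to number groups in sorted key order are hypothetical.

```python
from collections import defaultdict


def group_users(users, group_keys, mean_keys):
    """Place users with identical grouping features in one group.

    Every feature in mean_keys is averaged over the members of a group.
    Groups are numbered in sorted key order (an illustrative choice).
    """
    buckets = defaultdict(list)
    for user in users:
        buckets[tuple(user[k] for k in group_keys)].append(user)

    groups = []
    for class_id, key in enumerate(sorted(buckets)):
        members = buckets[key]
        group = {"user_class_id": class_id}
        group.update(zip(group_keys, key))
        for k in mean_keys:
            group[k] = sum(m[k] for m in members) / len(members)
        groups.append(group)
    return groups


users = [
    {"age_band": 2, "price_sensitivity": 1, "avg_order_amount": 1000.0},
    {"age_band": 2, "price_sensitivity": 1, "avg_order_amount": 3000.0},
    {"age_band": 3, "price_sensitivity": 0, "avg_order_amount": 500.0},
]
groups = group_users(users, ["age_band", "price_sensitivity"], ["avg_order_amount"])
for g in groups:
    print(g)
```

The same function could be applied to destinations with destination grouping features to produce poi_class_id values.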
Training samples are constructed mainly by labeling how strongly each user group wishes to travel to each destination group; this willingness value is the sample's label. The training samples are drawn from one day of data for users who have a search record on the travel APP. The labeling strategy is as follows: count the number of click-searches and the number of orders placed by the user group on the destination group, and add the two counts with weights of 3:7 to obtain a willingness score. The willingness score is then discretized into 5 levels, which serve as the travel willingness value of the user group for the destination group. The structure of one sample is shown in the following table (where features denotes the feature values):
user_class_id poi_class_id features label
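The labeling strategy can be sketched as follows. The 3:7 weighting is taken from the text, applied here as raw integer weights to the search count and the order count in that order. The discretization rule is not specified in the source, so equal-width binning over the observed score range is assumed purely for illustration, and the function names are hypothetical.

```python
def willingness_score(search_count, order_count):
    """Weighted sum of search and order counts with weights 3:7."""
    return 3 * search_count + 7 * order_count


def discretize_equal_width(scores, levels=5):
    """Map scores to integer levels 0..levels-1 using equal-width bins.

    Assumption: bins span the observed minimum and maximum score.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0] * len(scores)
    width = (hi - lo) / levels
    # The maximum score would land in bin `levels`, so clamp it.
    return [min(int((s - lo) / width), levels - 1) for s in scores]


# (search_count, order_count) for five user-group/destination-group pairs
counts = [(0, 0), (6, 1), (12, 2), (18, 3), (10, 10)]
scores = [willingness_score(s, o) for s, o in counts]
labels = discretize_equal_width(scores)
print(scores)
print(labels)
```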
The test samples are drawn from one day of data. The proportion of recalled destinations that hit the destinations actually ordered is used as the evaluation metric.
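One way to compute such a hit-rate metric is sketched below. It assumes the metric is the fraction of orders whose destination appears in that user's recall list; the source does not spell out the denominator, and the data layout and names are hypothetical.

```python
def recall_hit_rate(recalled, ordered):
    """Fraction of orders whose destination was among the recalled ones.

    recalled: dict mapping user id -> list of recalled destination ids
    ordered:  dict mapping user id -> destination id actually ordered
    """
    if not ordered:
        return 0.0
    hits = sum(
        1 for user_id, dest in ordered.items()
        if dest in recalled.get(user_id, ())
    )
    return hits / len(ordered)


recalled = {"u1": ["paris", "tokyo"], "u2": ["rome"]}
ordered = {"u1": "tokyo", "u2": "oslo"}
print(recall_hit_rate(recalled, ordered))
```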
The ranking algorithm used by the recall method of the travel destination recommendation system is a PairWise-based XGBoost model. XGBoost is a boosting algorithm; the idea is to combine many weak classifiers into one strong classifier. XGBoost is a boosted tree model: trees are added one after another, and each tree is grown by repeatedly splitting on features. Each newly added tree in fact learns a new function that fits the residual of the previous round's prediction. Training yields K trees. In each tree a sample falls into one leaf node, each leaf node has a score, and the final predicted value of the sample is the sum of the scores of its leaf node in every tree. The tree model used is the CART (classification and regression tree) regression tree. A CART regression tree is a binary tree that partitions the sample space by repeatedly splitting on features.
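The prediction rule described here, summing one leaf score per tree, can be shown with a toy example. The nested-dictionary tree layout and the "go left when the feature value is below the threshold" convention are assumptions for illustration and do not reflect how XGBoost stores trees internally.

```python
def leaf_score(tree, x):
    """Walk one binary regression tree and return the score of the leaf reached."""
    node = tree
    while "leaf" not in node:
        if x[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["leaf"]


def predict(trees, x):
    """Final prediction: the sum of the leaf scores over all K trees."""
    return sum(leaf_score(tree, x) for tree in trees)


# Two hand-written trees (K = 2) over a feature vector [age, is_overseas]
trees = [
    {"feature": 0, "threshold": 24,
     "left": {"leaf": -0.5}, "right": {"leaf": 0.5}},
    {"feature": 1, "threshold": 0.5,
     "left": {"leaf": 0.25}, "right": {"leaf": -0.25}},
]
print(predict(trees, [30, 1]))
```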
The XGBoost loss function is defined as:
$$L = \sum_{i} l(y_i, \hat{y}_i) + \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
where $\sum_{i} l(y_i, \hat{y}_i)$ is the conventional loss function; $T$ represents the number of leaf nodes; $\gamma$ is a first parameter, a coefficient used to control the number of leaf nodes of the CART trees; $\frac{1}{2} \lambda \lVert \omega \rVert^2$ is the regularization term, which limits the complexity of the model and reduces the risk of overfitting; $\omega$ represents the scores of the leaf nodes; and $\lambda$ is a second parameter, a coefficient used to control the leaf-node scores of the CART trees.
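The penalized loss described above can be evaluated numerically. The sketch below assumes squared error as the conventional loss and the usual XGBoost penalty, gamma times the number of leaves plus half of lambda times the sum of squared leaf scores; neither choice is confirmed by the source, and the function name is hypothetical.

```python
def regularized_loss(y_true, y_pred, leaf_scores, gamma, lam):
    """Conventional loss plus a penalty on tree size and leaf scores.

    Assumptions: squared error as the conventional loss, and a penalty of
    gamma * T + 0.5 * lam * sum(w^2), where T is the number of leaves.
    """
    data_loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    num_leaves = len(leaf_scores)
    penalty = gamma * num_leaves + 0.5 * lam * sum(w * w for w in leaf_scores)
    return data_loss + penalty


loss = regularized_loss(
    y_true=[1.0, 0.0],
    y_pred=[0.5, 0.5],
    leaf_scores=[0.5, -0.5],
    gamma=1.0,
    lam=2.0,
)
print(loss)
```

Raising `gamma` makes every extra leaf more expensive, and raising `lam` shrinks leaf scores toward zero, which matches the stated purpose of limiting model complexity.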
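XGBoost uses an additive model. At round $t$, the loss function becomes:
$$L^{(t)} = \sum_{i} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
where $\hat{y}_i^{(t-1)}$ is the prediction after the first $t-1$ trees and $f_t$ is the tree added in round $t$.
In XGBoost, the gradient information is obtained by a second-order Taylor expansion of the loss function:
$$L^{(t)} \approx \sum_{i} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
where $g_i$ and $h_i$ are the first and second derivatives of $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ with respect to $\hat{y}_i^{(t-1)}$.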
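To make the second-order expansion concrete, the sketch below computes the first and second derivatives for one leaf and the leaf score that minimizes the expanded loss. It assumes the loss l = 0.5 * (y - y_hat)^2, and it uses the closed-form leaf score -G / (H + lambda) from the standard XGBoost derivation, which the source does not state; all names are hypothetical.

```python
def grad_hess_squared_error(y_true, y_prev):
    """First and second derivatives of 0.5 * (y - y_hat)^2 w.r.t. y_hat."""
    g = [p - y for y, p in zip(y_true, y_prev)]
    h = [1.0] * len(y_true)
    return g, h


def optimal_leaf_score(g, h, lam):
    """Leaf score minimizing sum(g*w + 0.5*h*w^2) + 0.5*lam*w^2."""
    return -sum(g) / (sum(h) + lam)


def leaf_objective(g, h, lam, gamma):
    """Value of the expanded loss for one leaf at its optimal score."""
    G, H = sum(g), sum(h)
    return -0.5 * G * G / (H + lam) + gamma


# Three samples that fall into the same leaf; previous prediction is 0.
g, h = grad_hess_squared_error([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(optimal_leaf_score(g, h, lam=1.0))
print(leaf_objective(g, h, lam=1.0, gamma=0.5))
```

With `lam = 0` the leaf score reduces to the mean residual of the samples in the leaf, which is the sense in which each new tree fits the residual of the previous round.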
Ranking algorithms fall into three broad categories: PointWise, PairWise, and ListWise. PointWise considers only the absolute relevance of a single query result under a given query, without considering the relevance of other query results to that query. PairWise considers the relative relevance between any two query results of differing relevance. ListWise considers the whole ordering of the query result set under a given query and directly optimizes the ordering output by the model so that it is as close as possible to the true ordering. After weighing model accuracy against complexity, the method for generating recall information for the tourist destination recommendation system uses a PairWise-based XGBoost model to generate the ranking model.
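The PairWise idea can be sketched on the sample layout used earlier (user_class_id, poi_class_id, label). Pairs are formed only within one user group, which plays the role of the query, and only between destination groups with different labels. The pairwise logistic loss used here is a common choice and is an assumption; the source does not name the pairwise loss.

```python
import math
from itertools import combinations


def make_pairs(samples):
    """Return (better_index, worse_index) pairs within each user group.

    samples: list of (user_class_id, poi_class_id, label) tuples.
    """
    pairs = []
    for i, j in combinations(range(len(samples)), 2):
        user_i, _, label_i = samples[i]
        user_j, _, label_j = samples[j]
        if user_i != user_j or label_i == label_j:
            continue  # different query, or no preference between the two
        pairs.append((i, j) if label_i > label_j else (j, i))
    return pairs


def pairwise_logistic_loss(scores, pairs):
    """Sum of log(1 + exp(-(s_better - s_worse))) over all pairs."""
    return sum(math.log1p(math.exp(-(scores[a] - scores[b]))) for a, b in pairs)


samples = [(0, 10, 4), (0, 11, 1), (0, 12, 1), (1, 10, 2)]
pairs = make_pairs(samples)
print(pairs)
print(pairwise_logistic_loss([0.0, 0.0, 0.0, 0.0], pairs))
print(pairwise_logistic_loss([2.0, 0.0, 0.0, 0.0], pairs))
```

The loss falls when the model scores the preferred destination group above the other one in each pair, so only relative order matters, not the absolute score.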
After the ranking model is generated, recall information is generated using the ranking model.
With the method of this embodiment, for each user entering the tourist destination recommendation system, the user's characteristics are obtained from the user's history information, the user group to which the user belongs is determined from those characteristics, the corresponding ranked sequence of destination groups is obtained through the ranking model, and the top-ranked destination groups in the sequence are taken as the recall result. The recall results are then ranked to obtain the final destination recommendation. The method for generating recall information for the tourist destination recommendation system of this embodiment improves the accuracy of the generated recall information.
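The recall step at serving time can be sketched as a top-k selection over destination groups. The scoring function stands in for the trained ranking model and is replaced here by a hypothetical lookup table; the parameter `top_k` is also an assumption, since the source does not say how many groups are kept.

```python
def recall_destination_groups(user_class_id, poi_class_ids, score_fn, top_k):
    """Rank destination groups for one user group and keep the top_k."""
    ranked = sorted(
        poi_class_ids,
        key=lambda poi_class_id: score_fn(user_class_id, poi_class_id),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical model scores for user group 0 against three destination groups
toy_scores = {(0, 10): 0.2, (0, 11): 0.9, (0, 12): 0.5}


def toy_score_fn(user_class_id, poi_class_id):
    return toy_scores[(user_class_id, poi_class_id)]


recalled = recall_destination_groups(0, [10, 11, 12], toy_score_fn, top_k=2)
print(recalled)
```

The destinations inside the recalled groups would then be passed to the precise ranking stage.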
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (9)

1. A method for generating recall information for a travel destination recommendation system, comprising the steps of:
acquiring user history information and destination information, constructing user characteristics according to the user history information, and constructing destination characteristics according to the destination information;
constructing a training sample according to the user characteristic and the destination characteristic;
training an XGBoost model according to the training samples to obtain a ranking model;
and generating recall information according to the ranking model.
2. The method of generating recall information for a travel destination recommendation system of claim 1 wherein the user characteristics comprise user base characteristics comprising at least one of age, gender, and price sensitivity.
3. The method of generating recall information for a travel destination recommendation system of claim 2 wherein the user characteristics further comprise user behavior characteristics comprising at least one of the average order amount over the past year and the number of days in the past year on which travel products were browsed.
4. The method of generating recall information for a travel destination recommendation system of claim 1 wherein the destination characteristics comprise destination label characteristics comprising at least one of forest, ski, ornamental, architectural, historical.
5. The method of generating recall information for a travel destination recommendation system of claim 4 wherein the destination characteristics further comprise destination base characteristics including at least one of whether the destination is overseas and the recommended number of days to spend there.
6. The method of generating recall information for a travel destination recommendation system of claim 1 wherein the step of constructing a training sample based on the user characteristics and the destination characteristics further comprises:
and performing data cleaning on the user characteristics and the destination characteristics, and constructing a training sample according to the cleaned user characteristics and the destination characteristics.
7. The method of generating recall information for a travel destination recommendation system of claim 6 wherein the data cleansing comprises at least one of outlier detection, constant variable culling, feature discretization, and categorical variable processing.
8. The method of generating recall information for a travel destination recommendation system of claim 1 wherein the loss function of the XGBoost model is defined as:
$$L = \sum_{i} l(y_i, \hat{y}_i) + \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
where $\sum_{i} l(y_i, \hat{y}_i)$ is the conventional loss function, $T$ represents the number of leaf nodes, $\gamma$ is a first parameter, $\frac{1}{2} \lambda \lVert \omega \rVert^2$ is the regularization term, $\omega$ represents the scores of the leaf nodes, and $\lambda$ is a second parameter.
9. The method of generating recall information for a travel destination recommendation system of claim 8 wherein the XGBoost model is based on a PairWise ranking algorithm.
CN201911266312.6A 2019-12-11 2019-12-11 Method for generating recall information for tourist destination recommendation system Pending CN111104614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911266312.6A CN111104614A (en) 2019-12-11 2019-12-11 Method for generating recall information for tourist destination recommendation system

Publications (1)

Publication Number Publication Date
CN111104614A true CN111104614A (en) 2020-05-05

Family

ID=70421718

Country Status (1)

Country Link
CN (1) CN111104614A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298595A (en) * 2020-07-30 2021-08-24 阿里巴巴集团控股有限公司 Method and device for providing data object information and electronic equipment
CN113360788A (en) * 2021-05-07 2021-09-07 深圳依时货拉拉科技有限公司 Address recommendation method, device, equipment and storage medium
CN115062184A (en) * 2022-06-29 2022-09-16 四川长虹电器股份有限公司 Film sequencing method in voice recall scene
CN115062184B (en) * 2022-06-29 2024-05-28 四川长虹电器股份有限公司 Film ordering method under voice recall scene

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537373A (en) * 2018-03-27 2018-09-14 黄晓鸣 Travel information recommends method and apparatus
CN108536650A (en) * 2018-04-03 2018-09-14 北京京东尚科信息技术有限公司 Generate the method and apparatus that gradient promotes tree-model
CN110084630A (en) * 2019-03-05 2019-08-02 浙江工业大学之江学院 The user's tourism trip intention and type prediction method of decision tree are promoted based on gradient

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
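Zhou Ting et al.: "Power system transient stability assessment method based on improved LightGBM" (基于改进LightGBM的电力系统暂态稳定评估方法), Power System Technology (《电网技术》) *
Song Guoqin et al.: "Establishment and application of a MOOC truancy index based on XGBoost feature selection" (基于XGBoost特征选择的幕课翘课指数建立及应用), Journal of University of Electronic Science and Technology of China (《电子科技大学学报》) *
Xu Hui et al.: "Improved K-means algorithm based on discretization of initial mean points" (基于初始均值点离散化的改进K-means算法), Journal of University of Science and Technology Liaoning (《辽宁科技大学学报》) *
Huang Jinchao et al.: "Personalized recommendation algorithm based on preference-degree feature construction" (基于偏好度特征构造的个性化推荐算法), Journal of Shanghai Jiao Tong University (《上海交通大学学报》) *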

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200505