CN111339423A - User-based travel city pushing method, system, equipment and storage medium - Google Patents

User-based travel city pushing method, system, equipment and storage medium

Info

Publication number
CN111339423A
CN111339423A CN202010142415.8A CN202010142415A CN111339423A CN 111339423 A CN111339423 A CN 111339423A CN 202010142415 A CN202010142415 A CN 202010142415A CN 111339423 A CN111339423 A CN 111339423A
Authority
CN
China
Prior art keywords
city
user
vector
data
cities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010142415.8A
Other languages
Chinese (zh)
Other versions
CN111339423B (en)
Inventor
陆佳星
胡泓
郭宝坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd
Priority to CN202010142415.8A
Publication of CN111339423A
Application granted
Publication of CN111339423B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/14 Travel agencies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/55 Push-based network services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a user-based tourist city pushing method, system, device and storage medium. The method comprises: counting frequency data R_l of each city for n preset types of theme labels; obtaining TF-IDF values of the different themes of each city from the frequency data and the n types of theme labels, giving an n-dimensional vector for each city; computing a weighted average of the city vectors from each user's orders as the user vector; aggregating the labels of the cities ordered in each month to obtain frequency data, and calculating the vector data TF-IDF_m of each month; separately calculating similarity data between users and months, between users and cities, and between months and cities; randomly sampling from historical order data and matching the corresponding vectors and similarity data; and predicting the city the user will book and recommending the predicted city to the user. The invention can model user, city and time information, and rank and recommend cities according to booking probability.

Description

User-based travel city pushing method, system, equipment and storage medium
Technical Field
The invention relates to the field of travel information pushing, in particular to a user-based travel city pushing method, system, equipment and storage medium.
Background
With the rising standard of living, domestic and overseas travel has become a leisure choice for more and more users. Leisure travel, however, tends to involve a long decision period. On the one hand, there are many cities to choose from, and learning the details of each one takes a great deal of time; on the other hand, users may already have relatively fixed preferences but be unable to quickly find similar travel destinations. If an online travel agency (OTA) can help the user quickly find a desired destination and pair it with a suitable recommendation reason and hotel display, it can greatly reduce the user's effort, speed up the user's decision, and improve the booking experience.
OTAs have already explored destination recommendation extensively to help users find travel destinations faster. The mainstream approaches at present are ranking recommendation based on search popularity, type recommendation based on fixed tags, and city recommendation based on the user's recent clicks. Although relatively simple to operate, these approaches have drawbacks. Popularity recommendation ranks cities by search results over a historical period, but statistics over a large number of users are stable, so the same popular content may be displayed all the time. Tag-based recommendation groups cities with the same tag and shows them to the user under one theme, but because there are many themes, the user may have difficulty finding the desired content right away. Recommendation based on recent clicks builds a search list from the cities the user recently searched for and browsed; although this largely ensures that the cities are destinations the user is interested in, it mostly serves as a record of browsing history and cannot push more attractive travel products to the user, so it is difficult to improve the order conversion rate of travel products.
Therefore, the invention provides a user-based tourist city pushing method, system, device and storage medium.
Disclosure of Invention
The invention aims to provide a user-based tourist city pushing method, system, device and storage medium that model the cities entering the recommendation list according to user, city and time information to obtain the user's booking probability, and rank and recommend the cities according to that probability.
The embodiment of the invention provides a user-based tourist city pushing method, which comprises the following steps:
S101, counting frequency data R_l, l ∈ [1, n], of each city for each of n preset types of theme labels, n being a natural number greater than 1;
S102, obtaining TF-IDF values of the different themes of each city according to the frequency data of each city and the n types of theme labels;
S103, arranging the TF-IDF values of the n types of themes of each city in the same theme order to obtain an n-dimensional vector;
S104, obtaining, from each user's historical travel city orders, a weighted average of the ordered cities' vectors, and taking the weighted average as the user vector;
S105, aggregating the labels of the cities ordered in each month to obtain frequency data for the themes of each month, and calculating the vector data TF-IDF_m of each month, m ∈ [1, 12], m being a natural number;
S106, separately calculating similarity data between users and months, between users and cities, and between months and cities as training dimensions;
S107, randomly sampling from historical order data, and matching the corresponding vector and similarity data according to the user, month and ordered city of each order;
S108, predicting the city the user will book;
S109, recommending to the user, according to the user's vector data, at least one city with the highest predicted booking probability.
Preferably, the theme tags include 18 tags in total: charm, picturesque scenery, shopping paradise, artistic interest, food mecca, green oxygen, SPA, seaside scenery, romantic feeling, water exploration, historical relics, outdoor sports, desert landscape, city landscape, ice and snow world, parent-child time, theme park, and travel with friends.
Preferably, the process of obtaining the TF-IDF value in step S102 includes:
S1021, letting the number of cities having each type of theme be S_l, l ∈ [1, n], and counting each value S_l;
S1022, letting the total number of cities be S, and obtaining the IDF_l value of each type of theme, IDF_l = log[S ÷ (S_l + 1)], l ∈ [1, n];
S1023, obtaining the TF-IDF_l value of each type of theme, TF-IDF_l = R_l × IDF_l, l ∈ [1, n].
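As a worked illustration of steps S1021 to S1023, consider made-up numbers that do not come from the patent, and assume the natural logarithm (the text does not state the base). Suppose there are S = 200 cities, a given theme l occurs in S_l = 49 of them, and its frequency in one particular city is R_l = 6. Then:

$$\mathrm{IDF}_l = \log\frac{S}{S_l + 1} = \log\frac{200}{50} = \log 4 \approx 1.386, \qquad \text{TF-IDF}_l = R_l \times \mathrm{IDF}_l \approx 6 \times 1.386 \approx 8.32$$

A theme that occurs in nearly every city would instead have an IDF close to 0, so it would contribute little to any city's vector regardless of its frequency.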
Preferably, the step S104 includes:
distinguishing the user's orders from the same period in earlier years (three months), orders within the past three months, and other orders, setting the weight of the same-period orders to 1, the weight of the orders within the past three months to 0.8, and the weight of the other orders to 0.5, and then calculating the weighted average of the ordered cities' vectors.
Preferably, the step S106 includes:
adopting the cosine similarity calculation method, taking a user as a and a month as b:
$$\cos(a, b) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
wherein X_i and Y_i are the i-th values of the two vectors being compared.
Preferably, the step S107 further includes supplementing the vector and similarity data with at least one of historical order price, city star rating, user price sensitivity, city sales, city ranking, and the total number of times the user has browsed the city.
Preferably, the step S108 includes:
S1081, excluding cities booked by the user in the previous 3 months;
S1082, marking the ordered city of each order as 1 (positive sample) and the remaining cities as 0 (negative samples), and keeping the ratio of positive to negative samples at 1:20;
S1083, training and testing with the XGBoost algorithm to obtain each user's booking probability score for each city.
Preferably, step S109 includes: selecting the P labels with the highest probability for the user, P being a natural number, and recommending the cities that match the P labels and have the highest booking probability for the user.
The embodiment of the present invention further provides a user-based tourist city push system, which is used for implementing the user-based tourist city push method, and the user-based tourist city push system includes:
the frequency acquisition module is used for counting frequency data R_l, l ∈ [1, n], of each city for each of n preset types of theme labels, n being a natural number greater than 1;
the frequency statistics module is used for obtaining TF-IDF values of different themes of each city according to the frequency data of each city and the n types of theme labels;
the vector establishing module is used for arranging each city according to the TF-IDF values of the n types of themes and the same theme sequence to obtain an n-dimensional vector;
the user vector module is used for obtaining a weighted average of the user and the city vector according to the order of the historical tourist city of each user and taking the weighted average as a user vector;
the month vector module is used for aggregating the labels of the cities ordered in each month to obtain frequency data for the themes of each month, and calculating the vector data TF-IDF_m of each month, m ∈ [1, 12], m being a natural number;
the training dimension module is used for calculating similarity data between users and months, between users and cities and between months and cities respectively to serve as training dimensions;
the random sampling module is used for randomly sampling from historical order data and matching corresponding vector and similarity data according to the user, month and order city of each order;
the user prediction module predicts the booking city of the user;
and the user recommending module is used for recommending at least one city with the highest predicted user booking probability to the user according to the vector data of the user.
The embodiment of the invention also provides a user-based tourist city pushing device, which comprises:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the user-based travel city push method described above via execution of the executable instructions.
Embodiments of the present invention also provide a computer-readable storage medium storing a program that, when executed, performs the steps of the user-based travel city push method described above.
The user-based tourist city pushing method, system, device and storage medium of the invention model the cities entering the recommendation list according to user, city and time information to obtain the user's booking probability, and rank and recommend the cities according to that probability.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.
FIG. 1 is a flow chart of a user-based travel city push method of the present invention.
Fig. 2 is a schematic block diagram of the user-based travel city push system of the present invention.
Fig. 3 is a schematic structural diagram of the user-based travel city push apparatus of the present invention.
Fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus their repetitive description will be omitted.
FIG. 1 is a flow chart of a user-based travel city push method of the present invention. As shown in fig. 1, an embodiment of the present invention provides a user-based travel city push method, including the following steps:
S101, counting frequency data R_l, l ∈ [1, n], of each city for each of n preset types of theme labels, n being a natural number greater than 1.
S102, obtaining TF-IDF values of the different themes of each city according to the frequency data of each city and the n types of theme labels.
S103, arranging the TF-IDF values of the n types of themes of each city in the same theme order to obtain an n-dimensional vector.
S104, obtaining, from each user's historical travel city orders, a weighted average of the ordered cities' vectors, and taking the weighted average as the user vector.
S105, aggregating the labels of the cities ordered in each month to obtain frequency data for the themes of each month, and calculating the vector data TF-IDF_m of each month, m ∈ [1, 12], m being a natural number.
S106, separately calculating similarity data between users and months, between users and cities, and between months and cities as training dimensions.
S107, randomly sampling from historical order data, and matching the corresponding vector and similarity data according to the user, month and ordered city of each order.
S108, predicting the city the user will book.
S109, recommending to the user, according to the user's vector data, at least one city with the highest predicted booking probability.
In a preferred embodiment, the theme tags include 18 tags in total: charm, picturesque scenery, shopping paradise, artistic interest, food mecca, green oxygen, SPA, seaside scenery, romantic feeling, water exploration, historical relics, outdoor sports, desert landscape, city landscape, ice and snow world, parent-child time, theme park, and travel with friends.
In a preferred embodiment, the process of obtaining the TF-IDF value in step S102 includes:
S1021, letting the number of cities having each type of theme be S_l, l ∈ [1, n], and counting each value S_l.
S1022, letting the total number of cities be S, and obtaining the IDF_l value of each type of theme, IDF_l = log[S ÷ (S_l + 1)], l ∈ [1, n].
S1023, obtaining the TF-IDF_l value of each type of theme, TF-IDF_l = R_l × IDF_l, l ∈ [1, n].
In a preferred embodiment, step S104 includes:
distinguishing the user's orders from the same period in earlier years (three months), orders within the past three months, and other orders, setting the weight of the same-period orders to 1, the weight of the orders within the past three months to 0.8, and the weight of the other orders to 0.5, and then calculating the weighted average of the ordered cities' vectors.
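The weighted average in step S104 could be sketched as follows. This is only an illustration under stated assumptions: the category names `same_period`, `recent` and `other` are hypothetical labels for the three order groups, the caller is assumed to have already assigned each order to a group, and dividing by the sum of the weights is one reasonable reading of "weighted average". The weights 1, 0.8 and 0.5 are the ones given in the text.

```python
# Weights from the text; the category names are hypothetical.
WEIGHTS = {"same_period": 1.0, "recent": 0.8, "other": 0.5}


def user_vector(orders, city_vectors):
    """Weighted average of the vectors of the cities a user ordered.

    orders: list of (city, category) pairs.
    city_vectors: {city: [tf-idf value per theme]}.
    """
    dim = len(next(iter(city_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for city, category in orders:
        weight = WEIGHTS[category]
        weight_sum += weight
        for i, value in enumerate(city_vectors[city]):
            total[i] += weight * value
    if weight_sum == 0:
        return total  # no orders: fall back to the zero vector (assumption)
    return [value / weight_sum for value in total]


if __name__ == "__main__":
    vectors = {"city_a": [1.0, 0.0], "city_b": [0.0, 1.0]}
    print(user_vector([("city_a", "same_period"), ("city_b", "other")], vectors))
```

With these toy inputs the same-period order pulls the user vector twice as strongly toward `city_a` as the older order pulls it toward `city_b`.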
In a preferred embodiment, step S106 includes:
adopting the cosine similarity calculation method, taking a user as a and a month as b:
$$\cos(a, b) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
wherein X_i and Y_i are the i-th values of the two vectors being compared.
In a preferred embodiment, step S107 further includes supplementing the vector and similarity data with at least one of historical order price, city star rating, user price sensitivity, city sales, city ranking, and the total number of times the user has browsed the city.
In a preferred embodiment, step S108 includes:
S1081, excluding cities booked by the user in the previous 3 months.
S1082, marking the ordered city of each order as 1 (positive sample) and the remaining cities as 0 (negative samples), and keeping the ratio of positive to negative samples at 1:20.
S1083, training and testing with the XGBoost algorithm to obtain each user's booking probability score for each city.
In a preferred embodiment, step S109 includes: selecting the P labels with the highest probability for the user, P being a natural number, and recommending the cities that match the P labels and have the highest booking probability for the user.
The invention aims to provide a city recommendation system based on users' historical behavior information, through which a user can conveniently obtain city recommendations that match the user's theme preferences. The cities entering the recommendation list are modeled according to user, city and time information to obtain the user's booking probability, and the cities are ranked and recommended according to that probability.
The invention is realized by the following technical scheme:
The invention comprises a city information module, a user information module, a time information module, a cross data module, a user ordering probability prediction module and a front-end display module. The user information module stores user data and ordering information. The city information module stores each city vector and related data. The front-end display module displays the results calculated by the other modules at the front end.
The city information module is used for storing city related information, such as city label vectors, sales volume ranking and the like, wherein the city vector information is calculated by TF-IDF.
In particular, TF-IDF is a frequency-based statistical method for evaluating the importance of a word to a document within a corpus. At its core, the importance of a word increases in proportion to the number of times it appears in a document and decreases with the frequency with which it appears across the corpus. Using this calculation, the labels that occur frequently under a city and also discriminate between cities can be found. The TF-IDF value of each label is a decimal number, and the values of the different labels are arranged in a fixed order to form the vector of the city.
The user information module stores historical relevant information of the user, such as a label vector of the user, historical ordering price, star level, price sensitivity and the like.
The time information module stores time related information such as a label vector corresponding to the current date, a hot sale city and the like.
The cross data module is used for storing multi-dimensional cross information, such as user-time similarity, user-city similarity, the ratio of the city to the station in the closest point of the user, and the like. Wherein the similarity calculation adopts cosine similarity.
The user ordering probability prediction module acquires the relevant data stored by each information module, models the user's probability of ordering in each city, and calls the XGBoost model to predict whether the user will order in the city. The results are filtered to some extent according to the labels each city matches, and then stored.
When a user enters the recommendation page, the front-end display module calls the stored recommended-city results and displays them in order on the App page, together with the corresponding pictures and recommendation reasons.
The method and the system can meet the preferences of the user on different themes, provide some personalized city choices for the user, and can better improve the browsing and ordering experience of the user.
The method comprises a background calculation part and a front-end display part. The background calculation part comprises 5 steps in total: 1) calculating the vector of each city from the city's label information; 2) obtaining the month and user vectors, and calculating the pairwise similarities among cities, users and months; 3) acquiring other relevant dimensions and training the prediction model of the user's city ordering probability; 4) obtaining the recommended city list with the probability model and sorting the recommended cities by probability from high to low; 5) filtering out the cities that do not meet the requirements according to the user's high-probability labels, and selecting the top 10 cities. In the front-end display part, when the user enters the city recommendation page, the front end calls the recommendation results and displays them together with the corresponding pictures and recommendation reasons.
The main steps of the invention are described below according to a flow scheme:
According to the existing user tags and mapping relations, the frequency data of the 18 major themes in each city are counted. The themes participating in the statistics are as follows:
[Table: the 18 theme labels participating in the statistics]
After the frequency data of the different themes of each city are obtained, the IDF value is calculated according to the formula, and then TF-IDF is calculated. The calculation formulas are as follows:
$$\mathrm{IDF} = \log\frac{\text{total number of cities}}{\text{number of cities with the theme} + 1}$$
$$\text{TF-IDF} = (\text{frequency of the theme}) \times \mathrm{IDF}$$
Each city is composed of TF-IDF values of different topics in order into an 18-dimensional vector.
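The construction of the city vectors can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name, the input layout and the toy theme names are hypothetical, the frequency is taken to be a raw count, a city is taken to "have" a theme when its count is above zero, and the natural logarithm is assumed because the text does not state the base.

```python
import math


def city_tfidf_vectors(freq, themes):
    """Build one TF-IDF vector per city.

    freq: {city: {theme: frequency R_l}}.
    themes: list fixing the order of the vector components.
    """
    if not freq:
        return {}
    total_cities = len(freq)  # S
    # S_l: number of cities in which theme l occurs at least once (assumption).
    cities_with_theme = {
        t: sum(1 for counts in freq.values() if counts.get(t, 0) > 0)
        for t in themes
    }
    # IDF_l = log(S / (S_l + 1)), natural log assumed.
    idf = {t: math.log(total_cities / (cities_with_theme[t] + 1)) for t in themes}
    # TF-IDF_l = R_l * IDF_l, arranged in the same theme order for every city.
    return {
        city: [counts.get(t, 0) * idf[t] for t in themes]
        for city, counts in freq.items()
    }


if __name__ == "__main__":
    themes = ["seaside", "history", "food"]
    freq = {
        "city_a": {"seaside": 3, "history": 1},
        "city_b": {"history": 2},
        "city_c": {"history": 5, "food": 4},
        "city_d": {},
    }
    for city, vector in city_tfidf_vectors(freq, themes).items():
        print(city, vector)
```

In the toy data, "history" occurs in three of the four cities, so its IDF is log(4/4) = 0 and it does not distinguish any city, while "seaside" and "food" each occur in a single city and dominate the vectors of those cities.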
Because the destinations of interest to the user may be reflected by their historical preferences and recent browsing, and may also be affected by time-season factors, vector data for the user and time may be constructed from city data.
User vector: the user's historical orders are marked and divided into orders from the same period in earlier years (three months), orders within the past three months, and other orders. Weights of 1, 0.8 and 0.5 are set for the three groups respectively, and the weighted average of the ordered cities' vectors is calculated as the user vector.
Time vector: the labels of the cities ordered in each month are aggregated to obtain the frequency data of each theme for that month, and TF-IDF is calculated as the vector data of the month.
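The aggregation behind the time vector might look like the sketch below. It covers only the frequency-counting half of the step; the TF-IDF weighting would then be applied to these monthly counts. The function name, the input layout and the placeholder city and label names are all hypothetical.

```python
from collections import Counter, defaultdict


def monthly_label_frequencies(orders, city_labels):
    """Count how often each theme label occurs among the cities ordered in each month.

    orders: iterable of (month, city) pairs, month in 1..12.
    city_labels: {city: [theme labels attached to the city]}.
    """
    freq = defaultdict(Counter)
    for month, city in orders:
        # Every order contributes one count for each label of the ordered city.
        freq[month].update(city_labels.get(city, []))
    return {month: dict(counter) for month, counter in freq.items()}


if __name__ == "__main__":
    labels = {"city_a": ["ice and snow"], "city_b": ["seaside", "romantic"]}
    orders = [(1, "city_a"), (1, "city_b"), (7, "city_b")]
    print(monthly_label_frequencies(orders, labels))
```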
The similarity data among user-time, user-city and time-city are then calculated separately as training dimensions. The similarity is calculated with cosine similarity, and the specific formula is as follows:
$$\cos(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
wherein X_i and Y_i are the i-th values of the two vectors being compared.
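A plain-Python sketch of the cosine similarity used for the three pairwise features is given below. Returning 0 when either vector is all zeros is an assumption made here to avoid division by zero; the text does not say how that case is handled.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: similarity with a zero vector is defined as 0
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    user, month, city = [0.2, 0.8, 0.0], [0.1, 0.9, 0.3], [0.0, 1.0, 0.0]
    features = {
        "user_month_sim": cosine_similarity(user, month),
        "user_city_sim": cosine_similarity(user, city),
        "month_city_sim": cosine_similarity(month, city),
    }
    print(features)
```

The three values computed in the usage example correspond to the user-time, user-city and time-city training dimensions; the feature names are illustrative.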
Orders from the past year are randomly sampled to select about 1 million records, of which 80% are used as training data and the rest as the test set, ensuring that the order times of the test set fall after those of the training set. The corresponding vector and similarity data are matched according to the user, time and ordered city of each order. Some other relevant dimensions are also constructed as supplementary information, such as historical order price, star rating, price sensitivity, city sales, ranking, and the user's browsing of the city.
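One way to obtain a random sample whose test set is strictly later than its training set is to sample first and then cut the sample along the time axis, as sketched below. The function name, the `order_time` key and the fixed seed are hypothetical choices for the illustration.

```python
import random


def sample_and_split(orders, sample_size, train_fraction=0.8, seed=0):
    """Randomly sample orders, then split them chronologically.

    orders: list of dicts that each carry a sortable 'order_time' value.
    Returns (train, test); every test order is at least as late as every
    training order.
    """
    rng = random.Random(seed)
    sampled = rng.sample(orders, min(sample_size, len(orders)))
    sampled.sort(key=lambda order: order["order_time"])
    cut = int(len(sampled) * train_fraction)
    return sampled[:cut], sampled[cut:]


if __name__ == "__main__":
    history = [{"order_id": i, "order_time": i} for i in range(1000)]
    train, test = sample_and_split(history, sample_size=100)
    print(len(train), len(test))
```

Splitting by time rather than at random keeps later orders out of the training data, which mirrors how the model would be used to predict future bookings.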
The user's booking city is predicted using the relevant dimensions constructed in the steps above. A total of 200 cities enter the recommendation pool: the TOP100 best-selling domestic cities and the TOP100 best-selling overseas cities. Cities that the user booked in the 3 months before the order time are excluded. The ordered city of each order is marked 1 as a positive sample, and the remaining cities are marked 0 as negative samples. Because negative samples vastly outnumber positive samples and training requires a large amount of data, the negative-sample cities of each order are sampled according to each city's order probability, that is, the higher a city's sales volume, the higher its probability of being sampled as a negative. This keeps the positive-to-negative ratio of the training samples at about 1:20. The binary:logistic objective in XGBoost is used for training and testing to obtain each user's booking probability score for each city; the higher the score, the more likely the city is to be booked. XGBoost (eXtreme Gradient Boosting) is frequently used in competitions, where it performs well. It is a massively parallel boosted-tree tool and is regarded as one of the fastest and best open-source boosted-tree toolkits currently available. The algorithm it applies is an improvement on GBDT (gradient boosting decision tree) and can be used for both classification and regression.
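The sales-weighted negative sampling described above could be sketched as follows. The function name and data layout are hypothetical, sales values are assumed to be positive, and drawing without replacement (so that each negative city appears once per order) is an assumption; the text only says that higher-selling cities are more likely to be sampled and that the ratio is about 1:20.

```python
import random


def build_training_rows(ordered_city, recently_booked, city_sales,
                        negatives_per_positive=20, seed=0):
    """Return [(city, label)] with one positive row and sales-weighted negatives."""
    rng = random.Random(seed)
    # Candidate negatives: every pool city except the ordered one and the
    # cities the user booked in the previous 3 months.
    pool = {
        city: sales
        for city, sales in city_sales.items()
        if city != ordered_city and city not in recently_booked
    }
    negatives = []
    # Weighted sampling without replacement: higher sales, higher chance.
    while pool and len(negatives) < negatives_per_positive:
        cities = list(pool)
        pick = rng.choices(cities, weights=[pool[c] for c in cities], k=1)[0]
        negatives.append(pick)
        del pool[pick]
    return [(ordered_city, 1)] + [(city, 0) for city in negatives]


if __name__ == "__main__":
    sales = {f"city_{i}": i + 1 for i in range(200)}
    rows = build_training_rows("city_5", {"city_0"}, sales)
    print(len(rows), rows[:3])
```

Each order therefore contributes 21 rows, which can be joined with the vector, similarity and supplementary dimensions before being passed to the classifier.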
Based on the user's vector data, the TOP5 labels with the highest probability for the user are selected, and only cities that match these labels and have a high booking probability for the user are recommended. A value K is set as the threshold for whether a city matches a label: a city enters the recommendation candidate pool for one of the user's theme labels if and only if its vector value for that label is greater than K. K is obtained from the quantiles of the TF-IDF values under the different labels. First, the TOP10 cities with the highest probability among those meeting the condition under the user's TOP1 label are computed as the recommended cities for the TOP1 label. The TOP2 label is computed next. Because the same city may match both the TOP1 label and the TOP2 label with a high probability in each, ranking every label purely by probability could make the recommendation lists under different labels overlap heavily. A decay coefficient α, determined experimentally, is therefore applied: the TOP10 cities with the highest probability under the TOP2 label are obtained from the decayed probability values, and the resulting recommended cities are then stored.
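The label-by-label selection can be illustrated with the sketch below. Several details are assumptions rather than statements from the text: the threshold K is passed in as one value per label, the decay α is applied by multiplying the probability of every city already recommended under an earlier label, and all function and parameter names are hypothetical.

```python
def recommend_by_label(user_vector, labels, city_vectors, booking_prob,
                       thresholds, alpha, top_labels=5, top_cities=10):
    """Pick the top cities for each of the user's strongest labels.

    user_vector: user's value per label, in the same order as `labels`.
    city_vectors: {city: [tf-idf value per label]}.
    booking_prob: {city: predicted booking probability for this user}.
    thresholds: per-label threshold K (assumed to be precomputed).
    alpha: decay factor in (0, 1) applied to already recommended cities.
    """
    ranked = sorted(range(len(labels)), key=lambda i: user_vector[i], reverse=True)
    prob = dict(booking_prob)  # working copy that is decayed as we go
    result = {}
    for idx in ranked[:top_labels]:
        # A city is a candidate only if its value for this label exceeds K.
        candidates = [
            city for city, vector in city_vectors.items()
            if city in prob and vector[idx] > thresholds[idx]
        ]
        best = sorted(candidates, key=lambda c: prob[c], reverse=True)[:top_cities]
        result[labels[idx]] = best
        for city in best:
            prob[city] *= alpha  # assumption: decay cities already shown
    return result


if __name__ == "__main__":
    picks = recommend_by_label(
        user_vector=[0.9, 0.5],
        labels=["seaside", "food"],
        city_vectors={"city_a": [1.0, 1.0], "city_b": [1.0, 0.0], "city_c": [0.0, 1.0]},
        booking_prob={"city_a": 0.9, "city_b": 0.5, "city_c": 0.6},
        thresholds=[0.5, 0.5],
        alpha=0.5,
        top_cities=2,
    )
    print(picks)
```

In the toy run, `city_a` leads the first label; after decay it drops behind `city_c` under the second label, so the two lists no longer open with the same city.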
Among them, the Xgboost algorithm is an ensemble learning algorithm widely used in the industry. Here to help predict the predetermined probability score for each candidate set. In the algorithm process, a tree is grown by continuously performing feature splitting, and the residual error between the predicted value and the actual value of the model in the previous round is fitted in each round of learning. When the training is finished, adding the scores of all the obtained tree models to obtain a final prediction score:
$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$
where $K$ is the total number of trees learned and $f_k(x_i)$ is the expression of the $k$-th tree, which is a function of the training-set sample $x_i$. The prediction target $y_i$ is the order label of the training set.
The training purpose of each tree is to minimize the loss function. The loss function is:
$L = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$
where $\Omega(f_k)$ is the regularization term of each tree. A second-order Taylor expansion is applied to the error function, the prediction score of each leaf node is computed at the point where the derivative is 0, and that score is substituted into the objective function to obtain the minimum loss.
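The second-order Taylor step is stated above without formulas. A sketch of the usual XGBoost derivation follows. The regularizer form and the symbols $g_i$, $h_i$, $G_j$, $H_j$, $\gamma$, $\lambda$, $T$ and $w_j$ come from the standard XGBoost formulation and are assumptions here, since the source does not spell them out.

```latex
% Objective at round t, expanded to second order around the previous prediction
\mathcal{L}^{(t)} \approx \sum_{i} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big)
    + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \big(\hat{y}_i^{(t-1)}\big)^2}

% Assumed regularizer for a tree with T leaves and leaf weights w_j
\Omega(f_t) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2

% With G_j and H_j the sums of g_i and h_i over the samples in leaf j,
% setting the derivative with respect to w_j to zero gives
w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
\mathcal{L}^{(t)}_{\min} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda}
    + \gamma T + \text{const}
```

The constant collects the terms $l(y_i, \hat{y}_i^{(t-1)})$, which do not depend on the new tree.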
When the user enters the recommendation page, the theme with the highest probability for the user and that theme's recommended cities are displayed in the corresponding city recommendation module. The cities are arranged in the same order as computed offline. When the user is not interested in the currently recommended theme and clicks the 'change one' button in the upper right corner, the front end displays the user's TOP2 theme label and its corresponding cities. This step repeats until all 5 themes have been presented once, after which the cycle starts again from the theme with the highest probability.
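The cycling behaviour of the 'change one' button amounts to indexing the ranked theme list modulo its length. A minimal sketch, with a hypothetical function name and a click counter assumed to be tracked by the front end:

```python
def theme_to_display(ranked_themes, change_clicks):
    """Return the theme shown after `change_clicks` presses of 'change one'.

    ranked_themes is ordered from highest to lowest user probability;
    after the last theme the cycle restarts from the first.
    """
    return ranked_themes[change_clicks % len(ranked_themes)]
```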
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
The invention aims to provide a user-based tourist city pushing method which can obtain the booking probability of a user by modeling a city entering a recommendation list according to user, city and time information, and sequence and recommend the city according to the booking probability.
Fig. 2 is a schematic block diagram of the user-based travel city push system of the present invention. As shown in fig. 2, an embodiment of the present invention further provides a user-based tourist city push system, which is used for implementing the user-based tourist city push method, where the user-based tourist city push system 9 includes:
a frequency acquisition module 91, which counts the frequency data $R_l$, $l \in [1, n]$, of each city for each of n preset types of theme labels, where n is a natural number greater than 1;
a frequency statistics module 92, which obtains TF-IDF values of the different themes of each city according to the frequency data of each city and the n types of theme labels;
a vector establishing module 93, which arranges the TF-IDF values of the n types of themes of each city in the same theme order to obtain an n-dimensional vector;
a user vector module 94, which obtains, according to each user's historical orders for tourist cities, a weighted average of the corresponding city vectors as the user vector;
a month vector module 95, which aggregates the labels of the cities ordered in each month to obtain frequency data corresponding to the themes of each month, and calculates the month vector data $\text{TF-IDF}_m$, $m \in [1, 12]$, where m is a natural number;
a training dimension module 96, which calculates similarity data between users and months, between users and cities, and between months and cities, respectively, as training dimensions;
a random sampling module 97, which randomly samples from historical order data and matches the corresponding vector and similarity data according to the user, month and city of each order;
a user prediction module 98, which predicts the city booked by the user;
a user recommending module 99, which recommends to the user, according to the vector data of the user, at least one city with the highest predicted booking probability.
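The work of modules 91 to 93 can be sketched as below, using the formula given in claim 2, IDF_l = log[S ÷ (S_l + 1)] and TF-IDF_l = R_l × IDF_l. The function name, the input layout, the natural logarithm, and counting a city toward S_l whenever its frequency for that label is nonzero are all assumptions made for illustration.

```python
import math


def city_tfidf_vectors(freq, labels):
    """Return {city: n-dimensional TF-IDF vector}, with labels in a fixed order.

    freq:   {city: {label: R_l}}, the per-city label frequency data
    labels: list of the n theme labels, defining the vector order
    """
    total_cities = len(freq)  # S
    cities_per_label = {      # S_l: cities in which label l occurs (assumed)
        label: sum(1 for r in freq.values() if r.get(label, 0) > 0)
        for label in labels
    }
    idf = {label: math.log(total_cities / (cities_per_label[label] + 1))
           for label in labels}
    return {city: [r.get(label, 0) * idf[label] for label in labels]
            for city, r in freq.items()}
```

Because every city's vector uses the same label order, the vectors of cities, users and months can be compared component by component.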
The user-based tourist city pushing system can obtain the booking probability of the user by modeling the city entering the recommendation list according to the user, the city and the time information, and sort and recommend the cities according to the booking probability.
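The user vector module 94 and the training dimension module 96 rely on a weighted average and on cosine similarity. A minimal sketch under stated assumptions: the function names are hypothetical, the weighted average is normalized by the sum of the weights, and the per-order weights (the claims mention 1, 0.8 and 0.5 depending on order recency) are supplied by the caller.

```python
import math


def weighted_user_vector(orders):
    """Weighted average of city vectors; orders is a list of (vector, weight)."""
    total_weight = sum(weight for _, weight in orders)
    dims = len(orders[0][0])
    return [sum(vec[i] * weight for vec, weight in orders) / total_weight
            for i in range(dims)]


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)
```

The same `cosine_similarity` call would serve for all three pairings (user and month, user and city, month and city), since all vectors share the n theme dimensions.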
An embodiment of the invention also provides a user-based travel city push device, which comprises a processor and a memory storing executable instructions of the processor, wherein the processor is configured to perform the steps of the user-based travel city push method by executing the executable instructions.
As shown above, this embodiment can obtain the subscription probability of the user by modeling according to the user, city, and time information for the city entering the recommendation list, and sort and recommend the cities according to the subscription probability.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "platform."
Fig. 3 is a schematic structural diagram of the user-based travel city push apparatus of the present invention. An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 3. The electronic device 600 shown in fig. 3 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 3, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one memory unit 620, a bus 630 connecting the different platform components (including the memory unit 620 and the processing unit 610), a display unit 640, etc.
The storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention described in the above-mentioned user-based travel city push method section of the present specification. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
An embodiment of the invention also provides a computer-readable storage medium for storing a program, and the program implements the steps of the user-based travel city push method when executed. In some possible embodiments, the aspects of the present invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present invention described in the above-mentioned user-based travel city push method section of this specification, when the program product is run on the terminal device.
As shown above, this embodiment can obtain the subscription probability of the user by modeling according to the user, city, and time information for the city entering the recommendation list, and sort and recommend the cities according to the subscription probability.
Fig. 4 is a schematic structural diagram of a computer-readable storage medium of the present invention. Referring to fig. 4, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In summary, the present invention is directed to a user-based travel city pushing method, system, device and storage medium, which can obtain a booking probability of a user by modeling a city entering a recommendation list according to user, city and time information, and rank and recommend the city according to the booking probability.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A user-based tourist city pushing method is characterized by comprising the following steps:
S101, counting the frequency data $R_l$, $l \in [1, n]$, of each city for each of n preset types of theme labels, where n is a natural number greater than 1;
S102, obtaining TF-IDF values of the different themes of each city according to the frequency data of each city and the n types of theme labels;
S103, arranging the TF-IDF values of the n types of themes of each city in the same theme order to obtain an n-dimensional vector;
S104, obtaining, according to each user's historical orders for tourist cities, a weighted average of the corresponding city vectors, and taking the weighted average as the user vector;
S105, aggregating the labels of the cities ordered in each month to obtain frequency data corresponding to the themes of each month, and calculating the month vector data $\text{TF-IDF}_m$, $m \in [1, 12]$, where m is a natural number;
S106, respectively calculating similarity data between users and months, between users and cities, and between months and cities as training dimensions;
S107, randomly sampling from historical order data, and matching the corresponding vector and similarity data according to the user, month and ordered city of each order;
S108, predicting the booking city of the user;
S109, recommending to the user, according to the vector data of the user, at least one city with the highest predicted booking probability.
2. The user-based tourist city push method according to claim 1, wherein said step S102 of obtaining TF-IDF value comprises:
S1021, letting the number of cities of each type of theme be $S_l$, $l \in [1, n]$, and counting the value of each $S_l$;
S1022, letting the total number of cities be $S$, and obtaining the $\mathrm{IDF}_l$ value of each type of theme, $\mathrm{IDF}_l = \log[S \div (S_l + 1)]$, $l \in [1, n]$;
S1023, obtaining the $\text{TF-IDF}_l$ value of each type of theme, $\text{TF-IDF}_l = R_l \times \mathrm{IDF}_l$, $l \in [1, n]$.
3. The user-based tourist city pushing method according to claim 1, wherein said step S104 comprises:
distinguishing orders of the user in the past three months, the orders in the past three months and other orders, setting the corresponding weight of the orders in the past three months as 1, the corresponding weight of the orders in the past three months as 0.8 and the corresponding weight of the other orders as 0.5, and then calculating a weighted average for the order city vector.
4. The user-based tourist city pushing method according to claim 1, wherein said step S106 comprises:
adopting a cosine similarity calculation method, setting a user as a and a month as b:
$\cos(a, b) = \dfrac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$
wherein $X_i$ and $Y_i$ are the i-th values of the two vectors being compared.
5. The method as claimed in claim 1, wherein the step S107 further comprises creating vector and similarity data by adding at least one of historical order price, city star rating, user price sensitivity, city sales, city ranking, and total number of times the user browses the city.
6. The user-based tourist city pushing method according to claim 1, wherein said step S108 comprises:
S1081, excluding cities booked by the user in the previous 3 months;
S1082, marking the ordered city corresponding to the order as 1 as a positive sample, marking the remaining cities as 0 as negative samples, and keeping the ratio of positive samples to negative samples at 1:20;
S1083, training and testing with the XGBoost algorithm to obtain the booking probability score of each user for each city.
7. The user-based tourist city pushing method according to claim 1, wherein said step S109 comprises: and selecting P labels with the highest user probability, wherein P is a natural number, and recommending the city which accords with the P labels and has the highest user subscription probability according to the P labels.
8. A user-based tourist city push system for implementing the user-based tourist city push method according to any one of claims 1 to 7, comprising:
the frequency acquisition module is used for counting the frequency data $R_l$, $l \in [1, n]$, of each city for each of n preset types of theme labels, where n is a natural number greater than 1;
the frequency statistics module is used for obtaining TF-IDF values of different themes of each city according to the frequency data of each city and the n types of theme labels;
the vector establishing module is used for arranging each city according to the TF-IDF values of the n types of themes and the same theme sequence to obtain an n-dimensional vector;
the user vector module is used for obtaining a weighted average of the user and the city vector according to the order of the historical tourist city of each user and taking the weighted average as a user vector;
the month vector module is used for aggregating the labels of the cities ordered in each month to obtain frequency data corresponding to the themes of each month, and calculating the month vector data $\text{TF-IDF}_m$, $m \in [1, 12]$, where m is a natural number;
the training dimension module is used for calculating similarity data between users and months, between users and cities and between months and cities respectively to serve as training dimensions;
the random sampling module is used for randomly sampling from historical order data and matching corresponding vector and similarity data according to the user, month and order city of each order;
the user prediction module predicts the booking city of the user;
and the user recommending module is used for recommending at least one city with the highest predicted user booking probability to the user according to the vector data of the user.
9. A user-based travel city push device, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the user-based travel city push method of any one of claims 1-7 via execution of the executable instructions.
10. A computer-readable storage medium storing a program, wherein the program when executed implements the steps of the user-based travel city push method of any one of claims 1 to 7.
CN202010142415.8A 2020-03-04 2020-03-04 User-based travel city pushing method, system, equipment and storage medium Active CN111339423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010142415.8A CN111339423B (en) 2020-03-04 2020-03-04 User-based travel city pushing method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010142415.8A CN111339423B (en) 2020-03-04 2020-03-04 User-based travel city pushing method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111339423A true CN111339423A (en) 2020-06-26
CN111339423B CN111339423B (en) 2023-05-02

Family

ID=71185828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010142415.8A Active CN111339423B (en) 2020-03-04 2020-03-04 User-based travel city pushing method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111339423B (en)

Also Published As

Publication number Publication date
CN111339423B (en) 2023-05-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant