CN111339423A - User-based travel city pushing method, system, equipment and storage medium - Google Patents
User-based travel city pushing method, system, equipment and storage medium
- Publication number
- CN111339423A (application number CN202010142415.8A)
- Authority
- CN
- China
- Prior art keywords
- city
- user
- vector
- data
- cities
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/14—Travel agencies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/55—Push-based network services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a user-based tourist city pushing method, system, equipment and storage medium. The method comprises the following steps: counting frequency data $R_l$ of each city for each of n preset types of theme labels; obtaining TF-IDF values of the different themes of each city from the frequency data and the n types of theme labels to obtain an n-dimensional vector; obtaining a weighted average of city vectors according to each user's orders as the user vector; aggregating the labels of the cities ordered in each month to obtain frequency data, and calculating the vector data $\text{TF-IDF}_m$ of each month; separately calculating similarity data between users and months, between users and cities, and between months and cities; randomly sampling from historical order data and matching the corresponding vector and similarity data; and predicting the city the user will book and recommending the predicted city to the user. The invention models user, city and time information, and ranks and recommends cities according to booking probability.
Description
Technical Field
The invention relates to the field of travel information pushing, in particular to a user-based travel city pushing method, system, equipment and storage medium.
Background
With rising living standards, domestic and overseas travel has become a leisure choice for more and more users. However, leisure travel choices tend to involve a longer decision period. On the one hand, users have many cities to choose from, and learning the details of each one takes considerable time; on the other hand, different users may already have relatively fixed preferences but struggle to quickly find similar travel destinations. If an online travel agency (OTA) can help the user quickly find a desired destination and pair it with a suitable recommendation reason and hotel display, the user's effort can be greatly reduced, the user's decision can be facilitated, and the booking experience can be improved.
To help users find travel destinations faster, OTAs have already explored destination recommendation extensively. The current mainstream approaches are ranking recommendation based on search popularity, type recommendation based on fixed tags, and city recommendation based on the user's recent clicks. Although these approaches are relatively simple to operate, they have drawbacks. Popularity recommendation ranks cities by search results over a historical period, but statistics over a large number of users are stable, so the same popular, similar content may be displayed all the time. Tag-based recommendation groups cities with the same tag and presents them to the user under one theme, but because there are many themes, the user may have difficulty finding the desired content right away. Recommendation based on recent clicks builds a list from the cities the user has recently searched for and browsed; although this largely ensures that the cities are destinations the user is interested in, it mostly serves as a record of browsing history and cannot push more attractive travel products to the user, so the order conversion rate of travel products is hard to improve.
Therefore, the invention provides a user-based tourist city pushing method, system, device and storage medium.
Disclosure of Invention
The invention aims to provide a user-based tourist city pushing method, system, equipment and storage medium that model the cities entering a recommendation list according to user, city and time information to obtain the user's booking probability, and that rank and recommend cities according to that probability.
The embodiment of the invention provides a user-based tourist city pushing method, which comprises the following steps:
S101, counting frequency data $R_l$, $l \in [1, n]$, of each city for each of n preset types of theme labels, where n is a natural number greater than 1;
S102, obtaining TF-IDF values of the different themes of each city according to the frequency data of each city and the n types of theme labels;
S103, arranging the TF-IDF values of the n types of themes of each city in the same theme order to obtain an n-dimensional vector for each city;
S104, obtaining, for each user, a weighted average of the city vectors according to the user's historical travel-city orders, and taking the weighted average as the user vector;
S105, aggregating the labels of the cities ordered in each month to obtain frequency data corresponding to the themes of each month, and calculating the vector data $\text{TF-IDF}_m$, $m \in [1, 12]$, of each month, where m is a natural number;
S106, respectively calculating similarity data between users and months, between users and cities, and between months and cities as training dimensions;
S107, randomly sampling from historical order data, and matching the corresponding vector and similarity data according to the user, month and ordered city of each order;
S108, predicting the city the user will book;
S109, recommending to the user, according to the user's vector data, at least one city with the highest predicted booking probability.
Preferably, the theme labels comprise 18 labels in total: charm, picturesque scenery, shopping paradise, artistic interest, food holy land, green oxygen, SPA, seaside scenery, romantic feeling, water exploration, historical relics, outdoor sports, desert landscape, city landscape, ice and snow world, parent-child time, theme park, and travel with friends.
Preferably, the process of obtaining the TF-IDF value in step S102 includes:
S1021, letting the number of cities having each type of theme be $S_l$, $l \in [1, n]$, and counting each value of $S_l$;
S1022, letting the total number of cities be $S$, and obtaining the $\text{IDF}_l$ value of each type of theme, $\text{IDF}_l = \log[S \div (S_l + 1)]$, $l \in [1, n]$;
S1023, obtaining the $\text{TF-IDF}_l$ value of each type of theme, $\text{TF-IDF}_l = R_l \times \text{IDF}_l$, $l \in [1, n]$.
Preferably, the step S104 includes:
distinguishing the user's historical same-period orders (within a three-month window), the user's orders in the past three months, and other orders; setting the corresponding weights to 1, 0.8 and 0.5 respectively; and then calculating a weighted average of the ordered-city vectors.
Preferably, the step S106 includes:
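adopting a cosine similarity calculation method, with the user denoted a and the month denoted b:
$$\cos(a, b) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
where $X_i$ and $Y_i$ are the i-th values of the two vectors being compared.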
Preferably, the step S107 further includes creating vector and similarity data by adding at least one of historical order price, city star rating, user price sensitivity, city sales, city ranking, and total times of browsing the city by the user.
Preferably, the step S108 includes:
S1081, excluding cities booked by the user in the previous 3 months;
S1082, marking the ordered city of each order as 1 (a positive sample) and marking the remaining cities as 0 (negative samples), and keeping the ratio of positive to negative samples at 1:20;
S1083, training and testing with the XGBoost algorithm to obtain each user's booking probability score for each city.
Preferably, step S109 includes: selecting the P labels with the highest probability for the user, where P is a natural number, and recommending the cities that match those P labels and have the highest predicted booking probability for the user.
The embodiment of the present invention further provides a user-based tourist city push system, which is used for implementing the user-based tourist city push method, and the user-based tourist city push system includes:
the frequency acquisition module is used for counting frequency data $R_l$, $l \in [1, n]$, of each city for each of n preset types of theme labels, where n is a natural number greater than 1;
the frequency statistics module is used for obtaining TF-IDF values of different themes of each city according to the frequency data of each city and the n types of theme labels;
the vector establishing module is used for arranging each city according to the TF-IDF values of the n types of themes and the same theme sequence to obtain an n-dimensional vector;
the user vector module is used for obtaining a weighted average of the user and the city vector according to the order of the historical tourist city of each user and taking the weighted average as a user vector;
the month vector module is used for aggregating the labels of the cities ordered in each month to obtain frequency data corresponding to the themes of each month, and calculating the vector data $\text{TF-IDF}_m$, $m \in [1, 12]$, of each month, where m is a natural number;
the training dimension module is used for calculating similarity data between users and months, between users and cities and between months and cities respectively to serve as training dimensions;
the random sampling module is used for randomly sampling from historical order data and matching corresponding vector and similarity data according to the user, month and order city of each order;
the user prediction module predicts the booking city of the user;
and the user recommending module is used for recommending at least one city with the highest predicted user booking probability to the user according to the vector data of the user.
The embodiment of the invention also provides a user-based tourist city pushing device, which comprises:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the user-based travel city push method described above via execution of the executable instructions.
Embodiments of the present invention also provide a computer-readable storage medium storing a program that, when executed, performs the steps of the user-based travel city push method described above.
The invention aims to provide a user-based tourist city pushing method, system, device and storage medium.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.
FIG. 1 is a flow chart of a user-based travel city push method of the present invention.
Fig. 2 is a schematic block diagram of the user-based travel city push system of the present invention.
Fig. 3 is a schematic structural diagram of the user-based travel city push apparatus of the present invention.
Fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus their repetitive description will be omitted.
FIG. 1 is a flow chart of a user-based travel city push method of the present invention. As shown in fig. 1, an embodiment of the present invention provides a user-based travel city push method, including the following steps:
S101, counting frequency data $R_l$, $l \in [1, n]$, of each city for each of n preset types of theme labels, where n is a natural number greater than 1.
S102, obtaining TF-IDF values of the different themes of each city according to the frequency data of each city and the n types of theme labels.
S103, arranging the TF-IDF values of the n types of themes of each city in the same theme order to obtain an n-dimensional vector for each city.
S104, obtaining, for each user, a weighted average of the city vectors according to the user's historical travel-city orders, and taking the weighted average as the user vector.
S105, aggregating the labels of the cities ordered in each month to obtain frequency data corresponding to the themes of each month, and calculating the vector data $\text{TF-IDF}_m$, $m \in [1, 12]$, of each month, where m is a natural number.
S106, respectively calculating similarity data between users and months, between users and cities, and between months and cities as training dimensions.
S107, randomly sampling from historical order data, and matching the corresponding vector and similarity data according to the user, month and ordered city of each order.
S108, predicting the city the user will book.
S109, recommending to the user, according to the user's vector data, at least one city with the highest predicted booking probability.
In a preferred embodiment, the theme labels comprise 18 labels in total: charm, picturesque scenery, shopping paradise, artistic interest, food holy land, green oxygen, SPA, seaside scenery, romantic feeling, water exploration, historical relics, outdoor sports, desert landscape, city landscape, ice and snow world, parent-child time, theme park, and travel with friends.
In a preferred embodiment, the process of obtaining the TF-IDF value in step S102 includes:
S1021, letting the number of cities having each type of theme be $S_l$, $l \in [1, n]$, and counting each value of $S_l$.
S1022, letting the total number of cities be $S$, and obtaining the $\text{IDF}_l$ value of each type of theme, $\text{IDF}_l = \log[S \div (S_l + 1)]$, $l \in [1, n]$.
S1023, obtaining the $\text{TF-IDF}_l$ value of each type of theme, $\text{TF-IDF}_l = R_l \times \text{IDF}_l$, $l \in [1, n]$.
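Steps S1021 to S1023 can be illustrated with a minimal Python sketch. It assumes that the frequency data are raw per-theme counts, that a city "has" a theme when its count is greater than zero, and that the logarithm is the natural logarithm; the source specifies none of these. The function name `city_tfidf_vectors` and the input layout are hypothetical.

```python
import math

def city_tfidf_vectors(freq):
    """Build one n-dimensional TF-IDF vector per city.

    freq maps a city name to a list of n per-theme frequencies R_l
    (assumed here to be raw counts, all lists in the same theme order).
    """
    cities = list(freq)
    n = len(next(iter(freq.values())))
    total = len(cities)  # S: total number of cities
    # S_l: number of cities in which theme l occurs at all (assumed meaning)
    s = [sum(1 for c in cities if freq[c][l] > 0) for l in range(n)]
    # IDF_l = log(S / (S_l + 1)); natural log is an assumption
    idf = [math.log(total / (s_l + 1)) for s_l in s]
    # TF-IDF_l = R_l * IDF_l, kept in the same theme order for every city
    return {c: [freq[c][l] * idf[l] for l in range(n)] for c in cities}
```

Note that with this formula a theme present in every city gets a slightly negative IDF, because $S_l + 1 > S$; a real implementation might clip such values, which is a design choice not addressed in the source.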
In a preferred embodiment, step S104 includes:
distinguishing the user's historical same-period orders (within a three-month window), the user's orders in the past three months, and other orders; setting the corresponding weights to 1, 0.8 and 0.5 respectively; and then calculating a weighted average of the ordered-city vectors.
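The weighted average in step S104 might be computed as in the sketch below. The bucket names `same_period`, `recent` and `other` are hypothetical labels for the three order groups, and dividing by the sum of the weights is an assumption about what "weighted average" means here.

```python
def user_vector(orders, city_vectors, weights=None):
    """Weighted average of the vectors of the cities a user has ordered.

    orders: list of (city, bucket) pairs; bucket is a hypothetical label
            for one of the three order groups described in the text.
    city_vectors: city name -> list of TF-IDF values (all the same length).
    """
    if weights is None:
        # weights 1, 0.8 and 0.5 come from the text; the keys are assumptions
        weights = {"same_period": 1.0, "recent": 0.8, "other": 0.5}
    n = len(next(iter(city_vectors.values())))
    acc = [0.0] * n
    total_w = 0.0
    for city, bucket in orders:
        w = weights[bucket]
        total_w += w
        for i, v in enumerate(city_vectors[city]):
            acc[i] += w * v
    if total_w == 0:
        return acc  # user with no orders: zero vector (assumption)
    return [a / total_w for a in acc]
```

For example, a user with one same-period order in a city with vector [1, 0] and one other order in a city with vector [0, 1] gets the vector [1/1.5, 0.5/1.5].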
In a preferred embodiment, step S106 includes:
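adopting a cosine similarity calculation method, with the user denoted a and the month denoted b:
$$\cos(a, b) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
where $X_i$ and $Y_i$ are the i-th values of the two vectors being compared.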
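The cosine similarity used in step S106 can be sketched in a few lines of Python. Returning 0.0 when either vector is all zeros is an assumption, since the source does not say how that case is handled.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_x * norm_y)
```

The same function would apply to each of the three pairs named in the text (user and month, user and city, month and city), since all three vectors share the same theme dimensions.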
In a preferred embodiment, step S107 further comprises creating vector and similarity data by adding at least one of historical order price, city star rating, user price sensitivity, city sales, city ranking, and total number of times the user browses the city.
In a preferred embodiment, step S108 includes:
S1081, excluding cities that the user has booked in the previous 3 months.
S1082, marking the ordered city of each order as 1 (a positive sample) and marking the remaining cities as 0 (negative samples), and keeping the ratio of positive to negative samples at 1:20.
S1083, training and testing with the XGBoost algorithm to obtain each user's booking probability score for each city.
In a preferred embodiment, step S109 includes: selecting the P labels with the highest probability for the user, where P is a natural number, and recommending the cities that match those P labels and have the highest predicted booking probability for the user.
The invention aims to provide a city recommendation system based on users' historical behavior, through which a user can conveniently obtain city recommendations similar to his or her theme preferences. The cities entering the recommendation list are modeled according to user, city and time information to obtain the user's booking probability, and the cities are ranked and recommended according to that probability.
The invention is realized by the following technical scheme:
The invention comprises a city information module, a user information module, a time information module, a cross data module, a user ordering probability prediction module and a front-end display module. The user information module stores user data and ordering information. The city information module stores each city vector and related data. The front-end display module displays the results calculated by the other modules at the front end.
The city information module is used for storing city related information, such as city label vectors, sales volume ranking and the like, wherein the city vector information is calculated by TF-IDF.
In particular, TF-IDF is a frequency-based statistical method for evaluating the importance of a word to a document within a corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases with the frequency with which it appears across the corpus. Using this calculation, the labels that both occur frequently in a city and discriminate it from other cities can be found. The TF-IDF value of each label is expressed as a decimal, and the values of the different labels are arranged in a fixed order to form the vector corresponding to the city.
The user information module stores historical relevant information of the user, such as a label vector of the user, historical ordering price, star level, price sensitivity and the like.
The time information module stores time related information such as a label vector corresponding to the current date, a hot sale city and the like.
The cross data module is used for storing multi-dimensional cross information, such as user-time similarity, user-city similarity, and the share of the city among the user's recent clicks. The similarity calculation uses cosine similarity.
The user ordering probability prediction module acquires the relevant data stored by each information module, models the user's ordering probability for each city, and calls the XGBoost model to predict whether the user will place an order in the city. The results are then filtered according to the labels that each city matches, and stored.
When a user enters the recommendation page, the front-end display module retrieves the stored recommended cities and displays them in order on the App page, together with the corresponding pictures and recommendation reasons.
The method and the system can meet the preferences of the user on different themes, provide some personalized city choices for the user, and can better improve the browsing and ordering experience of the user.
The method comprises a background calculation part and a front-end display part. The background calculation part comprises 5 steps in total:
1) Calculating the vector corresponding to each city according to the city's label information.
2) Obtaining the month and user vectors, and calculating the pairwise similarities among cities, users and months.
3) Acquiring other related dimensions, and training a model that predicts the probability of a user ordering in a city.
4) Obtaining a recommended city list with the probability model and sorting the recommended cities from high to low probability.
5) Filtering out the cities that do not meet the requirements according to the user's high-probability labels, and selecting the top 10 cities.
In the front-end display part, when the user enters the city recommendation page, the front end retrieves the recommendation results and displays them together with the corresponding pictures and recommendation reasons.
The main steps of the invention are described below according to a flow scheme:
According to the existing user tags and mapping relations, the frequency data of the 18 major themes in each city are counted. The themes participating in the statistics are the 18 theme labels listed above.
After the frequency data of the different themes of each city are obtained, the IDF value is calculated according to the formula, and then the TF-IDF is calculated. The calculation formula is as follows:
TF-IDF = (frequency of the theme) × IDF
For each city, the TF-IDF values of the different themes are arranged in order to form an 18-dimensional vector.
Because the destinations of interest to the user may be reflected by their historical preferences and recent browsing, and may also be affected by time-season factors, vector data for the user and time may be constructed from city data.
User vector: the user's historical orders are marked to distinguish historical same-period (three-month) orders, orders in the past three months, and other orders. Corresponding weights (1, 0.8 and 0.5) are set respectively, and a weighted average of the ordered-city vectors is then calculated as the user vector.
Time vector: the labels of the cities ordered in each month are aggregated to obtain frequency data corresponding to the themes of each month, and the TF-IDF is calculated as the vector data of the month.
Similarity data for user-time, user-city and time-city pairs are then calculated as training dimensions. The similarity is computed with the cosine similarity method, and the specific formula is as follows:
$$\cos(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
where $X_i$ and $Y_i$ are the i-th values of the two vectors being compared.
Random sampling is performed on the order data of the past year, and about 1 million records are selected, of which 80% are used as training data and the rest as the test set, ensuring that the order times in the test set are later than those in the training set. The corresponding vector and similarity data are matched according to the user, time and ordered city of each order. Some other relevant dimensions are also constructed as supplementary information, such as historical order price, star rating, price sensitivity, city sales volume, ranking, and the user's browsing of the city.
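One way to obtain a random sample whose test set is strictly later than its training set is to sample first and then cut the sample by time, as in this sketch. The function name, the `order_time` key and the fixed seed are assumptions made for illustration; the source does not describe how the time ordering is enforced.

```python
import random

def sample_and_split(orders, sample_size, train_fraction=0.8, seed=0):
    """Randomly sample orders, then split them chronologically.

    orders: list of dicts with a sortable 'order_time' value (hypothetical key).
    Returns (train, test); every test order is at or after the last train order.
    """
    rng = random.Random(seed)
    sample = rng.sample(orders, min(sample_size, len(orders)))
    # sorting by time makes the 80/20 cut a chronological cut
    sample.sort(key=lambda o: o["order_time"])
    cut = int(len(sample) * train_fraction)
    return sample[:cut], sample[cut:]
```

If several orders share the same timestamp at the cut point, the split is only "not earlier" rather than strictly later; breaking ties by date boundary would be one way to tighten it.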
The booking city of the user is predicted using the relevant dimensions constructed in the preceding steps. A total of 200 cities enter the recommendation pool: the TOP100 best-selling domestic cities and the TOP100 best-selling overseas cities. Cities that the user booked in the 3 months before the order time are excluded. The ordered city of each order is marked as 1 (a positive sample), and the remaining cities are marked as 0 (negative samples). Because the positive and negative samples are heavily imbalanced and the amount of data required for training is large, the negative-sample cities of each order are sampled according to city order probability; that is, the higher a negative-sample city's sales volume, the higher its probability of being sampled. This keeps the positive-to-negative ratio of the training samples at about 1:20. The binary:logistic objective in XGBoost is used for training and testing to obtain each user's booking probability score for each city, where a higher score means a higher probability that the city will be booked. XGBoost, short for eXtreme Gradient Boosting, is frequently used in competitions with notable results. It is a massively parallel boosted-tree tool and is regarded as one of the fastest open-source boosted-tree toolkits. The algorithm used by XGBoost is an improvement on GBDT (gradient boosting decision tree) and can be used for both classification and regression.
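The sales-weighted negative sampling described above could look like the following sketch, which draws up to 20 distinct negative cities per order. Sampling without replacement, skipping cities with zero sales, and the function and parameter names are all assumptions; the source only states that higher-selling cities are more likely to be drawn.

```python
import random

def sample_negatives(pool_sales, ordered_city, excluded=(), k=20, seed=None):
    """Pick up to k distinct negative cities, weighted by sales volume.

    pool_sales: city -> sales volume for the cities in the recommendation pool.
    ordered_city: the positive sample for this order (never returned).
    excluded: cities the user booked in the previous 3 months.
    """
    rng = random.Random(seed)
    candidates = {
        c: s for c, s in pool_sales.items()
        if c != ordered_city and c not in excluded and s > 0
    }
    chosen = []
    while candidates and len(chosen) < k:
        cities = list(candidates)
        weights = [candidates[c] for c in cities]
        pick = rng.choices(cities, weights=weights, k=1)[0]
        chosen.append(pick)
        del candidates[pick]  # without replacement: each city at most once
    return chosen
```

Each order then contributes one row labeled 1 and the sampled rows labeled 0, which gives the roughly 1:20 ratio mentioned in the text.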
The TOP5 labels with the highest probability for the user are selected based on the user's vector data, and only cities that match those labels and that the user has a high probability of booking are recommended. A value K is set as the threshold for whether a city matches a label: a city enters the recommendation candidate pool for one of the user's theme labels if and only if the city's vector value for that label is greater than K. The value of K is calculated from the TF-IDF values under the different labels. First, the TOP10 cities with the highest probability that satisfy the condition under the user's TOP1 label are taken as the recommended cities for the TOP1 label. The TOP2 label is processed next. Because the same city may match both the TOP1 and TOP2 labels with high probability, ranking every label's list purely by probability could make the recommendation lists under different labels highly repetitive. A decay coefficient α, determined experimentally, is therefore applied: the highest-probability cities under the TOP2 label are selected based on the decayed probability values. Finally, the recommended cities for each label are stored.
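The label filtering and decay step is only loosely specified, so the sketch below is one possible reading rather than the method of the source. It assumes a single threshold K for all labels, and assumes that the decay coefficient α multiplies a city's probability once for every earlier label list in which that city was already recommended. All names are hypothetical.

```python
def recommend_by_label(top_labels, city_vectors, probs, k_threshold, alpha, top_n=10):
    """Build one recommendation list per label, decaying repeated cities.

    top_labels: label indices ordered from the user's TOP1 label downward.
    city_vectors: city -> list of label values (TF-IDF vector).
    probs: city -> predicted booking probability for this user.
    """
    seen = {}    # city -> number of earlier lists it already appears in
    result = {}
    for label in top_labels:
        scored = []
        for city, vec in city_vectors.items():
            # a city qualifies for a label only if its value exceeds K
            if vec[label] > k_threshold and city in probs:
                decayed = probs[city] * alpha ** seen.get(city, 0)
                scored.append((decayed, city))
        # highest decayed probability first; city name breaks ties
        scored.sort(key=lambda t: (-t[0], t[1]))
        picked = [city for _, city in scored[:top_n]]
        for city in picked:
            seen[city] = seen.get(city, 0) + 1
        result[label] = picked
    return result
```

With α below 1, a city that tops the TOP1 list is pushed down in later lists, so the lists for different labels overlap less.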
The XGBoost algorithm is an ensemble learning algorithm widely used in industry. Here it is used to predict the booking probability score of each candidate city. During training, trees are grown by repeated feature splitting, and each round of learning fits the residual between the previous round's model prediction and the actual value. When training is finished, the scores of all the learned trees are summed to obtain the final prediction score:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$
where K is the total number of trees learned and f_k(x_i) is the expression of the k-th tree, which is a function of the training-set sample x_i. The prediction target is the order label of the training set.
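The additive scoring described above can be illustrated with a small sketch. The trees here are hand-written single-split stumps and the feature names are hypothetical; a trained model would learn its trees from data. With a logistic objective, the summed tree output is a raw score that is mapped to a probability by the sigmoid function (any constant base score is omitted here for brevity).

```python
import math


def stump(feature, threshold, left_value, right_value):
    """Return a one-split tree: left_value if x[feature] < threshold."""
    def tree(x):
        return left_value if x[feature] < threshold else right_value
    return tree


def predict_proba(trees, x):
    """Sum the outputs of all trees, then apply the logistic function."""
    raw_score = sum(tree(x) for tree in trees)
    return 1.0 / (1.0 + math.exp(-raw_score))


# Hypothetical features: a user-city similarity and a city sales rank.
trees = [
    stump("sim_user_city", 0.5, -1.0, 1.0),
    stump("city_sales_rank", 50, 0.5, -0.5),
]
score = predict_proba(trees, {"sim_user_city": 0.8, "city_sales_rank": 10})
```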
Each tree is trained to minimize the loss function. The loss function is:
$$Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where l(y_i, ŷ_i) is the error between the label y_i and the prediction ŷ_i, and Ω(f_k) is the regularization term of each tree. A second-order Taylor expansion of the error function is taken, the prediction score of each leaf node is solved for by setting the derivative to 0, and that score is substituted back into the objective function to obtain the minimum loss.
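For reference, the second-order expansion and the resulting leaf scores can be written out as below. This follows the standard published XGBoost formulation rather than anything stated in this specification: the regularization form with γ and λ, the leaf count T, the leaf weights w_j, and the leaf sample sets I_j are assumptions introduced for illustration, and constant terms are dropped.

$$
\begin{aligned}
Obj^{(t)} &\approx \sum_{i}\Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big] + \Omega(f_t),
\qquad g_i = \frac{\partial\, l(y_i, \hat{y}_i^{(t-1)})}{\partial\, \hat{y}_i^{(t-1)}},\quad
h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(t-1)})}{\partial\, (\hat{y}_i^{(t-1)})^2} \\
\Omega(f_t) &= \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda},
\qquad G_j = \sum_{i \in I_j} g_i,\quad H_j = \sum_{i \in I_j} h_i \\
Obj^{*} &= -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T
\end{aligned}
$$

Setting the derivative of the expanded objective with respect to each w_j to 0 gives w_j*, and substituting w_j* back gives the minimum loss Obj* for a fixed tree structure.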
When the user enters the recommendation page, the theme with the highest probability for the user and that theme's recommended cities are displayed in the corresponding city recommendation module. The cities are arranged in the same order as computed offline. If the user is not interested in the currently recommended theme, the user clicks the "change one" button in the top right corner, and the front end displays the user's TOP2 theme label and its corresponding cities. This step repeats until all 5 themes have been shown once, after which the cycle restarts from the theme with the highest probability.
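The cycling behavior of the "change one" button can be sketched as a simple wrap-around index. The class and method names are hypothetical and only illustrate the display order; they are not taken from the specification.

```python
class TopicCarousel:
    """Cycle through a user's themes in descending probability order."""

    def __init__(self, topics):
        if not topics:
            raise ValueError("at least one topic is required")
        self._topics = list(topics)  # already sorted, most probable first
        self._index = 0

    def current(self):
        """Theme currently shown on the recommendation module."""
        return self._topics[self._index]

    def change_one(self):
        """Advance to the next theme, wrapping back to the first."""
        self._index = (self._index + 1) % len(self._topics)
        return self.current()
```

Because the index wraps with a modulo, the sixth theme shown with five themes is again the one with the highest probability.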
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
The invention aims to provide a user-based tourist city pushing method which can obtain the booking probability of a user by modeling a city entering a recommendation list according to user, city and time information, and sequence and recommend the city according to the booking probability.
Fig. 2 is a schematic block diagram of the user-based travel city push system of the present invention. As shown in fig. 2, an embodiment of the present invention further provides a user-based tourist city push system, which is used for implementing the user-based tourist city push method, where the user-based tourist city push system 9 includes:
The frequency acquisition module 91 counts the frequency data R_l, l ∈ [1, n], of each city for each of the preset n types of theme labels, where n is a natural number greater than 1.
The frequency statistics module 92 obtains the TF-IDF values of the different themes of each city from the frequency data of each city and the n types of theme labels.
The vector establishing module 93 arranges the TF-IDF values of the n types of themes for each city in the same theme order to obtain an n-dimensional vector.
The user vector module 94 computes, for each user, a weighted average of the city vectors of the cities in the user's historical travel orders and uses it as the user vector.
The month vector module 95 aggregates the labels of the cities ordered in each month to obtain the frequency data for each theme in that month, and calculates the month's vector data TF-IDF_m, m ∈ [1, 12], where m is a natural number.
The training dimension module 96 calculates similarity data between users and months, between users and cities, and between months and cities as training dimensions.
The random sampling module 97 randomly samples from historical order data and matches the corresponding vector and similarity data according to the user, month, and ordered city of each order.
The user prediction module 98 predicts the city the user will book.
The user recommending module 99 recommends to the user, according to the user's vector data, at least one city with the highest predicted booking probability.
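The vector-building modules above can be sketched in a few functions. This is an illustration under assumptions, not the claimed implementation: a city is counted toward S_l when its frequency for theme l is positive, the logarithm is taken as the natural logarithm, and cosine similarity is used as the similarity measure. The IDF and TF-IDF expressions follow claim 2 (IDF_l = log[S ÷ (S_l + 1)], TF-IDF_l = R_l × IDF_l), and all function names are hypothetical.

```python
import math


def tfidf_vectors(freq):
    """freq: city -> [R_1, ..., R_n]; returns city -> n-dimensional TF-IDF vector."""
    if not freq:
        return {}
    n = len(next(iter(freq.values())))
    total_cities = len(freq)  # S
    # S_l: number of cities with theme l (assumption: frequency > 0).
    s_l = [sum(1 for r in freq.values() if r[l] > 0) for l in range(n)]
    idf = [math.log(total_cities / (s + 1)) for s in s_l]
    return {city: [r[l] * idf[l] for l in range(n)] for city, r in freq.items()}


def weighted_average(vectors, weights):
    """Weighted mean of equal-length vectors, e.g. city vectors of past orders."""
    total = sum(weights)
    n = len(vectors[0])
    return [
        sum(w * v[l] for v, w in zip(vectors, weights)) / total
        for l in range(n)
    ]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A user vector would be built by passing the city vectors of the user's historical orders to weighted_average with per-order weights such as the 1, 0.8, and 0.5 values given in claim 3, and the same cosine function can compare user, month, and city vectors pairwise.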
The user-based tourist city pushing system can obtain the booking probability of the user by modeling the city entering the recommendation list according to the user, the city and the time information, and sort and recommend the cities according to the booking probability.
An embodiment of the invention also provides a user-based tourist city pushing device comprising a processor and a memory in which executable instructions of the processor are stored, wherein the processor is configured to perform the steps of the user-based travel city push method by executing the executable instructions.
As shown above, this embodiment models each city entering the recommendation list on user, city, and time information to obtain the user's booking probability, and then ranks and recommends the cities according to that probability.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "platform."
Fig. 3 is a schematic structural diagram of the user-based travel city push apparatus of the present invention. An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 3. The electronic device 600 shown in fig. 3 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 3, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one memory unit 620, a bus 630 connecting the different platform components (including the memory unit 620 and the processing unit 610), a display unit 640, etc.
The storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention described in the above-mentioned user-based travel city pushing method section of the present specification. For example, processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
An embodiment of the invention also provides a computer-readable storage medium for storing a program which, when executed, implements the steps of the user-based tourist city pushing method. In some possible embodiments, the aspects of the present invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present invention described in the above-mentioned user-based travel city pushing method section of this specification, when the program product is run on the terminal device.
As shown above, this embodiment models each city entering the recommendation list on user, city, and time information to obtain the user's booking probability, and then ranks and recommends the cities according to that probability.
Fig. 4 is a schematic structural diagram of a computer-readable storage medium of the present invention. Referring to fig. 4, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer-readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In summary, the present invention is directed to a user-based travel city pushing method, system, device and storage medium, which can obtain a booking probability of a user by modeling a city entering a recommendation list according to user, city and time information, and rank and recommend the city according to the booking probability.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.
Claims (10)
1. A user-based tourist city pushing method is characterized by comprising the following steps:
S101, counting respective frequency data R_l, l ∈ [1, n], of each city and preset n types of theme labels, where n is a natural number greater than 1;
S102, obtaining TF-IDF values of different themes of each city according to frequency data of each city and the n types of theme labels;
S103, arranging each city according to the TF-IDF values of the n types of themes and the same theme sequence to obtain an n-dimensional vector;
S104, obtaining a weighted average of the user and the city vector according to the order of the historical tourist city of each user, and taking the weighted average as a user vector;
S105, aggregating the labels of the order-placing cities of each month to obtain frequency data corresponding to the theme of each month, and calculating the vector data TF-IDF_m, m ∈ [1, 12], of the month, where m is a natural number;
S106, respectively calculating similarity data between users and months, between users and cities and between months and cities as training dimensions;
S107, randomly sampling from historical order data, and matching corresponding vector and similarity data according to the user, month and ordered city of each order;
S108, predicting the booking city of the user;
S109, recommending at least one city with the highest predicted user booking probability to the user according to the vector data of the user.
2. The user-based tourist city push method according to claim 1, wherein said step S102 of obtaining TF-IDF value comprises:
S1021, setting the number of cities of each type of theme as S_l, l ∈ [1, n], and counting the value of each S_l;
S1022, setting the total number of cities as S, and obtaining the IDF_l value of each type of theme, IDF_l = log[S ÷ (S_l + 1)], l ∈ [1, n];
S1023, obtaining the TF-IDF_l value of each type of theme, TF-IDF_l = R_l × IDF_l, l ∈ [1, n].
3. The user-based tourist city pushing method according to claim 1, wherein said step S104 comprises:
distinguishing orders of the user in the past three months, the orders in the past three months and other orders, setting the corresponding weight of the orders in the past three months as 1, the corresponding weight of the orders in the past three months as 0.8 and the corresponding weight of the other orders as 0.5, and then calculating a weighted average for the order city vector.
5. The method as claimed in claim 1, wherein the step S107 further comprises creating vector and similarity data by adding at least one of historical order price, city star rating, user price sensitivity, city sales, city ranking, and total number of times the user browses the city.
6. The user-based tourist city pushing method according to claim 1, wherein said step S108 comprises:
S1081, excluding cities booked by the user in the previous 3 months;
S1082, marking the ordered city corresponding to the order as 1, a positive sample, marking the remaining cities as 0, negative samples, and keeping the ratio of positive samples to negative samples at 1:20;
S1083, training and testing with the XGBoost algorithm to obtain the booking probability score of each user for each city.
7. The user-based tourist city pushing method according to claim 1, wherein said step S109 comprises: selecting the P labels with the highest probability for the user, where P is a natural number, and recommending, according to the P labels, the cities that match the P labels and have the highest user booking probability.
8. A user-based tourist city push system for implementing the user-based tourist city push method according to any one of claims 1 to 7, comprising:
the frequency acquisition module is used for counting respective frequency data R_l, l ∈ [1, n], of each city and preset n types of theme labels, where n is a natural number greater than 1;
the frequency statistics module is used for obtaining TF-IDF values of different themes of each city according to the frequency data of each city and the n types of theme labels;
the vector establishing module is used for arranging each city according to the TF-IDF values of the n types of themes and the same theme sequence to obtain an n-dimensional vector;
the user vector module is used for obtaining a weighted average of the user and the city vector according to the order of the historical tourist city of each user and taking the weighted average as a user vector;
the month vector module aggregates the labels of the order-placing cities of each month to obtain frequency data corresponding to the theme of each month, and calculates the vector data TF-IDF_m, m ∈ [1, 12], of the month, where m is a natural number;
the training dimension module is used for calculating similarity data between users and months, between users and cities and between months and cities respectively to serve as training dimensions;
the random sampling module is used for randomly sampling from historical order data and matching corresponding vector and similarity data according to the user, month and order city of each order;
the user prediction module predicts the booking city of the user;
and the user recommending module is used for recommending at least one city with the highest predicted user booking probability to the user according to the vector data of the user.
9. A user-based travel city push device, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the user-based travel city push method of any one of claims 1-7 via execution of the executable instructions.
10. A computer-readable storage medium storing a program, wherein the program when executed implements the steps of the user-based travel city push method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010142415.8A CN111339423B (en) | 2020-03-04 | 2020-03-04 | User-based travel city pushing method, system, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111339423A (en) | 2020-06-26 |
CN111339423B (en) | 2023-05-02 |
Family
ID=71185828
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010142415.8A Active CN111339423B (en) | 2020-03-04 | 2020-03-04 | User-based travel city pushing method, system, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111339423B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115329903A (en) * | 2022-10-12 | 2022-11-11 | 江苏航运职业技术学院 | Spatial data integration method and system applied to digital twin city |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115329903A (en) * | 2022-10-12 | 2022-11-11 | 江苏航运职业技术学院 | Spatial data integration method and system applied to digital twin city |
CN115329903B (en) * | 2022-10-12 | 2023-05-30 | 福建美舫时代科技有限公司 | Spatial data integration method and system applied to digital twin city |
Also Published As
Publication number | Publication date |
---|---|
CN111339423B (en) | 2023-05-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||