CN117076786A

CN117076786A - Cross-province travel hot line recommendation method based on roaming information

Info

Publication number: CN117076786A
Application number: CN202311119914.5A
Authority: CN
Inventors: 陈曦; 潘建忠; 王鹏亮; 胡伟龙
Original assignee: Guangzhou Richstone Technology Co ltd
Current assignee: Guangzhou Richstone Technology Co ltd
Priority date: 2023-08-31
Filing date: 2023-08-31
Publication date: 2023-11-17
Anticipated expiration: 2043-08-31
Also published as: CN117076786B

Abstract

The invention relates to the technical field of mobile internet, and in particular to a cross-province travel hot line recommendation method based on roaming information. The method comprises the following steps: S1, acquiring roaming track information of a user and storing it in a database, wherein the roaming track information comprises the user's number information, position information, time information and track information; S2, constructing a line recommendation algorithm; S3, collecting user feedback information, including the user's satisfaction and score, and using it to optimize and improve the line recommendation algorithm. The method automatically recommends travel routes that meet the user's requirements according to the user's travel preference and historical roaming track.

Description

Cross-province travel hot line recommendation method based on roaming information

Technical Field

The invention relates to the technical field of mobile internet, in particular to a cross-provincial travel hot line recommending method based on roaming information.

Background

As living standards improve, more and more people travel to relax and broaden their knowledge, and tourism has come to occupy an important position in China's national economy. A well-designed travel route brings more tourists, and therefore better economic returns, to travel agencies and other tourism operators. The travel route is an important component of tourism products and an important link connecting tourists, tourism enterprises, related departments and destinations.

At present, many travel agencies, online travel agencies (OTAs) and other institutions offer travel route recommendation services. However, traditional recommendation relies on information such as the user's geographic position, travel time and budget; it is subjective rather than objective, and it cannot fully account for the user's personalized needs and preferences.

In recent years, the spread of mobile internet technology has greatly changed how users travel. Users can obtain information anytime and anywhere through mobile devices, share travel experiences, and leave a record of their roaming tracks while traveling. Based on this roaming data, users' travel needs and preferences can be better understood and more personalized travel routes can be recommended.

Disclosure of Invention

The invention solves the technical problems in the prior art, and provides a cross-provincial travel hot line recommending method based on roaming information.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

A cross-province travel hot line recommendation method based on roaming information comprises the following steps:

S1, acquiring roaming track information of a user and storing it in a database, wherein the roaming track information comprises the user's number information, position information, time information and track information;

S2, constructing a line recommendation algorithm, which specifically comprises the following steps:

S201, extracting the stay points and tracks from the user roaming track information, and analyzing and processing them with a geographic information system tool to obtain the user's stay time and frequency at different scenic spots;

S202, analyzing the user's stay points and track information, identifying the hot scenic spots and areas in each province, and determining their hot degree according to stay time and frequency;

S203, combining the hot scenic spots with the user's roaming track information, calculating an optimal route with a graph algorithm, removing cold scenic spots from the optimal route, adding hot scenic spots within a set distance of the optimal route, and connecting them to generate a hot line;

S204, matching the user's travel destinations against the scenic spots contained in the hot line to provide a travel line that meets the user's requirements;

S3, collecting user feedback information, including the user's satisfaction and score, and using it to optimize and improve the line recommendation algorithm.

Further, in step S202, the method adopts a K-means algorithm to perform cluster analysis on the popularity of the scenic spots, and divides tourists into different clusters according to the residence time and the frequency, and specifically comprises the following steps:

S2021, taking out the tourists' stay time and frequency data from the cluster, and normalizing the data;

S2022, determining the number K of clusters using the elbow method;

S2023, performing K-means cluster analysis, wherein the objective function of the K-means algorithm is:

$J = \sum (d(x_i, c_i))^2$

in the above formula, $x_i$ denotes a sample point, $c_i$ denotes the center point of the cluster to which the sample belongs, and d is a distance metric function;

S2024, calculating the size of the clusters and the distance between clusters from the clustering result, and determining the scenic spot hot degree in combination with the stay time and frequency.
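As a rough illustration of step S2022, the elbow can be located automatically once the objective J has been computed for several candidate values of K. The sketch below assumes those J values are already available (the numbers in the example are made up) and picks the K at which the curve bends most sharply, measured by the largest second difference. Both this heuristic and the function name `pick_elbow` are assumptions for illustration, not part of the described method.

```python
def pick_elbow(inertia_by_k):
    """Return the K at which the J-versus-K curve bends most sharply.

    inertia_by_k: dict mapping consecutive K values to the objective J.
    Heuristic (an assumption): the K with the largest second difference of J.
    """
    ks = sorted(inertia_by_k)
    if len(ks) < 3:
        raise ValueError("need at least three K values")
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = inertia_by_k[prev_k] - 2 * inertia_by_k[k] + inertia_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up J values: the curve flattens after K = 3.
example = {1: 100.0, 2: 60.0, 3: 20.0, 4: 15.0, 5: 12.0}
print(pick_elbow(example))
```

In practice the elbow is often judged by eye from a plot, so an automatic rule like this one is best treated as a starting point.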

Still further, the cluster size is calculated by the following formula:

$J_i = \sum (d(x_i, c_i))^2$

in the above formula, $x_i$ denotes a sample point, $c_i$ denotes the center point of the cluster to which the sample belongs, and d denotes a distance metric function;

the scenic spot popularity is calculated by the following formula:

in the above formula, polarity denotes the hot degree of the scenic spot, specifically:

when polarity ≥ 0.25, the scenic spot is a hot scenic spot;

when 0.1 ≤ polarity < 0.25, the scenic spot is a moderately hot scenic spot;

when polarity < 0.1, the scenic spot is a cold scenic spot.
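The three bands can be expressed as a small classification function. This is a minimal sketch: the thresholds 0.25 and 0.1 come from the text, the middle band is read here as 0.1 ≤ polarity < 0.25, and the function name and the label strings are hypothetical. How the polarity value itself is computed is not shown.

```python
def classify_scenic_spot(polarity: float) -> str:
    """Map a polarity (hot degree) value to one of three bands."""
    if polarity >= 0.25:
        return "hot"
    if polarity >= 0.1:
        return "moderate"
    return "cold"


for value in (0.3, 0.25, 0.1, 0.05):
    print(value, classify_scenic_spot(value))
```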

Further, S2023 specifically includes the following steps:

step one, randomly initializing the center points of the K clusters, calculating the distance between each sample and each center point, and assigning each sample to the cluster of its nearest center point;

the distance between each sample and a center point is calculated by:

$d(x, y) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$

in the above formula, $\sqrt{\cdot}$ denotes the square root, $x_1$ and $y_1$ denote the longitude and latitude of the sample, and $x_2$ and $y_2$ denote the longitude and latitude of the center point;

step two, updating the center point of each cluster, setting it to the mean of all samples in the cluster; the cluster center is updated by:

$c_i = \frac{1}{n_i} \sum x_i$

in the above formula, $x_i$ denotes the sample points of the i-th cluster and $n_i$ denotes the number of samples in the i-th cluster;

step three, repeating the above two steps until a stop condition is reached, the stop condition being that the samples in each cluster no longer change.
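The three steps above can be sketched in plain Python. This is an illustrative sketch rather than the implementation used by the invention: the function names, the fixed random seed, the iteration cap and the handling of empty clusters are all assumptions, and the samples are treated as generic 2-D points compared with the Euclidean distance given above.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means on 2-D points; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # step one: random initial centers
    labels = None
    for _ in range(max_iter):
        # Assign each sample to the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:         # step three: assignments stopped changing
            break
        labels = new_labels
        # Step two: move each center to the mean of its samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, labels


def objective(points, centers, labels):
    """Objective J: sum of squared distances from samples to their centers."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
```

Because the initial centers are random, different seeds can give different clusterings on less clearly separated data; running the procedure several times and keeping the result with the smallest J is a common precaution.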

Further, the optimal route in step S203 is obtained by the following steps:

s2031, calculating the distances between all adjacent scenic spots in the related area, and simultaneously acquiring the hot degree data of the scenic spots;

s2032, constructing a graph model, wherein each scenic spot is used as a node in the graph, and the distance between the nodes is used as the weight of the edge;

s2033, determining a starting point and an end point and the number of scenic spots required to pass through;

s2034, calculating the best path using Dijkstra algorithm.

Further, the Dijkstra algorithm calculates the optimal path as follows: starting from the starting point, the distance value to each node is calculated, and a pair of nodes with no route between them is assigned a distance of infinity; in each round the smallest value in the graph is fixed, and the calculation advances step by step until the final value is determined; the path that produces the minimum value is the optimal path, and the value at the end point is the length of the optimal path.
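The procedure described above can be sketched with a priority queue. The graph, node names and edge weights below are invented for illustration and are not the values of Fig. 2; the function name and the adjacency-dictionary representation are likewise assumptions.

```python
import heapq


def dijkstra(graph, start, end):
    """graph: {node: {neighbor: distance}}. Returns (distance, path)."""
    dist = {node: float("inf") for node in graph}   # no known route yet: infinity
    dist[start] = 0
    prev = {}
    fixed = set()
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)    # smallest tentative value in the graph
        if node in fixed:
            continue
        fixed.add(node)                  # this node's value is now final
        if node == end:
            break
        for nbr, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist[nbr]:
                dist[nbr] = candidate
                prev[nbr] = node
                heapq.heappush(heap, (candidate, nbr))
    if dist[end] == float("inf"):
        return float("inf"), []
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[end], path[::-1]


# Hypothetical scenic-spot graph with made-up distance weights.
graph = {
    "S": {"A": 4, "C": 3},
    "A": {"S": 4, "B": 5},
    "B": {"A": 5, "T": 6},
    "C": {"S": 3, "D": 4},
    "D": {"C": 4, "T": 2},
    "T": {"B": 6, "D": 2},
}
print(dijkstra(graph, "S", "T"))
```

Every node must appear as a key of `graph`, even if it has no outgoing edges, so that its initial distance can be set to infinity.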

Further, S204 specifically includes the following steps:

s2041, extracting roaming track data from a database, wherein the roaming track data comprise the past travel destination (scenic spot), stay time and travel track of a user;

s2042, according to the historical roaming data of the user and the travel hot line generated in the step S203, matching the travel destination of the user with scenic spots contained in the line, and calculating the hot travel line matched with the user;

s2043, generating travel routes of interest to the user, and recommending the proper travel routes for the user by combining the preference of the user and a recommendation algorithm.
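One simple way to picture the matching in S2042 is to count how many of the user's past destinations each hot line contains and rank the lines by that count. The sketch below assumes exactly that scoring rule; the text does not specify how the match is scored, and the function name and data shapes are hypothetical.

```python
def rank_hot_lines(user_spots, hot_lines):
    """Rank hot lines by how many of the user's past destinations they contain.

    user_spots: iterable of scenic spot names the user has visited.
    hot_lines: dict mapping a line name to its ordered list of scenic spots.
    Returns the names of lines with at least one match, best match first.
    """
    visited = set(user_spots)
    scored = [
        (len(visited & set(spots)), name)
        for name, spots in hot_lines.items()
    ]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for score, name in ranked if score > 0]


# Hypothetical data for illustration only.
lines = {
    "line 1": ["spot A", "spot B", "spot C"],
    "line 2": ["spot C", "spot D"],
    "line 3": ["spot E"],
}
print(rank_hot_lines(["spot A", "spot C"], lines))
```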

Further, S2043 is specifically performed by:

1) Calculating the similarity between users using cosine similarity: for two users denoted i and k, the similarity between them is s(i, k);

2) Calculating each user's score for each travel route, the score of user i for travel route j being denoted r(i, j);

3) Predicting a travel route score for user k similar to user i, expressed by:

in the above formula, p (k, j) represents the predictive score of the travel route j by the user k, s (i, k) represents the similarity between the user i and the user k, and r (i, j) represents the score of the travel route j by the user i;

4) Recommending the travel route: the travel route with the highest predicted score for the user is recommended; denoting the travel route recommended to user k by $J^*$:

$J^* = \max(p(k, j))$

in the above formula, max(p(k, j)) denotes the highest predicted score of user k over the travel routes j.
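Steps 1) to 4) follow the usual shape of user-based collaborative filtering, which can be sketched as follows. The prediction formula is not reproduced in this text, so the sketch assumes a common form, the similarity-weighted mean of the scores r(i, j) given by other users; that choice, the function names and the rating data are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity s(i, k) of two users' score dicts {route: score}."""
    common = set(a) & set(b)
    dot = sum(a[j] * b[j] for j in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings, k, j):
    """Predicted score p(k, j): similarity-weighted mean of r(i, j) (assumed form)."""
    num = den = 0.0
    for i, scores in ratings.items():
        if i == k or j not in scores:
            continue
        s = cosine_similarity(scores, ratings[k])
        num += s * scores[j]
        den += abs(s)
    return num / den if den else 0.0


def recommend(ratings, k):
    """Return the route user k has not scored with the highest predicted score."""
    routes = {j for scores in ratings.values() for j in scores} - set(ratings[k])
    return max(routes, key=lambda j: predict(ratings, k, j), default=None)


# Hypothetical scores r(i, j) on a 1-5 scale.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "D": 5},
    "u3": {"A": 5, "B": 3},
}
print(recommend(ratings, "u3"))
```

Note that `recommend` returns the route that attains the maximum predicted score, which is what the recommendation needs, rather than the score itself.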

Further, the geographic information system tool analyzes and processes the stay points and tracks as follows: the track data in the roaming track data is imported into a GIS to generate a Shapefile vector data file, the Shapefile is converted into GeoJSON data with Python code, the relevant information is extracted from the GeoJSON data, the corresponding scenic spots are found by longitude and latitude, and the stay time and frequency information is extracted.
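The last part of this pipeline, finding the scenic spot for each point and accumulating stay time and frequency, can be sketched with the standard library alone. The sketch assumes the GeoJSON is a FeatureCollection of Point features whose properties carry a `stay_time` in seconds, that scenic spots are given as a name-to-(longitude, latitude) mapping, and that a point belongs to the nearest spot within a 1 km radius. These conventions, the radius and the function names are all assumptions. The Shapefile-to-GeoJSON conversion itself is not shown.

```python
import json
import math
from collections import defaultdict


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in kilometres."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def summarize_stays(geojson_text, scenic_spots, max_km=1.0):
    """Return {spot: {"stay_time": total seconds, "frequency": visit count}}."""
    stats = defaultdict(lambda: {"stay_time": 0, "frequency": 0})
    for feature in json.loads(geojson_text)["features"]:
        if feature["geometry"]["type"] != "Point":
            continue
        lon, lat = feature["geometry"]["coordinates"][:2]   # GeoJSON order: lon, lat
        name, (s_lon, s_lat) = min(
            scenic_spots.items(),
            key=lambda kv: haversine_km(lon, lat, kv[1][0], kv[1][1]),
        )
        if haversine_km(lon, lat, s_lon, s_lat) <= max_km:
            stats[name]["stay_time"] += feature["properties"]["stay_time"]
            stats[name]["frequency"] += 1
    return dict(stats)
```

Points farther than `max_km` from every scenic spot are ignored, which keeps transit positions from being counted as visits.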

Further, S3 specifically includes the following steps:

S301, publishing a notice on the website of the travel platform informing users that they can provide feedback through a designated mailbox, telephone number or online form;

S302, drawing up a questionnaire with questions about the recommended travel line, covering satisfaction, degree of recommendation, price rationality, scenic spot richness and tour guide explanation level;

S303, sorting and analyzing the collected user feedback data to find users' preferences and needs regarding the recommended travel line, as well as its problems and areas for improvement;

S304, publishing the analysis result as a report on the website of the travel platform, so that users can learn the strengths, weaknesses and improvement direction of the recommended travel line.
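The sorting and analysis of questionnaire results can be as simple as averaging each item and flagging the weakest ones. The sketch below assumes each response is a mapping from questionnaire item to a numeric score on a 1 to 5 scale and that items averaging below 3.0 need attention; the scale, the threshold and the function names are assumptions, not taken from the text.

```python
from statistics import mean


def summarize_feedback(responses):
    """Average each questionnaire item over all responses that answered it."""
    items = sorted({item for response in responses for item in response})
    return {
        item: mean(response[item] for response in responses if item in response)
        for item in items
    }


def weakest_items(summary, threshold=3.0):
    """Items whose average falls below the threshold, lowest first."""
    return sorted(
        (item for item, value in summary.items() if value < threshold),
        key=summary.get,
    )


# Hypothetical responses for illustration only.
responses = [
    {"satisfaction": 4, "price_rationality": 2},
    {"satisfaction": 5, "price_rationality": 3},
]
summary = summarize_feedback(responses)
print(summary, weakest_items(summary))
```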

Compared with the prior art, the invention has the beneficial effects that:

(1) The invention obtains tourists' stay points and frequencies from roaming track information, identifies hot scenic spots with a clustering algorithm, and obtains an optimal line with a graph algorithm. The top-ranked scenic spots on the optimal line are selected and connected to form a hot line, which is matched against the user's destinations to provide a travel line suited to the user. This solves the problem of automatically recommending travel lines that meet the user's requirements according to the user's travel preference and historical roaming track.

Drawings

Fig. 1 is a flow chart of the method of the present invention.

Fig. 2 is a schematic diagram of an example of the Dijkstra algorithm of the present invention calculating the shortest path.

Detailed Description

The technical solutions of the present invention will be clearly described below with reference to the accompanying drawings, and it is obvious that the described embodiments are not all embodiments of the present invention, and all other embodiments obtained by a person skilled in the art without making any inventive effort are within the scope of protection of the present invention.

As shown in fig. 1, the invention provides a method for recommending a cross-provincial travel hot line based on roaming information, which comprises the following steps:

S1, collecting roaming data of a user. The roaming data is reported by the user's mobile device, and the user's roaming track information is acquired by purchasing a roaming data packet interface from an operator. The roaming track information comprises the user's number information, position information, time information, track information and the like; its data structure is shown in Table 1, and a data sample is shown in Table 2. After the roaming track information is acquired, it is stored in a database.

Table 1 data structure of roaming trail information

Field name	Data type	Description
id	int	Primary key
mobile	varchar	Mobile phone number
timestamp	datetime	Timestamp
latitude	decimal	Latitude
longitude	decimal	Longitude
location_name	varchar	Place name
stay_time	int	Stay time (seconds)
trajectory	geometry	Movement track
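The structure in Table 1 can be tried out with a small in-memory database. The sketch below uses SQLite only because it ships with Python; the text does not name a database engine. The table name `roaming_track`, the underscore spelling of the column names, the mapping of `varchar`/`decimal`/`datetime` onto SQLite types and the storage of the geometry as text are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE roaming_track (
    id            INTEGER PRIMARY KEY,
    mobile        TEXT,
    timestamp     TEXT,      -- ISO 8601 datetime
    latitude      REAL,
    longitude     REAL,
    location_name TEXT,
    stay_time     INTEGER,   -- seconds
    trajectory    TEXT       -- geometry stored as text (assumption)
)
"""


def create_store(path=":memory:"):
    """Open a database and create the roaming track table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def total_stay_by_location(conn):
    """Return (location_name, total stay seconds, visit count) per place."""
    rows = conn.execute(
        "SELECT location_name, SUM(stay_time), COUNT(*) "
        "FROM roaming_track GROUP BY location_name ORDER BY location_name"
    )
    return rows.fetchall()
```

Grouping by place name in this way already yields the stay time and frequency figures that the later clustering step works from.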

Table 2 data sample of roaming trajectory information

s201, extracting roaming track data from the database, and processing geographic information in the roaming track data.

And extracting roaming track data from the database, carrying out geographic information processing on the collected roaming track data, extracting main stay points and tracks of users in each province, and analyzing and processing the main stay points and tracks by using a Geographic Information System (GIS) tool.

Specifically, the track data in the roaming track data table is imported into a GIS to generate a Shapefile vector data file, and the Shapefile is converted into GeoJSON data with Python code. The GeoJSON data retains the attribute information of the original Shapefile. The relevant information is then extracted from the GeoJSON data, the corresponding scenic spots are found by longitude and latitude, the stay time and frequency information is extracted, and the result is output as charts.

S202, identifying hot scenic spots, analyzing stay points and track information of users through GeoJson data, identifying hot tourist scenic spots and areas in each province, and determining the hot degree according to stay time and frequency.

Specifically, the longitude and latitude of each scenic spot in the GeoJSON data identify a data point for each tourist, with stay time as one dimension and frequency as the other. The stay time and frequency data are standardized so that the two dimensions share the same scale. K-means clustering is then applied to divide tourists into clusters according to their stay time and frequency, each cluster representing a group of tourists with similar stay time and frequency patterns. For each cluster, the mean stay time and mean frequency are calculated as indicators of that cluster; a higher mean indicates that the cluster corresponds to tourists of a hot scenic spot.

Specifically, the standardization is performed in a Python program using the StandardScaler class of the sklearn library, as follows:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

Y_scaled = scaler.fit_transform(Y)

in the above, X denotes the stay time data set, Y denotes the frequency data set, X_scaled denotes the standardized stay time data set, and Y_scaled denotes the standardized frequency data set.
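To make explicit what this standardization computes, the following sketch reproduces it for a single list of values using only the standard library: each value has the mean subtracted and is divided by the population standard deviation, which is the convention StandardScaler uses. The function name and the sample values are hypothetical. Note that StandardScaler itself expects a 2-D array (samples by features), so a single column such as the stay times would need to be reshaped before being passed to it.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Z-score each value: (v - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: every standardized value is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical stay times in seconds.
print(standardize([1800, 3600, 5400]))
```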

According to the residence time and the residence frequency of tourists, the specific steps for calculating the scenic spot popularity degree by using a K-means algorithm are as follows:

(1) Taking out the tourists' stay time and frequency data from the cluster and standardizing it with the standardization method above to prepare the data;

(2) Determining the number K of clusters, using an indicator such as the elbow method to choose a suitable K;

(3) Performing K-means cluster analysis, wherein the objective function of the K-means algorithm is:

$J = \sum (d(x_i, c_i))^2$

in the above formula, $x_i$ denotes a sample point, $c_i$ denotes the center point of the cluster to which the sample belongs, and d is a distance metric function.

The method specifically comprises the following steps:

The first step: randomly initialize the center points of the K clusters, calculate the distance between each sample and each center point, and assign each sample to the cluster of its nearest center point.

The distance between each sample and a center point is calculated by:

$d(x, y) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$

in the above formula, $\sqrt{\cdot}$ denotes the square root, $x_1$ and $y_1$ denote the longitude and latitude of the sample, and $x_2$ and $y_2$ denote the longitude and latitude of the center point.

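The second step: update the center point of each cluster, setting it to the mean of all samples in the cluster. The cluster center is updated by:

$c_i = \frac{1}{n_i} \sum x_i$

in the above formula, $x_i$ denotes the sample points of the i-th cluster and $n_i$ denotes the number of samples in the i-th cluster.

The third step: repeat the above two steps until a stop condition is reached, the stop condition including that the samples in each cluster no longer change.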

(4) Analyzing the clustering result and calculating statistical indicators, such as the mean and variance of the stay time and frequency of each cluster, and from them the size of the clusters and the distance between clusters.

Specifically, the cluster size is calculated by the following formula:

$J_i = \sum (d(x_i, c_i))^2$

in the above formula, $x_i$ denotes a sample point, $c_i$ denotes the center point of the cluster to which the sample belongs, and d denotes the distance metric function.

(5) Determining the hot degree of scenic spots according to the cluster sizes and the distances between clusters: clusters with long stay time and high frequency are defined as groups with a high hot degree, and clusters with short stay time and low frequency as groups with a low hot degree. A stay time of more than 3 hours counts as long, otherwise short; a frequency of more than 3 visits counts as high, otherwise low.

Specifically, based on the cluster size $J_i$, the inter-cluster distance $d(x_i, c_i)$, the stay time st and the frequency fr, the scenic spot popularity is calculated by:

when polarity ≥ 0.25, the scenic spot is a hot scenic spot;

when 0.1 ≤ polarity < 0.25, the scenic spot is a moderately hot scenic spot;

when polarity < 0.1, the scenic spot is a cold scenic spot.

S203, generating a hot line, and calculating an optimal route by using a graph algorithm in combination with roaming data of the hot spots and the users, so as to generate a cross-province hot travel line.

The specific steps for calculating the optimal route using the graph algorithm are as follows:

(1) Calculating the distances between all adjacent scenic spots in the area concerned, and obtaining the hot degree data of the scenic spots.

(2) Constructing a graph model in which each scenic spot is a node and the distance between nodes is the weight of the edge.

(3) Determining the starting point, the end point and the number of scenic spots to pass through.

(4) Calculating the shortest path, that is, the optimal path, with the Dijkstra algorithm. Specifically, a given node is taken as the starting point, and the Dijkstra algorithm yields the shortest distance from it to any other node.

Taking Guangzhou as an example, several scenic spots are selected as shown in Fig. 2, with the "Lotus Island" scenic spot as the starting point and the "Conghua Liuxi Hot Spring" as the end point, and 5 scenic spots A, B, C, D and E in between. The weight between each pair of scenic spots is determined and the Dijkstra schematic of Fig. 2 is constructed, in which the number on each line segment is the distance weight between the scenic spots at its two ends. Once these numbers are fixed, the value of each node is calculated from the starting point, with two points that have no route between them assigned infinity; in each round the smallest value in the graph is fixed, and the calculation advances step by step until the final value is determined. The path producing the minimum value is the shortest path, and the value at the end point is its length.

By this method, the optimal route from Lotus Island to the Conghua Liuxi Hot Spring is calculated as Lotus Island - C - D - Conghua Liuxi Hot Spring, with a distance weight of 12; a plurality of travel routes are generated and ranked by distance weight from high to low.

Cold scenic spots are then removed from the optimal path and hot scenic spots near the optimal path are added, that is, hot scenic spots within 3 km of the scenic spots on the optimal path are connected in, to generate a hot line.
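This pruning and extension step can be sketched as follows. The text does not say where an added hot scenic spot is placed on the line, so the sketch assumes it is inserted directly after the path spot it lies close to. Distances are taken as great-circle distances between (longitude, latitude) pairs. The function names, the popularity labels and the coordinates in the example are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between (lon, lat) pairs a and b."""
    radius = 6371.0
    phi1, phi2 = math.radians(a[1]), math.radians(b[1])
    dphi = phi2 - phi1
    dlmb = math.radians(b[0] - a[0])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def build_hot_line(path, coords, popularity, radius_km=3.0):
    """Drop cold spots from the optimal path and add nearby hot spots.

    path: ordered spot names on the optimal path.
    coords: {spot: (lon, lat)}; popularity: {spot: "hot" | "moderate" | "cold"}.
    """
    kept = [spot for spot in path if popularity.get(spot) != "cold"]
    line = []
    for spot in kept:
        line.append(spot)
        for candidate, label in popularity.items():
            if label != "hot" or candidate in kept or candidate in line:
                continue
            if haversine_km(coords[spot], coords[candidate]) <= radius_km:
                line.append(candidate)   # placed right after the nearby path spot
    return line


# Hypothetical coordinates and labels for illustration only.
coords = {
    "P1": (113.00, 23.0), "P2": (113.10, 23.0), "P3": (113.20, 23.0),
    "H1": (113.01, 23.0), "H2": (114.00, 24.0),
}
popularity = {"P1": "moderate", "P2": "cold", "P3": "hot", "H1": "hot", "H2": "hot"}
print(build_hot_line(["P1", "P2", "P3"], coords, popularity))
```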

S204, performing personalized recommendation on the generated popular route according to interests and preferences of the user, matching the generated popular route according to historical roaming track data of the user, calculating a cross-provincial travel route suitable for the user, and finally providing travel route recommendation meeting the user requirements, wherein the method comprises the following steps:

(1) Extracting the roaming track data from the database, including the user's past travel destinations (scenic spots), stay time, travel track and similar information.

(2) According to the user's historical roaming data and the travel hot line generated in step S203, matching the user's travel destinations against the scenic spots contained in the line, and calculating the hot travel line that matches the user.

(3) Generating travel routes of interest to the user, and recommending suitable travel routes by combining the user's preferences with a recommendation algorithm, the recommendation algorithm being a collaborative filtering algorithm. This specifically comprises the following steps:

1) Calculating the similarity between users using cosine similarity: for two users denoted i and k, the similarity between them is s(i, k).

2) Calculating each user's score for each travel route, the score of user i for travel route j being denoted r(i, j).

3) Predicting a travel route score for user k similar to user i, expressed by:

in the above equation, p (k, j) represents the predictive score of user k for tour j, s (i, k) represents the similarity between user i and user k, and r (i, j) represents the score of user i for tour j.

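4) Recommending the travel route: the travel route with the highest predicted score for the user is recommended. Denoting the travel route recommended to user k by $J^*$:

$J^* = \max(p(k, j))$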

S3, collecting user feedback information, namely user satisfaction, scoring and the like, of the recommended route, wherein the feedback information is used for optimizing and improving a recommendation algorithm and providing more accurate and personalized travel route recommendation, and specifically comprises the following steps of:

(1) A notice is published on the website of the travel platform informing users that they can provide feedback through a designated mailbox, telephone number or online form.

(2) A questionnaire is drawn up with questions about the recommended travel line, such as satisfaction, degree of recommendation, price rationality, scenic spot richness and tour guide explanation level, to make it easy for users to give feedback.

(3) The collected user feedback data is sorted and analyzed to find users' preferences and needs regarding the recommended travel line, as well as its problems and areas for improvement.

(4) The analysis result is published as a report on the website of the travel platform, so that users can learn the strengths, weaknesses and improvement direction of the recommended travel line.

Through the steps, feedback information of the user, including satisfaction, scores and the like, is conveniently collected, so that a travel recommendation line is better optimized, and the user satisfaction is improved.

Finally, it should be noted that the above description is only for illustrating the technical solution of the present invention, and not for limiting the scope of the present invention, and that the simple modification and equivalent substitution of the technical solution of the present invention can be made by those skilled in the art without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. The cross-province travel hot line recommending method based on roaming information is characterized by comprising the following steps of:

2. The method for recommending a cross-province travel hot line based on roaming information according to claim 1, wherein in step S202, a K-means algorithm is adopted to perform cluster analysis on the hot degree of scenic spots, and tourists are divided into different clusters according to residence time and frequency, and the method specifically comprises the following steps:

$J = \sum (d(x_i, c_i))^2$

3. The method for cross-province travel hot line recommendation based on roaming information according to claim 2, wherein the cluster size is calculated by the following formula:

$J_i = \sum (d(x_i, c_i))^2$

the scenic spot popularity is calculated by the following formula:

in the above formula, polarity denotes the hot degree of the scenic spot, $J_i$ is the size of the cluster, $d(x_i, c_i)$ is the distance between clusters, st denotes the stay time, and fr denotes the frequency, specifically:

when polarity ≥ 0.25, the scenic spot is a hot scenic spot;

when polarity < 0.1, the scenic spot is a cold scenic spot.

4. The method for cross-province travel hot line recommendation based on roaming information according to claim 2, wherein S2023 specifically comprises the steps of:

the distance between each sample and a center point is calculated by:

$d(x, y) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$

5. The method for cross-province travel hot line recommendation based on roaming information according to claim 2, wherein the optimal route in step S203 is obtained by:

s2034, calculating the best path using Dijkstra algorithm.

6. The method for recommending a cross-province travel hot line based on roaming information according to claim 5, wherein the Dijkstra algorithm calculates the optimal path as follows: starting from the starting point, the distance value to each node is calculated, and a pair of nodes with no route between them is assigned a distance of infinity; in each round the smallest value in the graph is fixed, and the calculation advances step by step until the final value is determined; the path that produces the minimum value is the optimal path, and the value at the end point is the length of the optimal path.

7. The method for cross-province travel hot line recommendation based on roaming information of claim 1, wherein S204 specifically comprises the steps of:

8. The method for cross-province travel hot line recommendation based on roaming information according to claim 7, wherein S2043 is specifically performed by:

3) Predicting a travel route score for user k similar to user i, expressed by:

$J^* = \max(p(k, j))$

9. The method for recommending a cross-province travel hot line based on roaming information according to claim 1, wherein the geographic information system tool analyzes and processes the stay points and tracks as follows: the track data in the roaming track data is imported into a GIS to generate a Shapefile vector data file, the Shapefile is converted into GeoJSON data with Python code, the relevant information is extracted from the GeoJSON data, the corresponding scenic spots are found by longitude and latitude, and the stay time and frequency information is extracted.

10. The method for recommending a cross-province travel hot line based on roaming information according to claim 1, wherein S3 specifically comprises the following steps: