CN110532464B - Tourism recommendation method based on multi-tourism context modeling - Google Patents

Info

Publication number
CN110532464B
Authority
CN
China
Prior art keywords
spot
vector
context
sight
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910743597.1A
Other languages
Chinese (zh)
Other versions
CN110532464A (en
Inventor
宾辰忠
陈红亮
古天龙
常亮
李康林
梁浩宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN201910743597.1A
Publication of CN110532464A
Application granted
Publication of CN110532464B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/14 Travel agencies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a tourism recommendation method based on multi-tourism context modeling, comprising the following steps: collecting and preprocessing data, and numbering the users, the scenic spots and their attribute data; constructing tourism sequence tracks and a scenic spot knowledge map; obtaining, through deep learning model training, feature representations of the visitor access behavior sequence context and of the scenic spot tourism attribute context; fusing the multi-tourism context information to obtain a final user vector and final scenic spot vectors; and calculating the spatial-distance similarity between the user vector and each scenic spot vector to obtain a Top-K scenic spot recommendation. Using a vector fusion method, the invention combines the user vectors and scenic spot vectors obtained separately from the visitor access behavior sequence context and the scenic spot tourism attribute context into the final user vector and scenic spot vectors. The resulting feature representation effectively fuses multiple tourism contexts, enriches the high-level tourism semantics it carries, and helps ensure the effectiveness of the recommendation.

Description

Tourism recommendation method based on multi-tourism context modeling
Technical Field
The invention relates to the technical field of tourist attraction recommendation, and in particular to a tourism recommendation method based on multi-tourism context modeling.
Background
With continued social progress and rising living standards, more and more people choose to travel. However, as the tourism industry booms and public enthusiasm for travel grows, mainstream tourism information service platforms now provide so much information that users face information overload. How to screen out the scenic spots a tourist will like from this mass of tourism information has become an urgent problem.
At present, traditional travel recommendation methods mainly derive a tourist's behavior sequence from historical behavior data and generate point-of-interest recommendations with methods such as collaborative filtering and probabilistic graphical models. These methods model only the low-level behavior context in user behavior, such as access frequency, access order and frequent patterns, and do not incorporate the tourism features of the attractions as context in the recommendation process, for example the geographic location of the attraction, its tourism type, the season suitable for visiting, the visit duration, the rating and the ticket price. These contexts are particularly important for personalized travel recommendation.
Disclosure of Invention
In view of the above, the present invention provides a method for recommending tourism based on modeling of multiple tourism contexts, which constructs a visitor access behavior sequence context and a scenic spot tourism attribute context by using internet data, and integrates multiple pieces of tourism context information into a recommendation process by using a method for modeling by combining the visitor access behavior sequence context and the scenic spot tourism attribute context, so as to achieve the purpose of personalized tourism recommendation.
The invention solves the technical problems by the following technical means:
a travel recommendation method based on multi-travel context modeling is characterized by comprising the following steps:
collecting travel note data and tourist attraction attribute data of tourists as original data, preprocessing the original data, and numbering the tourists, the attractions and attributes thereof in the original data;
extracting the name of the scenic spot from the travel record data, constructing the tourist sequence track of the tourist, extracting the entity, attribute and attribute value of the tourist spot from the attribute data of the tourist spot, constructing a scenic spot knowledge map, and digitizing the data in the tourist sequence track and the scenic spot knowledge map;
mapping the tourism sequence track and the scenery spot knowledge map into a feature vector space by using a deep learning model to obtain feature representation of tourist access behavior sequence context and feature representation of scenery spot attribute context;
learning the characteristic representation of the visitor access behavior sequence context of each visitor into a user vector, and learning the sight spot vector of each sight spot through the characteristic representation of all the visitor access behavior sequence contexts;
learning to obtain each sight spot vector of the sight spot tourism attribute context according to the feature representation of the sight spot attribute context, and representing the user vector of the sight spot tourism attribute context by the mean value of sight spot vectors corresponding to historical sight spots accessed in the user tourism sequence track in the sight spot tourism attribute context;
Fusing user vectors and sight spot vectors respectively obtained by visitor access behavior sequence context and sight spot tourism attribute context to obtain a user vector and a sight spot vector containing multi-tourist context information;
and performing similarity calculation between the obtained user vectors and sight spot vectors containing the multi-tourism context information, sorting the similarity values of each user vector with respect to all sight spot vectors, selecting the K highest-ranked tourist attractions according to these values, and generating the Top-K tourist attraction recommendation for the user.
Further, collecting the travel note data of tourists and the attribute data of tourist attractions as original data, preprocessing the original data, and numbering the tourists, the attractions and their attributes in the original data specifically comprises:
crawling tourists' historical sequences of visited sight spots from tourism websites with a crawler tool;
and uniformly numbering the tourists, the scenic spots and the scenic spot attributes, and setting a unique ID for each tourist, scenic spot and scenic spot attribute.
Further, the sight attributes include sight location, sight type, suitable play season, sight fare, open time, and average visitor score.
Further, all the collected single scenic spots and the attribute data thereof are expressed in the form of a triple (P, V, Q); and P is a scenery entity, V is a scenery attribute, and Q is an attribute value, so that the scenery knowledge map is constructed.
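The triple representation and the uniform numbering can be illustrated with a minimal sketch. The record contents, the attribute names and the helper `assign_ids` are hypothetical and chosen only for illustration; the source does not prescribe a data structure.

```python
from typing import Dict, List, Tuple


def assign_ids(names: List[str]) -> Dict[str, int]:
    """Give each distinct name a unique integer ID in first-seen order."""
    ids: Dict[str, int] = {}
    for name in names:
        if name not in ids:
            ids[name] = len(ids)
    return ids


# Hypothetical raw records: (scenic spot entity P, attribute V, attribute value Q).
raw_triples: List[Tuple[str, str, str]] = [
    ("Qixing scenic spot", "region", "Qixing district"),
    ("Qixing scenic spot", "fare grade", "middle"),
    ("Xiangshan scenic spot", "region", "Xiangshan district"),
]

# Separate ID tables for entities, attributes and attribute values.
spot_ids = assign_ids([p for p, _, _ in raw_triples])
attr_ids = assign_ids([v for _, v, _ in raw_triples])
value_ids = assign_ids([q for _, _, q in raw_triples])

# The knowledge map in numeric form: one (P, V, Q) ID triple per fact.
numeric_triples = [(spot_ids[p], attr_ids[v], value_ids[q]) for p, v, q in raw_triples]
```

Keeping three separate ID tables mirrors the separate numbering of scenic spots, attributes and attribute values described in the text.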
Further, training a travel sequence track through a Traj2vec model, optimizing feature representation of the travel sequence track, and obtaining feature representation of the context of the visitor access behavior sequence;
the scenic spot knowledge map is trained through the TKG2vec model, the feature representation of the scenic spot knowledge map is optimized, and the feature representation of the scenic spot tourism attribute context is obtained.
Further, a travel sequence track is trained through a Traj2vec model, the characteristic representation of the travel sequence track is optimized, and the characteristic representation of the context of the visitor access behavior sequence is obtained, and the method specifically comprises the following steps:
extracting the ID of the scenic spot from the digital tourism sequence track;
constructing a user number matrix and a scenery spot number matrix, and initializing;
and training the user number matrix and the sight spot number matrix by using a Traj2vec model to generate a user vector and a sight spot vector of the visitor access behavior sequence context.
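As a rough illustration of the construction and initialization step, the sketch below stores each matrix as a list of column vectors and fills it with normally distributed random values. The dimension, the standard deviation, the seed and the helper name `init_matrix` are assumptions made for this example; the source states only that the matrices are initialized.

```python
import random
from typing import List


def init_matrix(n_columns: int, dim: int, rng: random.Random, std: float = 0.1) -> List[List[float]]:
    """Return n_columns vectors of length dim, each entry drawn from N(0, std^2)."""
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_columns)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# User number matrix D: one column per user's tourism sequence track (3 users assumed).
D = init_matrix(n_columns=3, dim=4, rng=rng)

# Sight spot number matrix W: one column per sight spot ID (5 spots assumed).
W = init_matrix(n_columns=5, dim=4, rng=rng)
```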
Further, the scenic spot knowledge map is trained through the TKG2vec model, the feature representation of the scenic spot knowledge map is optimized, and the feature representation of the scenic spot tourism attribute context is obtained, and the method specifically comprises the following steps:
extracting a scenery spot entity ID, an attribute ID and an attribute value ID from a scenery spot knowledge map in a digital form;
Constructing a scenery spot entity, an attribute and an attribute value vector, and initializing;
generating neighbor nodes of the nodes in the scenic spot knowledge map by using a random walk mode;
and training the scenery entities, attributes and attribute value vectors by utilizing the TKG2vec model to generate user vectors and scenery vectors of scenery tourism attribute contexts.
Further, calculating the spatial distance similarity of the user vector and the candidate sight spot vector by utilizing cosine similarity;
and sequencing the similarity metric values to obtain Top-K recommendation of tourist attractions of the user.
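The cosine-similarity ranking can be sketched as follows. The vectors and the function names `cosine` and `top_k` are illustrative assumptions, not part of the source.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(user_vec: Sequence[float], spot_vecs: Dict[int, Sequence[float]], k: int) -> List[int]:
    """Return the IDs of the k sight spots most similar to the user vector."""
    ranked = sorted(spot_vecs, key=lambda sid: cosine(user_vec, spot_vecs[sid]), reverse=True)
    return ranked[:k]


# Toy vectors: spot 0 points almost the same way as the user, spot 2 the opposite way.
user = [1.0, 0.0]
spots = {0: [0.9, 0.1], 1: [0.0, 1.0], 2: [-1.0, 0.0]}
recommended = top_k(user, spots, k=2)
```

In practice the candidate set would usually exclude sight spots the user has already visited; the source does not state this, so the sketch ranks all spots.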
The invention has the beneficial effects that:
1. the method uses internet data to construct multi-tourism context information comprising a visitor access behavior sequence context and a scenic spot tourism attribute context; the visitor access behavior sequence context carries geographic position and scenic spot access order semantics, and the scenic spot tourism attribute context carries semantics such as scenic spot type, ticket price, suitable visiting season, visit duration, rating and region; this rich contextual semantic information improves the personalization and accuracy of the recommendation results;
2. the invention models the tourism sequence tracks and the scenic spot knowledge map with the Traj2vec model and the TKG2vec model respectively, obtaining feature representations of the visitor access behavior sequence context and the scenic spot tourism attribute context; this preserves the characteristics of the original data while reducing storage space and computational complexity, and allows the feature representations of the multiple tourism contexts to be combined with a tourism recommendation system more accurately and reasonably;
3. The invention combines the user vector and the scenery spot vector respectively obtained by the visitor visiting behavior sequence context and the scenery spot tourism attribute context into the final user vector and scenery spot vector by using a vector fusion method, the feature representation effectively fuses a plurality of tourism contexts, the advanced tourism semantics in the feature representation is improved, and the recommendation effectiveness is ensured;
4. the method is based on the feature representation of the user and the scenic spot fusing multiple tourist contexts, utilizes the efficient vector space distance similarity to calculate the similarity between the user vector and the scenic spot vector, and can generate effective scenic spot recommendation.
Drawings
FIG. 1 is an overall flow chart of travel recommendation based on multi-travel context modeling provided by an embodiment of the present invention;
FIG. 2 is a flow chart of data acquisition and pre-processing provided by an embodiment of the present invention;
FIG. 3 is a flowchart illustrating vectorization of a sequence of guest access behaviors according to an embodiment of the present invention;
FIG. 4 is a flow chart of scenic spot tourist attribute vectorization provided by an embodiment of the present invention;
FIG. 5 is a flow chart of a method for fusing two travel context characteristics according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method for generating a sight spot recommendation by using feature vectors according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail below with reference to the following figures and specific examples:
as shown in fig. 1, the method for recommending travel based on multi-travel context modeling of the present invention specifically includes:
S1, collecting travel note data of tourists and attribute data of tourist attractions as the original data, preprocessing the original data, and numbering the tourists, the attractions and their attributes in the original data;
S2, extracting the sequences of scenic spots visited by tourists from the collected travel notes, such as 'Xiangshan scenic spot → Jingjiang king city → Duocan → Qixing scenic spot → Shuangjiang four lakes', and expressing the scenic spot entities, attributes and attribute values in the collected scenic spot data as triples so as to construct the scenic spot knowledge map. For example, 'Xiangshan scenic spot is located in Xiangshan district' can be represented by the triple <Xiangshan scenic spot, region to which it belongs, Xiangshan district>, where Xiangshan scenic spot is the scenic spot entity, the region to which it belongs is the scenic spot attribute, and Xiangshan district is the attribute value;
S3, taking the user as the unit, splicing all tourism sequence tracks of the same user together to form that user's final tourism sequence track, and mapping the tourism sequence tracks and the scenic spot knowledge map into a feature vector space with deep learning models. The tourism sequence tracks are trained with the Traj2vec model, which optimizes their feature representation and finally yields the feature representation of the visitor access behavior sequence context; the scenic spot knowledge map is trained with the TKG2vec model, which optimizes its feature representation and finally yields the feature representation of the scenic spot tourism attribute context;
S4, learning the feature representation of each visitor's access behavior sequence context as a user vector, and learning the sight spot vector of each sight spot by jointly considering the feature representations of all visitors' access behavior sequence contexts. For example, from travel notes written by 800 different tourists, 1000 tourism sequence tracks are collected (the same tourist may write several travel notes), covering 100 different sight spots; training then yields 800 user vectors and 100 sight spot vectors of the visitor access behavior sequence context.
The sight spot vectors of the sight spot tourism attribute context are learned from the feature representation of that context, and the user vector of the sight spot tourism attribute context is represented by the mean of the attribute-context sight spot vectors of the sight spots in the user's final tourism sequence track. For example, if the final tourism sequence track of user $u_1$ contains, in order, the sight spots $p_1, p_2, p_3, p_4, p_5$, then the user vector of $u_1$ in the sight spot tourism attribute context is the arithmetic mean of the five corresponding sight spot vectors.
The user vectors and sight spot vectors obtained respectively from the visitor access behavior sequence context and the sight spot tourism attribute context are then fused to obtain user vectors and sight spot vectors that contain multi-tourism context information;
S5, using the user vectors and sight spot vectors containing the multi-tourism context information obtained in step S4, calculating the spatial-distance similarity between each user vector and all sight spot vectors, sorting the resulting similarity values for each user vector, and selecting the K highest-ranked tourist attractions to obtain the Top-K tourist attraction recommendation.
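The averaging and fusion in step S4 can be sketched as below. The mean follows the description directly. The fusion operator is not specified in this part of the text, so plain concatenation is used purely as an assumption, and all vectors and names (`mean_vector`, `fuse`) are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise arithmetic mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse(seq_vec: Sequence[float], attr_vec: Sequence[float]) -> List[float]:
    """ASSUMED fusion operator: concatenate the two context vectors."""
    return list(seq_vec) + list(attr_vec)


# Hypothetical attribute-context vectors for sight spots p1..p5.
attr_spot_vecs: Dict[int, List[float]] = {
    1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0], 4: [2.0, 0.0], 5: [1.0, 3.0],
}
trajectory_u1 = [1, 2, 3, 4, 5]  # final tourism sequence track of user u1

# User vector of u1 in the attribute context: mean over the visited spots.
attr_user_vec = mean_vector([attr_spot_vecs[p] for p in trajectory_u1])

# Hypothetical sequence-context user vector for u1, then the fused user vector.
seq_user_vec = [0.5, -0.5]
fused_user_vec = fuse(seq_user_vec, attr_user_vec)
```

Whatever operator is chosen must be applied to sight spot vectors in the same way, so that user and sight spot vectors remain comparable in step S5.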
Specifically, fig. 2 shows a flowchart of data acquisition and preprocessing in this example, and the specific steps include:
L1, collecting users' travel notes from travel websites such as Ctrip and Mafengwo with a crawler tool, extracting the scenic spot names in each travel note in order, and collecting each tourist attraction and its attribute information from Ctrip and an encyclopedia, where the scenic spot attributes include: scenic spot position, scenic spot type, suitable visiting season, scenic spot fare, opening time, average tourist score and the like. The single-spot data extracted in this way is relatively disordered and lacks the connection and fusion of context information, so it cannot meet the needs of subsequent computation and must be preprocessed. For example, scenic spot fares are collected as specific amounts such as 30, 50 or 120 yuan; when stored as a scenic spot attribute they are divided into three grades: more than 100 yuan is high grade, 50 to 100 yuan is middle grade, and less than 50 yuan is low grade. Similarly, the geographic position of a scenic spot is collected as a specific street address; when stored as a scenic spot attribute it is mapped to the administrative region;
L2, the data preprocessed in step L1 cannot be used directly in subsequent processing. All collected users, scenic spots and their attributes are numbered uniformly, and a unique ID is used to represent each user, scenic spot name, scenic spot attribute and scenic spot attribute value; this simplifies further processing and also anonymizes the user information. For example: the ID of user "202001" is set to 0 and that of user "202002" to 1; the ID of the scenic spot "Qixing scenic spot" is set to 0 and that of "Jingjiang Wang City" to 1; the ID of the scenic spot attribute "region to which it belongs" is set to 0 and that of "scenic spot type" to 1; the ID of the scenic spot attribute value "Qixing district" is set to 0 and that of "Xiangshan district" to 1; and so on, so that the tourism sequence tracks are represented with anonymized data;
L3, extracting the sequences of visited tourist attractions from the collected travel notes, and splicing together the sequences from all travel notes of the same tourist to form that tourist's tourism sequence track. The collected data of every single scenic spot and its attributes is expressed as triples (P, V, Q), where P is a scenic spot entity, V is a scenic spot attribute and Q is an attribute value, so as to construct the scenic spot knowledge map. For example, in 'Qixing scenic spot is located in Qixing district', Qixing scenic spot is the scenic spot entity P, the region to which it belongs is the scenic spot attribute V, and Qixing district is the attribute value Q;
L4, extracting the scenic spots and their attributes from the scenic spot knowledge map and the scenic spots from the tourism sequence tracks obtained in step L3, converting them into digital form according to the scenic spot ID table, the attribute ID table and the attribute value ID table of step L2, and storing them in text files named sequence and triple respectively, where each tourism sequence track and each triple of the scenic spot knowledge map occupies one line. For example, the track 'Xiangshan scenic spot → Jingjiang Wang City → …… → Qixing scenic spot' is converted into the digital form $(P_1, P_2, \ldots, P_3)$ and occupies one line in the sequence text; 'Xiangshan scenic spot is located in Xiangshan district' is converted into the digital form $(P_1, V_1, Q_1)$ and occupies one line in the triple text.
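Two of the preprocessing steps above lend themselves to a short sketch: grading the fare and writing one digitized record per line. The thresholds follow the text (more than 100 yuan is high, 50 to 100 is middle, less than 50 is low). The space-separated line format and the function names are assumptions, since the source does not specify the file layout.

```python
from typing import Dict, Sequence, Tuple


def fare_grade(price_yuan: float) -> str:
    """Map a ticket price in yuan to the three fare grades described in the text."""
    if price_yuan > 100:
        return "high"
    if price_yuan >= 50:
        return "middle"
    return "low"


def sequence_line(spot_names: Sequence[str], spot_ids: Dict[str, int]) -> str:
    """One tourism sequence track as a line of spot IDs (assumed space-separated)."""
    return " ".join(str(spot_ids[name]) for name in spot_names)


def triple_line(triple: Tuple[int, int, int]) -> str:
    """One (P, V, Q) ID triple as a line (assumed space-separated)."""
    return " ".join(str(part) for part in triple)


# Hypothetical ID table and track.
ids = {"Xiangshan scenic spot": 0, "Jingjiang Wang City": 1, "Qixing scenic spot": 2}
track = ["Xiangshan scenic spot", "Jingjiang Wang City", "Qixing scenic spot"]
line = sequence_line(track, ids)
```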
Because the digital tourism sequence tracks and scenic spot knowledge map generated by the above steps cannot by themselves support effective feature fusion of the tourism contexts, the tourism contexts need to be converted into vector form with deep learning models. The invention uses the Traj2vec model for vectorizing the visitor access behavior sequence context and the TKG2vec model for vectorizing the scenic spot tourism attribute context. The Traj2vec model is based on the distributed representation model doc2vec, a natural language processing model used to mine the sequential semantics of words in text. Through the Traj2vec model, each tourism sequence track and the scenic spots in it are represented as feature vectors, and training yields user vectors and scenic spot vectors that contain visitor access behavior sequence context information. The flowchart of visitor access behavior sequence context vectorization is shown in FIG. 3, and the specific steps include:
M1, extracting the IDs of all scenic spots from the constructed tourism sequence tracks in each digital form;
M2, representing each constructed digital tourism sequence track by a low-dimensional sequence feature vector $d_i$, and arranging the sequence feature vectors of all users' tourism sequence tracks in order to form a user number matrix $D$, in which each user's tourism sequence track is one column and the index number of the track corresponds to the column number of $D$; representing each sight spot ID extracted in step M1 by a low-dimensional sight spot feature vector $w_i$, and arranging the sight spot feature vectors corresponding to the sight spot IDs in all users' tourism sequence tracks in order to form a sight spot number matrix $W$, in which the sight spot feature vector of each sight spot ID is one column. The user number matrix $D$ and the sight spot number matrix $W$ are initialized with normally distributed random values;
M3, training the user number matrix $D$ and the sight spot number matrix $W$ with the Traj2vec model, wherein the initial objective function is:
Figure BDA0002164826870000091
wherein,
Figure BDA0002164826870000092
$w_t$ represents the current sight spot vector in the tourism sequence track, and $w_{t-k}, \ldots, w_{t+k}$ are the context sight spot vectors of $w_t$. The function $h(\cdot)$ in formula (2) represents the sum of the user tourism sequence track vector $d_i$ in matrix $D$ and the context sight spot vectors of $w_t$ in matrix $W$.
Figure BDA0002164826870000093
$\theta^{u}$ denotes an auxiliary vector corresponding to the sight spot vector $u$ and is a parameter to be trained.
Figure BDA0002164826870000094
denotes the probability of predicting the sight spot vector $w_t$ when the context is $w_{t-k}, \ldots, w_{t+k}$, and
Figure BDA0002164826870000095
denotes the probability of predicting the sight spot vector $u$ when the context is $w_{t-k}, \ldots, w_{t+k}$. $\mathrm{NEG}(w_t)$ is the negative sample set of $w_t$. The negative sampling method works as follows: each sight spot in the sight spot number matrix is represented, according to its occurrence frequency, as a line segment within $[0, 1]$; all segments are joined end to end to form a unit segment of length 1; and negative samples are drawn by picking random points on this unit segment. Taken as a whole, the user number matrix $D$ serves as the paragraph matrix and the sight spot number matrix $W$ as the word matrix, and both are updated by maximizing the average log probability, i.e. the value of the objective function of formula (1), so that $D$ and $W$ contain the visitor access behavior sequence context information;
M4, after training with the Traj2vec model, each column of the resulting user number matrix $D$ is taken as a user vector of the visitor access behavior sequence context, where $i$ is the number of the user and also the column number in $D$; each column of the resulting sight spot number matrix $W$ is taken as a sight spot vector of the visitor access behavior sequence context, where $j$ is the number of the sight spot and also the column number in $W$.
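The negative sampling procedure described in step M3 (frequency-based segments joined into a unit segment, then random points dropped on it) can be sketched as follows. Segment lengths are taken as exactly proportional to occurrence counts, which is one reading of the text; the function names and toy tracks are hypothetical.

```python
import bisect
import random
from collections import Counter
from itertools import accumulate
from typing import List, Sequence, Tuple


def build_segments(tracks: Sequence[Sequence[int]]) -> Tuple[List[int], List[float]]:
    """Return sight spot IDs and the right end of each spot's segment on [0, 1]."""
    counts = Counter(spot for track in tracks for spot in track)
    spots = sorted(counts)
    total = sum(counts.values())
    bounds = list(accumulate(counts[s] / total for s in spots))
    bounds[-1] = 1.0  # guard against floating-point rounding
    return spots, bounds


def negative_samples(target: int, spots: List[int], bounds: List[float],
                     n: int, rng: random.Random) -> List[int]:
    """Draw n spots other than target by dropping random points on the unit segment."""
    if all(s == target for s in spots):
        raise ValueError("need at least one sight spot other than the target")
    samples: List[int] = []
    while len(samples) < n:
        point = rng.random()  # uniform in [0, 1)
        spot = spots[bisect.bisect_right(bounds, point)]
        if spot != target:  # the positive sample is never its own negative
            samples.append(spot)
    return samples


# Toy tracks: spot 0 occurs 3 times, spot 1 twice, spot 2 once.
tracks = [[0, 1, 2], [0, 1], [0]]
spots, bounds = build_segments(tracks)
negatives = negative_samples(target=0, spots=spots, bounds=bounds, n=5, rng=random.Random(1))
```

Frequently visited sight spots own longer segments and are therefore drawn as negatives more often.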
In addition, the TKG2vec model is based on the node2vec model, which learns the attribute and structure information of graph nodes using a random walk strategy and a CBOW model. Through the TKG2vec model the scenic spot knowledge map is converted into vector form, and training yields user vectors and scenic spot vectors containing scenic spot tourism attribute context information. The flowchart of scenic spot tourism attribute context vectorization is shown in FIG. 4, and the specific steps include:
n1, extracting the scenery spot entity ID, the attribute ID and the attribute value ID from the constructed digital scenery spot knowledge map;
n2, expressing the ID, the attribute ID and the attribute value ID of each scenery spot entity extracted in the step N1 by using a low-dimensional real value vector, and initializing each dimension of the vector through a random value of normal distribution to obtain the scenery spot entity vector, the attribute vector and the attribute value vector;
N3, obtaining the neighbor nodes of each node in the scenic spot knowledge graph with the biased random walk of TKG2vec, wherein the transition probability between nodes in the scenic spot knowledge graph is calculated as:
$$P(c_i = x \mid c_{i-1} = v) = \begin{cases} \dfrac{\pi_{vx}}{Z}, & (v, x) \in E \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
wherein,
$$\pi_{vx} = \alpha_{pq}(t, x) \cdot \omega_{vx} \qquad (4)$$
$$\alpha_{pq}(t, x) = \begin{cases} \dfrac{1}{p}, & d_{tx} = 0 \\ 1, & d_{tx} = 1 \\ \dfrac{1}{q}, & d_{tx} = 2 \end{cases} \qquad (5)$$
$c_i$ represents the $i$-th node in the walk, $Z$ is a normalization constant, $(v, x)$ represents the edge from node $v$ to node $x$, $E$ represents the set of edges in the scenic spot knowledge graph, $t$ represents the node visited before node $v$, and $\omega_{vx}$ is the weight of the edge in the scenic spot knowledge graph, which is initialized with the corresponding $\pi_{vx}$ (i.e. $\omega_{vx} = \pi_{vx}$). $d_{tx}$ is the length of the shortest path between node $t$ and node $x$, and $p$ and $q$ control how quickly the walk moves and how far it strays from the starting node. This random walk, which combines depth-first search and breadth-first search, generates the scenic spot tourism attribute context in the scenic spot knowledge graph;
n4, the scenic spot entities, the attributes and the attribute value vectors are trained through the TKG2vec model, and the initial objective function is as follows:
$$\max_f \sum_{u \in V} \log \Pr(N_s(u) \mid f(u)) \tag{6}$$
wherein,
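$$\Pr(N_s(u) \mid f(u)) = \prod_{n_i \in N_s(u)} \Pr(n_i \mid f(u)) \tag{7}$$
$$\Pr(n_i \mid f(u)) = \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))} \tag{8}$$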
formula (6) can be converted to:
$$\max_f \sum_{u \in V} \Big[ -\log Z_u + \sum_{n_i \in N_s(u)} f(n_i) \cdot f(u) \Big] \tag{9}$$
wherein,
$$Z_u = \sum_{v \in V} \exp(f(u) \cdot f(v)) \tag{10}$$
$f$ represents the mapping from scenic spot entity, attribute and attribute value IDs to the corresponding scenic spot entity, attribute and attribute value vectors; $u \in V$ represents a node in the scenic spot knowledge graph, $f(u)$ is the vector of node $u$, and $N_s(u)$ denotes the neighbor nodes of node $u$. Formulas (7) and (8) are two standard assumptions that make the optimization of the objective function tractable: conditional independence and symmetry in feature space. Conditional independence means that, given the vector of a node, the probability of observing one of its neighbor nodes is independent of observing any of its other neighbor nodes; symmetry in feature space means that a node and its neighbor nodes are symmetric in the feature space. Overall, given the scenic spot entity vectors, attribute vectors and attribute value vectors in the scenic spot knowledge graph, the log-probability of observing the network neighbors in the scenic spot knowledge graph is maximized, that is, the value of the objective function expressed by formula (6), or by the converted formula (9), is maximized; the scenic spot entity vectors, attribute vectors and attribute value vectors are updated accordingly, so that they contain scenic spot tourism attribute context information;
N5, after training with the TKG2vec model as described above, the scenic spot entity vectors in the scenic spot knowledge graph are obtained and used as the scenic spot vectors of the scenic spot tourism attribute context
Figure BDA0002164826870000114
wherein i represents the number of each scenic spot; the feature vector of each user's scenic spot tourism attribute context is then obtained as the mean of the scenic spot vectors, in the scenic spot knowledge graph, that correspond to the historical scenic spots visited in the user's tourism sequence track
Figure BDA0002164826870000115
Figure BDA0002164826870000116
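The biased walk of step N3 and the averaging of step N5 can be illustrated with a minimal Python sketch. It assumes an undirected graph stored as a dict of neighbor-weight dicts and the standard node2vec search bias (1/p, 1 and 1/q for shortest-path lengths 0, 1 and 2). The toy node names, the function names and the parameter values are hypothetical and are not taken from the patent.

```python
import random


def alpha(graph, t, x, p, q):
    """Search bias alpha_pq(t, x), chosen by the shortest-path length d_tx."""
    if x == t:           # d_tx = 0: the walk returns to the previous node
        return 1.0 / p
    if x in graph[t]:    # d_tx = 1: x is a common neighbor of t and v
        return 1.0
    return 1.0 / q       # d_tx = 2: the walk moves further away from t


def transition_probs(graph, t, v, p, q):
    """Normalized probabilities of stepping from v to each neighbor x,
    given that the walk arrived at v from t (pi_vx / Z)."""
    pi = {x: alpha(graph, t, x, p, q) * w for x, w in graph[v].items()}
    z = sum(pi.values())
    return {x: value / z for x, value in pi.items()}


def biased_walk(graph, start, length, p, q, rng):
    """Generate one walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        if not graph[v]:
            break  # dead end: no outgoing edges
        if len(walk) == 1:
            # No previous node yet: sample by edge weight only (assumption).
            probs = graph[v]
        else:
            probs = transition_probs(graph, walk[-2], v, p, q)
        neighbors = list(probs)
        weights = [probs[x] for x in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (user vector in step N5)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical toy graph: scenic spots linked to attribute-value nodes.
toy_graph = {
    "spot:1": {"type:lake": 1.0, "city:guilin": 1.0},
    "spot:2": {"type:lake": 1.0, "city:guilin": 1.0},
    "spot:3": {"city:guilin": 1.0},
    "type:lake": {"spot:1": 1.0, "spot:2": 1.0},
    "city:guilin": {"spot:1": 1.0, "spot:2": 1.0, "spot:3": 1.0},
}

walk = biased_walk(toy_graph, "spot:1", length=6, p=2.0, q=0.5,
                   rng=random.Random(0))
# Mean of the vectors of two visited spots (placeholder values).
user_vector = mean_vector([[0.2, 0.4], [0.6, 0.0]])
```

The walks produced this way would serve as the "sentences" fed to the CBOW-style training of step N4; that training step is not shown here.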
After the visitor access behavior sequence context and the scenic spot tourism attribute context have been vectorized, the modeling of the multiple tourism contexts is complete. The multiple kinds of tourism context information then need to be fused. The flow chart for fusing the two types of tourism context features is shown in FIG. 5, and the specific steps are as follows:
P1, the user vector of the visitor access behavior sequence context obtained by training the Traj2vec model
Figure BDA0002164826870000121
and the user vector of the scenic spot tourism attribute context obtained by training the TKG2vec model
Figure BDA0002164826870000122
are concatenated to form the final user vector $User_i$; for example, when the user vectors
Figure BDA0002164826870000123
and
Figure BDA0002164826870000124
are both 80-dimensional, the concatenated user vector $User_i$ is 160-dimensional;
Figure BDA0002164826870000125
P2, the scenic spot vector of the visitor access behavior sequence context obtained by training the Traj2vec model
Figure BDA0002164826870000126
and the scenic spot vector of the scenic spot tourism attribute context obtained by training the TKG2vec model
Figure BDA0002164826870000127
are concatenated to form the final scenic spot vector $Attraction_i$;
Figure BDA0002164826870000128
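The fusion in steps P1 and P2 is a plain concatenation of the two context vectors. A minimal Python sketch follows, assuming both vectors are lists of floats; the function name `fuse` and the placeholder values are hypothetical and only illustrate the 80 + 80 = 160 dimension example.

```python
def fuse(sequence_vector, attribute_vector):
    """Concatenate the sequence-context vector and the attribute-context
    vector into one fused vector."""
    return list(sequence_vector) + list(attribute_vector)


# Hypothetical 80-dimensional vectors for one user (values are placeholders).
user_seq = [0.1] * 80    # from the visit-sequence context (Traj2vec)
user_attr = [0.2] * 80   # from the attribute context (TKG2vec)
user_final = fuse(user_seq, user_attr)  # 160-dimensional fused user vector
```

The same function would apply unchanged to the two scenic spot vectors of step P2.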
The final user vector $User_i$ and scenic spot vector $Attraction_i$ obtained by the above steps contain multiple kinds of tourism context information, so scenic spot recommendation can be performed next. The flow chart of the method for generating scenic spot recommendations from the feature vectors is shown in FIG. 6, and the specific steps are as follows:
Q1, the cosine similarity between the final user vector $User_i$ and the scenic spot vector $Attraction_i$ obtained by fusing the multi-tourism context features is calculated by the following formula:
$$\cos(User, Attraction) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
wherein $x_i$ represents the value of the i-th dimension of a given user vector, $y_i$ represents the value of the i-th dimension of a given scenic spot vector, and $n$ is the dimension of the vectors; for example, for the user vector $User_1 = (x_1, x_2, x_3)$ and the scenic spot vector $Attraction_1 = (y_1, y_2, y_3)$, the cosine similarity between $User_1$ and $Attraction_1$ is
$$\cos(User_1, Attraction_1) = \frac{x_1 y_1 + x_2 y_2 + x_3 y_3}{\sqrt{x_1^2 + x_2^2 + x_3^2}\,\sqrt{y_1^2 + y_2^2 + y_3^2}}$$
In this example, a smaller cosine similarity between the user vector and the scenic spot vector indicates that the user is less likely to visit the scenic spot, and a larger similarity indicates that the user is more likely to visit it;
Q2, calculating, by the cosine similarity formula, the similarity between each user vector $User_i$ and all candidate scenic spot vectors $Attraction_i$;
Q3, for each user, sorting the cosine similarity values between the user vector and each scenic spot vector obtained in the preceding step, and then selecting the K top-ranked scenic spots to obtain the Top-K scenic spot recommendation for the user. For example, with 3 users, 100 scenic spots and K = 10, the cosine similarity between the vector of user 1 and each of the 100 scenic spot vectors is calculated, the resulting 100 cosine similarity values are sorted, and the 10 top-ranked scenic spots are selected as the Top-10 scenic spot recommendation for user 1; the Top-10 scenic spot recommendations for users 2 and 3 are obtained in the same way.
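Steps Q1 to Q3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: vectors are lists of floats, candidate scenic spots are held in a dict keyed by a hypothetical spot ID, and a zero vector is given similarity 0 by convention. The names `cosine_similarity` and `top_k` are illustrative and do not come from the patent.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_x * norm_y)


def top_k(user_vector, attraction_vectors, k):
    """Return the IDs of the k scenic spots most similar to the user."""
    ranked = sorted(
        attraction_vectors.items(),
        key=lambda item: cosine_similarity(user_vector, item[1]),
        reverse=True,  # larger similarity means a more likely visit
    )
    return [spot_id for spot_id, _ in ranked[:k]]


# Hypothetical 2-dimensional example with three candidate scenic spots.
attractions = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
recommended = top_k([1.0, 0.0], attractions, k=2)
```

With the 3 users and 100 scenic spots of the example above, `top_k` would be called once per user with `k=10`.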
Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims. Techniques, shapes and structural parts that are not described in detail in the embodiments of the invention are known in the art.

Claims (6)

1. A travel recommendation method based on multi-travel context modeling is characterized by comprising the following steps:
collecting travel note data and tourist attraction attribute data of tourists as original data, preprocessing the original data, and numbering the tourists, the attractions and attributes thereof in the original data;
extracting the sequences of scenic spots visited by tourists from the travel note data, extracting the entities, attributes and attribute values of the scenic spots from the scenic spot attribute data, constructing a scenic spot knowledge graph, and digitizing the data in the tourism sequence tracks and the scenic spot knowledge graph;
mapping the tourism sequence track and the scenic spot knowledge map into a feature vector space by using a deep learning model to obtain the feature representation of the tourist access behavior sequence context and the feature representation of the scenic spot tourism attribute context, which specifically comprises the following steps: training the travel sequence track through a Traj2vec model, optimizing the feature representation of the travel sequence track, and obtaining the feature representation of the context of the visitor access behavior sequence;
training the scenic spot knowledge map through the TKG2vec model, optimizing feature representation of the scenic spot knowledge map, and obtaining feature representation of scenic spot tourism attribute context;
training a travel sequence track through a Traj2vec model, optimizing feature representation of the travel sequence track, and obtaining feature representation of a visitor access behavior sequence context, wherein the method specifically comprises the following steps:
extracting the scenic spot IDs from the digitized tourism sequence tracks;
representing each digitized tourism sequence track by a low-dimensional sequence feature vector $d_i$, and arranging the sequence feature vectors of all users' tourism sequence tracks in order to form a user number matrix D, wherein the tourism sequence track of each user is represented as a column of matrix D, and the index number of the tourism sequence track corresponds to the column number of matrix D; representing each extracted scenic spot ID by a low-dimensional scenic spot feature vector $w_i$, and then arranging the scenic spot feature vectors $w_i$ corresponding to the scenic spot IDs in all users' tourism sequence tracks in order to form a scenic spot number matrix W, wherein the scenic spot feature vector of each scenic spot ID is a column of matrix W; initializing the user number matrix D and the scenic spot number matrix W with normally distributed random values;
training a user number matrix D and a sight spot number matrix W through a Traj2vec model, wherein an initial objective function is as follows:
Figure FDA0003504649830000011
wherein,
Figure FDA0003504649830000021
$w_t$ represents the scenic spot vector at the current position in the tourism sequence track, and $w_{t-k}, \ldots, w_{t+k}$ are the context scenic spot vectors of $w_t$; the function $h(\cdot)$ in formula (2) represents the sum of the user tourism sequence track vector $d_i$ in matrix D and the context scenic spot vectors of $w_t$ in matrix W;
Figure FDA0003504649830000022
and $\theta^{u}$ respectively denote a vector corresponding to the scenic spot vector $w_t$ and a vector corresponding to the scenic spot vector $u$, and are parameters to be trained;
Figure FDA0003504649830000023
denotes the probability of predicting the scenic spot vector $w_t$ given the context $w_{t-k}, \ldots, w_{t+k}$;
Figure FDA0003504649830000024
denotes the probability of predicting the scenic spot vector $u$ given the context $w_{t-k}, \ldots, w_{t+k}$; $NEG(w_t)$ is the negative sample set for $w_t$. The negative sampling method works as follows: according to its occurrence frequency, each scenic spot in the scenic spot number matrix is represented as a line segment within the interval [0, 1]; all the line segments are joined end to end to form a unit line segment of length 1, and negative sampling is performed by randomly picking points on the unit line segment; the user number matrix D serves as the paragraph matrix, the scenic spot number matrix W serves as the word matrix, and the user number matrix D and the scenic spot number matrix W are updated by maximizing the average log-probability, so that they contain visitor access behavior sequence context information;
after the training is carried out through the Traj2vec model, each column in the obtained user number matrix D is used as a user vector of the context of the visitor access behavior sequence
Figure FDA0003504649830000025
wherein i represents the number of each user and is also the column number of the user number matrix D; each column of the obtained scenic spot number matrix W is taken as a scenic spot vector $SE_{p_j}$ of the visitor access behavior sequence context, wherein j represents the number of the scenic spot and is also the column number of the scenic spot number matrix;
the TKG2vec model is based on the node2vec model and learns the attribute and structural information of graph nodes by using a random walk strategy and a CBOW model; learning the feature representation of the visitor access behavior sequence context of each tourist into a user vector, and learning the scenic spot vector of each scenic spot from the feature representations of all visitor access behavior sequence contexts;
learning each scenic spot vector of the scenic spot tourism attribute context from the feature representation of the scenic spot tourism attribute context, and representing the user vector of the scenic spot tourism attribute context by the mean of the scenic spot vectors, in the scenic spot tourism attribute context, that correspond to the historical scenic spots visited in the user's tourism sequence track;
fusing the user vectors and scenic spot vectors respectively obtained from the visitor access behavior sequence context and the scenic spot tourism attribute context to obtain a user vector and a scenic spot vector containing multi-tourism context information;
and performing similarity calculation on the obtained user vectors and scenic spot vectors containing the multi-tourism context information, sorting the similarity metric values of each user vector with respect to all scenic spot vectors, selecting the K top-ranked tourist attractions according to the metric values, and generating the Top-K tourist attraction recommendation for the user.
2. The travel recommendation method based on multi-travel context modeling according to claim 1, characterized in that: collecting tourist data and tourist attraction attribute data of tourists as original data, preprocessing the original data, numbering the tourists, the attractions and the attributes thereof in the original data, and specifically comprising the following steps:
crawling tourists' historical visited scenic spot sequences from a tourism website by using a crawler tool;
and uniformly numbering the tourists, the scenic spots and the scenic spot attributes, and assigning a unique ID to each tourist, scenic spot and scenic spot attribute.
3. The travel recommendation method based on multi-travel context modeling according to claim 2, characterized in that: the scenic spot attributes include scenic spot location, scenic spot type, suitable season for visiting, ticket price, opening hours, and average tourist rating.
4. The travel recommendation method based on multi-travel context modeling according to claim 3, characterized in that: all the collected single scenic spots and attribute data thereof are expressed in the form of a triple (P, V, Q); and P is a scenery entity, V is a scenery attribute, and Q is an attribute value, so that the scenery knowledge map is constructed.
5. The travel recommendation method based on multi-travel context modeling according to claim 1, characterized in that: the scenic spot knowledge map is trained through the TKG2vec model, the characteristic representation of the scenic spot knowledge map is optimized, and the characteristic representation of the scenic spot tourism attribute context is obtained, and the specific steps comprise:
Extracting a scenery spot entity ID, an attribute ID and an attribute value ID from a scenery spot knowledge map in a digital form;
constructing a scenery spot entity, an attribute and an attribute value vector, and initializing;
generating neighbor nodes of the nodes in the scenic spot knowledge map by using a random walk mode;
and training the scenery entities, attributes and attribute value vectors by utilizing the TKG2vec model to generate user vectors and scenery vectors of scenery tourism attribute contexts.
6. The travel recommendation method based on multi-travel context modeling according to claim 5, characterized in that:
calculating the spatial distance similarity of the user vector and the candidate sight spot vector by utilizing cosine similarity;
and sequencing the similarity metric values to obtain Top-K recommendation of tourist attractions of the user.
CN201910743597.1A 2019-08-13 2019-08-13 Tourism recommendation method based on multi-tourism context modeling Active CN110532464B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910743597.1A CN110532464B (en) 2019-08-13 2019-08-13 Tourism recommendation method based on multi-tourism context modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910743597.1A CN110532464B (en) 2019-08-13 2019-08-13 Tourism recommendation method based on multi-tourism context modeling

Publications (2)

Publication Number Publication Date
CN110532464A CN110532464A (en) 2019-12-03
CN110532464B true CN110532464B (en) 2022-04-12

Family

ID=68663081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910743597.1A Active CN110532464B (en) 2019-08-13 2019-08-13 Tourism recommendation method based on multi-tourism context modeling

Country Status (1)

Country Link
CN (1) CN110532464B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269882B (en) * 2020-10-12 2022-10-18 Xi'an Polytechnic University Tourist attraction recommendation method oriented to knowledge map
CN112667877A (en) * 2020-12-25 2021-04-16 Shaanxi Normal University Scenic spot recommendation method and equipment based on tourist knowledge map
CN112784153B (en) * 2020-12-31 2022-09-20 Shanxi University Tourist attraction recommendation method integrating attribute feature attention and heterogeneous type information
CN114936723B (en) * 2022-07-21 2023-04-14 The 30th Research Institute of China Electronics Technology Group Corporation Social network user attribute prediction method and system based on data enhancement

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189944A (en) * 2018-09-27 2019-01-11 Guilin University of Electronic Technology Personalized recommending scenery spot method and system based on user's positive and negative feedback portrait coding
CN109977283A (en) * 2019-03-14 2019-07-05 Renmin University of China A kind of the tourism recommended method and system of knowledge based map and user's footprint

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI512664B (en) * 2013-11-11 2015-12-11 Inst Information Industry Method and system for recommending tour attractions based on medical services


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"A Survey of Context-Aware Tourism Recommender Systems"; Kuang Haili et al.; CAAI Transactions on Intelligent Systems; 20190731; pp. 612-618 *

Also Published As

Publication number Publication date
CN110532464A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN110532464B (en) Tourism recommendation method based on multi-tourism context modeling
CN109977283B (en) Tourism recommendation method and system based on knowledge graph and user footprint
CN110287336B (en) Tourist map construction method for tourist attraction recommendation
Arefieva et al. A machine learning approach to cluster destination image on Instagram
CN107679661B (en) Personalized tour route planning method based on knowledge graph
CN110555112B (en) Interest point recommendation method based on user positive and negative preference learning
Jiang et al. Author topic model-based collaborative filtering for personalized POI recommendations
CN108829852B (en) Personalized tour route recommendation method
Chen et al. Personalized itinerary recommendation: Deep and collaborative learning with textual information
US20100211308A1 (en) Identifying interesting locations
CN108681586B (en) Tourist route personalized recommendation method based on crowd sensing
CN110288436A (en) A kind of personalized recommending scenery spot method based on the modeling of tourist's preference
JP2010039710A (en) Information collection device, travel guiding device, travel guiding system and computer program
CN115292599A (en) Scenic spot recommendation method integrating attribute co-occurrence and interactive behavior characteristics
CN112733040B (en) Travel itinerary recommendation method
CN107066565A (en) A kind of tourist hot spot forecasting system
CN113536155A (en) Multi-source data-based tourism route visual analysis and planning method
Tang et al. Synergizing Spatial Optimization with Large Language Models for Open-Domain Urban Itinerary Planning
CN111882381B (en) Travel recommendation method based on collaborative memory network
CN111797331A (en) Multi-target multi-constraint route recommendation method based on crowd sensing
Su et al. Personalized route description based on historical trajectories
Xu et al. Selection and visiting sequence of daily attractions: Multi-day travel itinerary recommendation based on multi-source online data
CN113112058A (en) Travel route recommendation method based on knowledge graph and ant colony algorithm
CN113515697A (en) Group dynamic tour route recommendation method and system based on multiple intentions of user
Li et al. Personal tour planning system (PTPS) for use in urban and rural areas

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20191203

Assignee: Guilin ruiweisaide Technology Co.,Ltd.

Assignor: GUILIN University OF ELECTRONIC TECHNOLOGY

Contract record no.: X2022450000191

Denomination of invention: A tourism recommendation method based on multi tourism context modeling

Granted publication date: 20220412

License type: Common License

Record date: 20221125
