CN105930469A - Hadoop-based individualized tourism recommendation system and method - Google Patents
- Publication number
- CN105930469A, CN201610258743.8A
- Authority
- CN
- China
- Prior art keywords
- sight spot
- user
- module
- data
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/14—Travel agencies
Abstract
The invention discloses a Hadoop-based personalized tourism recommendation system and method, belonging to the fields of Internet technology and big data. Five modules complement one another to deliver the functions of the whole system: a web crawler module, a data module, a big data processing module, a recommendation calculation module, and a UI (user interface) module. They are connected as follows: the web crawler module has a one-way connection to the data module and also a one-way connection to the UI module; the data module has a one-way connection to the big data processing module and also a one-way connection to the UI module; the big data processing module has a one-way connection to the recommendation calculation module and a two-way connection to the UI module; and the recommendation calculation module has a two-way connection to the UI module. The invention provides a Hadoop-based personalized tourism recommendation system that can quickly and accurately produce personalized recommendations for tourists and give them more comfortable and suitable options when they choose a destination.
Description
Technical field
The present invention relates to the fields of Internet technology, big data, and data mining, and concerns a personalized recommendation system developed for the tourism industry.
Background technology
Traditional tourism websites mostly recommend scenic spots by popularity and do not tailor recommendations to an individual visitor's interests and behavior, so visitors choose a destination among a large number of scenic spots rather blindly and find it hard to match their personal interests. Personalized recommendation systems in other fields commonly use content-based recommendation and collaborative filtering, but both methods have defects. Content-based recommendation always recommends items with similar content, so the user tires of the results. Collaborative filtering gives popular items a disproportionately large share, which reduces the appearance of long-tail items, so the final recommendations offer the user nothing new. On the processing side, traditional processing flows are slow and inefficient when faced with massive data, which does not satisfy the requirement that a website run quickly and efficiently.
Summary of the invention
To address the three problems identified in the background section, the present invention develops a Hadoop-based personalized tourism recommendation system that can quickly produce personalized recommendations for visitors and offer them more comfortable and suitable choices when selecting a destination.
To achieve the above object, the present invention provides the following technical solution:
The present invention uses Eclipse as the development tool, Hadoop as the big data processing platform, and Java as the programming language. JSch is used to connect the local Windows system to the CentOS server across platforms, so that operation requests can be sent to the server from a local browser. Driven by the interaction on the page, the back end uses the MapReduce computing framework in Hadoop to search and compute step by step on the distributed file system, then consolidates the results and returns them to the front-end page.
The present invention has five modules that complement one another to deliver the functions of the whole system: the web crawler module, the data module, the big data processing module, the recommendation calculation module, and the UI module. They are connected as follows: the web crawler module has a one-way connection to the data module and also a one-way connection to the UI module; the data module has a one-way connection to the big data processing module and a two-way connection to the UI module; the big data processing module has a one-way connection to the recommendation calculation module and a two-way connection to the UI module; the recommendation calculation module has a two-way connection to the UI module. The connection flow between the modules is shown in Fig. 1, and the details are as follows:
1. The web crawler module mainly crawls scenic spot information and user information. Scenic spot information is crawled province by province and city by city. The module first traverses the province and city data in the data module; the back end modifies the city name in the tourism website's URL, obtains the website's cookie, and retrieves the list of scenic spot names under each city of each province in turn. Following this list, it then extracts the required fields for each scenic spot and stores them in the corresponding scenic spot information table in the database. User information is obtained from the review page of each scenic spot: the module collects the reviews of the scenic spot, follows each review to the reviewer (that is, the user) to obtain the user's details, and stores the user information and review information in the corresponding user information table and review table in the database. The crawl flow is as follows:
Country list → province list → city list → scenic spot list → scenic spot field information → scenic spot reviews → reviewer information
The web crawler module is triggered in two ways. First, at a fixed time every day it reads the scenic spot data in the database, triggers the corresponding programs that crawl scenic spot and user information, and stores the results in the corresponding tables in the database. Second, it is triggered by the search function of the UI module: when a queried scenic spot name has no matching result in the database, the crawler is triggered to query the tourism website and crawl the relevant information. If the scenic spot is found, its fields are crawled and stored in the corresponding scenic spot information table in the database, and the result is also fed back to the corresponding position on the UI page.
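The search-triggered path described above (look in the database first, crawl only on a miss, then store the result) can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary standing in for the scenic spot information table, the `search_scene` function, and the stub crawler `fake_crawl` are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Optional

SceneRecord = Dict[str, str]


def search_scene(
    name: str,
    scene_table: Dict[str, SceneRecord],
    crawl: Callable[[str], Optional[SceneRecord]],
) -> Optional[SceneRecord]:
    """Return the record for `name`, crawling and storing it on a database miss."""
    record = scene_table.get(name)
    if record is not None:
        return record                 # found in the database, no crawl needed
    record = crawl(name)              # query the tourism website (stubbed below)
    if record is not None:
        scene_table[name] = record    # store in the scenic spot information table
    return record


# Stub standing in for the real crawler; it records each call it receives.
crawl_calls: List[str] = []


def fake_crawl(name: str) -> Optional[SceneRecord]:
    crawl_calls.append(name)
    site = {"Scene9": {"name": "Scene9", "level": "4A"}}  # made-up website content
    return site.get(name)


table: Dict[str, SceneRecord] = {}
first = search_scene("Scene9", table, fake_crawl)    # miss: crawled and stored
second = search_scene("Scene9", table, fake_crawl)   # hit: served from the table
```

Passing the crawler in as a function keeps the lookup logic testable without network access; the daily scheduled trigger would call the same crawler over every stored scenic spot.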
2. The data module mainly stores the basic data, which falls into three categories: basic scenic spot data, basic user data, and user-scenic spot relation data. Basic scenic spot data comprises the province list, the city list, the scenic spot list, and the basic information table of each scenic spot. Basic user data comprises each user's basic information and the scenic spots the user has visited. User-scenic spot relation data comprises users' evaluation data for scenic spots. The data module provides the underlying data for the big data processing module, and the required information can also be queried from it through the search function of the UI.
3. The big data processing module runs on the MapReduce computing framework (hereinafter MR) of the Hadoop platform. The framework consists of two main parts, Map and Reduce. The master node first splits the raw data and distributes the pieces to the worker nodes that execute map tasks, and these worker nodes execute their map tasks simultaneously. When the map tasks finish, their output is sent as input to the worker nodes that execute reduce tasks. Reduce merges the results produced by the map tasks and consolidates them into the final output. The MR framework flow is shown in Fig. 2.
The purpose of this module is to speed up data processing. It can be triggered by the similarity search function, the category word cloud function, and the scenic spot type prediction function of the UI, whereupon it retrieves data from the database, performs the computation, and returns the result to the corresponding position on the UI page. The parallel computation mainly covers the following aspects. First, the web crawling process adopts the MR approach, which effectively increases the crawl speed. Second, MR is applied to computing user similarity and scenic spot similarity, so that the similarities between users or between scenic spots are computed within a short time; this uses the basic user data, basic scenic spot data, and user-scenic spot relation data in the data module. Third, MR is applied to text mining: word segmentation statistics are computed over the scenic spot information of each category, and a corresponding dynamic category word cloud is displayed. In addition, category prediction is computed for scenic spots of unknown type, mainly using the basic scenic spot data as the training data for classification.
4. The recommendation calculation module computes targeted recommendations from the results of the big data processing module and feeds the recommendations back to the corresponding fields of the UI page. The module offers three kinds of recommendation. The first recommends similar users to the logged-in user: a user-to-user similarity matrix is built from the user similarities computed by the big data processing module, and the ten users most similar to this user are returned as the recommendation. The second is content-based recommendation: the features of the scenic spots the user has visited are analyzed and extracted as the user's preferences, and similar scenic spots are looked up in the scenic spot similarity matrix. The third is hybrid recommendation, that is, personalized recommendation, which combines the content-based method with item-based collaborative filtering to improve accuracy and give the user more suitable results. The hybrid method first builds a scenic spot co-occurrence matrix from the user-scenic spot relation data, then weights the scenic spot content features into each user's scenic spot ratings to form a user behavior matrix. Multiplying the co-occurrence matrix by the behavior matrix yields the user's score for every scenic spot, and the ten highest-scoring scenic spots are presented to the user as the final recommendation.
5. The UI module is related to all four modules above. Its relationship with the web crawler module is a one-way trigger; its relationships with the other three modules are two-way: functions on the page trigger the back-end programs of the associated modules, and each module feeds its result back to the corresponding field of the page for display.
The UI module has three main pages: the popular recommendation page, the category recommendation page, and the personalized recommendation page. Every page offers scenic spot content retrieval (entering a scenic spot name shows that scenic spot's basic information) and scenic spot similarity lookup (entering two scenic spot names shows the similarity value of the two scenic spots and the comparison result). The popular recommendation page is based on combined statistics over the choices and reviews of 1,000,000 users in the database; it shows the top 10 scenic spots with their information, the number of tourists received by each province nationwide, and the best travel season for each province. The category recommendation page shows, for each category, the ten most popular scenic spots obtained from the statistics. All scenic spots are divided into 27 categories; text mining is performed on each category to extract its key feature words, and a word cloud is built for each category according to the words' frequencies so that users can see each category's feature words more intuitively. The page can also predict the type of a scenic spot whose type is unknown by computing which category the scenic spot most probably belongs to. The personalized recommendation page can display results from content-based recommendation or from hybrid recommendation; in addition, it shows a relationship graph between users based on user similarity and recommends the ten most similar users to the current user.
Brief description of the drawings
Fig. 1 is a flow chart of the Hadoop-based personalized tourism recommendation system.
Fig. 2 is the MapReduce workflow diagram in Hadoop.
Fig. 3 is a flow chart of word frequency counting with MapReduce in Hadoop.
Detailed description of the invention
One, concept of the hybrid recommendation algorithm and a worked sample
1) Concept of the hybrid recommendation algorithm:
1. Prepare the raw data list, containing the user ID, user level, scenic spot ID, scenic spot level, and the user's rating of the scenic spot;
2. Build the co-occurrence matrix of the scenic spot data by counting how many times each scenic spot occurs on its own and how many times it occurs together with each other scenic spot;
3. Build the similarity matrix of the scenic spot data, deriving scenic spot similarity from the users the scenic spots have in common;
4. Build the matrix of users' weighted scores for scenic spots; each score is a weighted combination of the original rating, the user level, and the scenic spot level;
5. Multiply the similarity matrix by the weighted score matrix to compute the recommendation scores;
6. Sort the scores in descending order, exclude the scenic spots the user has already visited, and recommend by score.
2) Worked sample of the hybrid recommendation algorithm:
1. Prepare the raw data list, containing the user ID, user level, scenic spot ID, scenic spot level, and the user's rating of the scenic spot. The raw data sample is listed below:
UserID | UserLevel | SceneID | Score | SceneLevel |
User1 | 5 | Scene1 | 5 | 5A |
User1 | 5 | Scene2 | 3 | 4A |
User1 | 5 | Scene4 | 2.5 | 2A |
User2 | 4 | Scene1 | 4 | 5A |
User2 | 4 | Scene3 | 4 | 3A |
User2 | 4 | Scene4 | 3 | 2A |
User3 | 3 | Scene2 | 4.5 | 4A |
User3 | 3 | Scene3 | 4.5 | 3A |
User3 | 3 | Scene4 | 3.5 | 2A |
User3 | 3 | Scene5 | 4 | 1A |
2. Build the co-occurrence matrix of the scenic spot data. Taking each user as a unit, count over the scenic spots that user has rated, and compute for each scenic spot the number of times it occurs on its own and the number of times it occurs together with each other scenic spot. The sample co-occurrence matrix of the scenic spot data is as follows:
[co-occurrence matrix figure not reproduced in this text]
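The counting in this step can be illustrated with a short sketch over the raw data sample listed earlier in the text. The function name `cooccurrence` and the tuple layout are assumptions made for this illustration; the counts themselves follow directly from the sample rows.

```python
from collections import defaultdict
from itertools import combinations

# (user, scenic spot, rating) rows taken from the raw data sample in the text.
ratings = [
    ("User1", "Scene1", 5.0), ("User1", "Scene2", 3.0), ("User1", "Scene4", 2.5),
    ("User2", "Scene1", 4.0), ("User2", "Scene3", 4.0), ("User2", "Scene4", 3.0),
    ("User3", "Scene2", 4.5), ("User3", "Scene3", 4.5), ("User3", "Scene4", 3.5),
    ("User3", "Scene5", 4.0),
]


def cooccurrence(rows):
    """Count, per user, single occurrences (diagonal) and pairwise co-occurrences."""
    visited = defaultdict(set)
    for user, scene, _score in rows:
        visited[user].add(scene)          # a set ignores repeat visits

    counts = defaultdict(int)
    for scenes in visited.values():
        for scene in scenes:
            counts[(scene, scene)] += 1   # occurrence of the scenic spot itself
        for a, b in combinations(sorted(scenes), 2):
            counts[(a, b)] += 1           # symmetric pair counts
            counts[(b, a)] += 1
    return counts


co = cooccurrence(ratings)
```

For the sample data this gives, for example, a diagonal entry of 3 for Scene4 (all three users rated it) and a pair count of 2 for Scene1 and Scene4 (User1 and User2).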
3. Build the similarity matrix of the scenic spot data, deriving scenic spot similarity from the users the scenic spots have in common.
The similarity of item i and item j is computed as:
$$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)| \, |N(j)|}}$$
where |N(i)| is the number of users who have visited scenic spot i, |N(j)| is the number of users who have visited scenic spot j, and the numerator is the number of users who have visited both scenic spot i and scenic spot j. The denominator prevents popular scenic spots from having a similarity close to 1 with every other scenic spot, that is, it dampens the effect of popular scenic spots. The sample similarity matrix of the scenic spot data is as follows:
SceneID | Scene1 | Scene2 | Scene3 | Scene4 | Scene5 |
Scene1 | 1 | 0.5 | 0.5 | 0.816 | 0 |
Scene2 | 0.5 | 1 | 0.5 | 0.816 | 0.707 |
Scene3 | 0.5 | 0.5 | 1 | 0.816 | 0.707 |
Scene4 | 0.816 | 0.816 | 0.816 | 1 | 0.447 |
Scene5 | 0 | 0.707 | 0.707 | 0.447 | 1 |
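As a check on the similarity step, the sketch below computes the measure described in the text (the number of common visitors divided by the square root of the product of the two visitor counts) from the raw data sample. The `visitors` mapping and the function name `scene_similarity` are introduced here for illustration only.

```python
import math

# Users who rated each scenic spot, read off the raw data sample in the text.
visitors = {
    "Scene1": {"User1", "User2"},
    "Scene2": {"User1", "User3"},
    "Scene3": {"User2", "User3"},
    "Scene4": {"User1", "User2", "User3"},
    "Scene5": {"User3"},
}


def scene_similarity(i: str, j: str) -> float:
    """Common visitors divided by sqrt(|N(i)| * |N(j)|)."""
    common = len(visitors[i] & visitors[j])
    return common / math.sqrt(len(visitors[i]) * len(visitors[j]))


sim_1_2 = scene_similarity("Scene1", "Scene2")
sim_1_4 = scene_similarity("Scene1", "Scene4")
sim_2_5 = scene_similarity("Scene2", "Scene5")
sim_4_5 = scene_similarity("Scene4", "Scene5")
```

Rounded to three decimals, the first three values are 0.5, 0.816 and 0.707, matching the table. Under this formula the Scene4/Scene5 entry evaluates to about 0.577, whereas the table lists 0.447; the sketch does not attempt to explain that difference.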
4. Build the matrix of users' weighted scores for scenic spots. The score has three parts: the user's direct rating of the scenic spot, a weighted score for the scenic spot's level, and a weighted score for the user's level.
score = w1 × original_score + w2 × (scene_level + 1) + w3 × user_level (1)
w1 + w2 + w3 = 1
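Formula (1) translates directly into a small function. The sketch below is only an illustration under stated assumptions: the scenic spot level is taken as the number of A's (5 for 5A), the user level is used as given, and the default weights are the values the text reports a little further on (0.72, 0.17, 0.11). The text does not spell out how levels are encoded, so the sample score matrix may rely on a different encoding and is not expected to be reproduced by this sketch.

```python
import math


def weighted_score(
    original_score: float,
    scene_level: int,      # assumed encoding: number of A's, e.g. 5 for "5A"
    user_level: int,       # assumed encoding: the raw user level
    w1: float = 0.72,
    w2: float = 0.17,
    w3: float = 0.11,
) -> float:
    """Formula (1): w1*original_score + w2*(scene_level + 1) + w3*user_level."""
    if not math.isclose(w1 + w2 + w3, 1.0):
        raise ValueError("weights must sum to 1")
    return w1 * original_score + w2 * (scene_level + 1) + w3 * user_level


# User1's rating of Scene1 from the raw data sample: rating 5, level 5A, user level 5.
example = weighted_score(5, 5, 5)
```

With these assumed encodings the example evaluates to about 5.17 (3.6 + 1.02 + 0.55).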
The weights are derived from counts of the values of each index. S6 to S1 denote the number of ratings at 5 to 0 points respectively; SL6 to SL1 denote the number of scenic spots at levels 5A to 0A respectively; UL6 to UL1 denote the number of users in each user level band, where UL6 is the number of users above level 15, UL5 the number at levels 13-15, UL4 the number at levels 10-12, UL3 the number at levels 7-9, UL2 the number at levels 4-6, and UL1 the number at levels 1-3.
Index | Weight | 6 | 5 | 4 | 3 | 2 | 1 |
Original score | w1 | S6 | S5 | S4 | S3 | S2 | S1 |
Scenic spot level | w2 | SL6 | SL5 | SL4 | SL3 | SL2 | SL1 |
User level | w3 | UL6 | UL5 | UL4 | UL3 | UL2 | UL1 |
The proportion of each score value is computed as follows: [formula not reproduced in this text]
The weight of each index is computed as follows: [formula not reproduced in this text]
Computed on the back end over 50,000 users and 10,000 scenic spots, the weights are w1 = 0.72, w2 = 0.17, and w3 = 1 - w1 - w2 = 0.11.
Substituting these results into scoring formula (1) gives each user's weighted score for each scenic spot. Converted to matrix form, the sample user-scenic spot score matrix is as follows:
SceneID | User1 | User2 | User3 |
Scene1 | 5 | 4.3 | 1.5 |
Scene2 | 3.3 | 1.2 | 4.35 |
Scene3 | 0.9 | 0.9 | 4.05 |
Scene4 | 2.35 | 3.4 | 3.05 |
Scene5 | 0.3 | 2.4 | 3.1 |
5. Multiply the similarity matrix by the weighted score matrix to compute the recommendation scores. Taking User1 as an example, the result is as follows:
[result figure not reproduced in this text]
6. Sort the scores in descending order, exclude the scenic spots User1 has already visited, and recommend the remaining scenic spots by score.
Recommendation result: Scene3. Reason for recommendation: Scene3 is highly similar to Scene4, which the user has visited, and the user's interest in Scene3 is relatively high.
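Steps 5 and 6 can be traced numerically with the two sample tables given in the text: the scenic spot similarity matrix and User1's column of the weighted score matrix. The sketch below is an illustration only; the function name `recommend` and the list-of-lists layout are choices made here, and the numbers are copied from the tables as printed.

```python
scenes = ["Scene1", "Scene2", "Scene3", "Scene4", "Scene5"]

# Scenic spot similarity matrix, as listed in the sample table.
similarity = [
    [1, 0.5, 0.5, 0.816, 0],
    [0.5, 1, 0.5, 0.816, 0.707],
    [0.5, 0.5, 1, 0.816, 0.707],
    [0.816, 0.816, 0.816, 1, 0.447],
    [0, 0.707, 0.707, 0.447, 1],
]

# User1's column of the weighted score matrix, as listed in the sample table.
user1_scores = [5, 3.3, 0.9, 2.35, 0.3]
user1_visited = {"Scene1", "Scene2", "Scene4"}   # from the raw data sample


def recommend(similarity, scores, visited, scenes):
    """Multiply each similarity row by the score column, drop visited spots, sort."""
    totals = {
        scene: sum(sim * score for sim, score in zip(row, scores))
        for scene, row in zip(scenes, similarity)
    }
    candidates = [scene for scene in scenes if scene not in visited]
    candidates.sort(key=lambda scene: totals[scene], reverse=True)
    return [(scene, totals[scene]) for scene in candidates]


ranking = recommend(similarity, user1_scores, user1_visited, scenes)
```

With these inputs Scene3 scores about 7.18 and Scene5 about 4.32, so Scene3 is ranked first, which agrees with the recommendation result stated in the text.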
Two, concept of the similarity between two users and a worked sample:
1) Concept of computing the similarity between two users
1. Prepare the raw data list;
2. Compute the similarity between each pair of users as a weighted combination of four partial similarities, forming the similarity matrix;
3. For a given user, sort the similarity scores with the other users in descending order and recommend the top-scoring users as similar users.
2) Worked sample of the similarity computation between two users
1. Prepare the raw data list, containing the user ID, user level, scenic spot ID, scenic spot level, and scenic spot type.
The raw data sample is as follows:
[raw data table not reproduced in this text]
2. Compute the similarity between two users. It is a weighted combination of four parts: the similarity of the levels of the scenic spots the two users have visited, the similarity of the types of the scenic spots they have visited, whether they have visited the same scenic spots, and the levels of the two users. The four partial scores are added with certain weights to give the final similarity between the two users, which can be expressed as:
similarity = w1 × sim_category + w2 × sim_sceneLevel + w3 × sim_userLevel + w4 × sim_scene
where w1 + w2 + w3 + w4 = 1 and the weights are computed in a way similar to the score matrix above. sim_category is the similarity of scenic spot types, sim_sceneLevel is the similarity of scenic spot levels, sim_userLevel is the similarity of user levels, and sim_scene is the similarity measured by whether the two users have scenic spots in common.
Here, when computing sim_category, xi and yi denote the probabilities that user x and user y visited scenic spots of each category; when computing sim_sceneLevel, xi and yi denote the probabilities that user x and user y visited scenic spots of each level; when computing sim_userLevel, xi and yi denote the user levels of user x and user y respectively.
sim_scene measures similarity by whether the two users have visited the same scenic spots. It is the ratio of the number of scenic spots both users have visited to the larger of the two users' visited scenic spot counts. This keeps the similarity value between 0 and 1 and measures the similarity between the two users well. Note that because some users visit the same scenic spot more than once, the visited scenic spots must be de-duplicated before the similarity is computed.
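The sim_scene component described above is simple enough to sketch directly. The function below is an illustration; the name `sim_scene` comes from the text, while the list-based input and the handling of users with no visits (returning 0) are assumptions made here.

```python
from typing import Iterable


def sim_scene(visits_x: Iterable[str], visits_y: Iterable[str]) -> float:
    """Shared scenic spots divided by the larger of the two visited counts."""
    a, b = set(visits_x), set(visits_y)   # sets de-duplicate repeat visits
    if not a or not b:
        return 0.0                        # assumption: no visits means no overlap
    return len(a & b) / max(len(a), len(b))


# Visit lists from the hybrid recommendation sample; User1's repeated Scene1
# visit is made up here to show the de-duplication.
user1 = ["Scene1", "Scene2", "Scene4", "Scene1"]
user2 = ["Scene1", "Scene3", "Scene4"]
user3 = ["Scene2", "Scene3", "Scene4", "Scene5"]

s12 = sim_scene(user1, user2)   # 2 shared out of max(3, 3)
s13 = sim_scene(user1, user3)   # 2 shared out of max(3, 4)
```

Dividing by the larger count rather than the union keeps the value in the 0 to 1 range while penalizing a user whose visit list is much longer than the other's.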
The similarity matrix between the users in the above sample is computed as follows:
UserID | User1 | User2 | User3 |
User1 | 1 | 0.641 | 0.598 |
User2 | 0.641 | 1 | 0.613 |
User3 | 0.598 | 0.613 | 1 |
3. Sort the similarity scores in descending order and recommend the top-scoring users to this user as similar users.
Similar users recommended to User1, in order: User2, User3
Similar users recommended to User2, in order: User1, User3
Similar users recommended to User3, in order: User2, User1
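The ranking step can be reproduced from the sample user similarity matrix in the text. The nested dictionary and the function name `similar_users` are illustrative choices; the cut-off of ten follows the description of the recommendation calculation module.

```python
# Off-diagonal entries of the sample user similarity matrix in the text.
user_similarity = {
    "User1": {"User2": 0.641, "User3": 0.598},
    "User2": {"User1": 0.641, "User3": 0.613},
    "User3": {"User1": 0.598, "User2": 0.613},
}


def similar_users(user: str, k: int = 10) -> list:
    """Return up to k other users, most similar first."""
    others = user_similarity[user]
    return sorted(others, key=others.get, reverse=True)[:k]


order_user1 = similar_users("User1")
order_user2 = similar_users("User2")
order_user3 = similar_users("User3")
```

The three orderings agree with the recommendation lists given in the text.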
Three, the MapReduce framework workflow and an application sample:
1) MapReduce framework workflow
Step 1: The client submits a MapReduce job to the master node;
Step 2: Hadoop divides the input data of the MapReduce job into equal-sized small blocks, called input splits, builds one map task for each split, and distributes the map tasks and reduce tasks to different worker nodes;
Step 3: The worker node holding each split executes its map task in parallel; the output is sorted and, as the input of the reduce task, copied to the worker node that executes the reduce task;
Step 4: The worker node of the reduce task consolidates the results of the map tasks and writes the final result to the output file.
The MapReduce framework workflow is shown in Fig. 2.
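The four steps above can be mimicked in a single process to make the data flow concrete. The sketch below is a toy simulation in plain Python, not Hadoop code: `split_input` and `run_mapreduce` are hypothetical helpers written for this illustration, and the "worker nodes" are just loop iterations. It counts visitors per scenic spot from the raw data sample.

```python
from collections import defaultdict
from itertools import chain


def split_input(records, n_splits):
    """Step 2: cut the input into roughly equal-sized input splits."""
    size = max(1, -(-len(records) // n_splits))   # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_mapreduce(records, mapper, reducer, n_splits=3):
    splits = split_input(records, n_splits)
    # Step 3: one map task per split, each emitting (key, value) pairs.
    mapped = [[pair for record in split for pair in mapper(record)]
              for split in splits]
    # Sort the map output and group it by key before the reduce phase.
    grouped = defaultdict(list)
    for key, value in sorted(chain.from_iterable(mapped)):
        grouped[key].append(value)
    # Step 4: the reduce task consolidates the values for each key.
    return {key: reducer(key, values) for key, values in grouped.items()}


# (user, scenic spot) pairs from the raw data sample in the text.
visits = [
    ("User1", "Scene1"), ("User1", "Scene2"), ("User1", "Scene4"),
    ("User2", "Scene1"), ("User2", "Scene3"), ("User2", "Scene4"),
    ("User3", "Scene2"), ("User3", "Scene3"), ("User3", "Scene4"),
    ("User3", "Scene5"),
]


def visit_mapper(record):
    _user, scene = record
    yield scene, 1


def visit_reducer(_key, values):
    return sum(values)


visitor_counts = run_mapreduce(visits, visit_mapper, visit_reducer)
```

In a real Hadoop job the splits would live on the distributed file system and the map and reduce tasks would run on separate worker nodes; only the shape of the computation carries over from this sketch.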
2) Sample of counting word frequency with the MapReduce framework in this system
Step 1: Hadoop splits the raw data of the MapReduce job into equal-sized splits and distributes the splits to the worker nodes of different map tasks;
Step 2: The worker node holding each split executes its map task in parallel; here the map task includes word segmentation, stop word removal, and counting;
Step 3: The counts are sorted in <key:value> form and passed as input to the worker node that executes the reduce task;
Step 4: The reduce worker node merges and totals the counts, and finally saves the output.
The flow of this word frequency sample is shown in Fig. 3.
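A single-process sketch of the word frequency flow is given below. It is a simplification under stated assumptions: whitespace splitting stands in for the word segmentation step (the system described segments Chinese text), the stop word set and the three input splits are made-up examples, and `map_task` and `reduce_task` are hypothetical names.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "is"}   # made-up stop word list


def map_task(split: str) -> Counter:
    """Segment one split, drop stop words, and count locally."""
    words = [w for w in split.lower().split() if w not in STOP_WORDS]
    return Counter(words)


def reduce_task(partial_counts) -> Counter:
    """Merge the per-split counts into the final totals."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


splits = [
    "the lake and the temple",
    "a temple of the mountain",
    "lake lake mountain",
]

totals = reduce_task(map_task(s) for s in splits)
# Sorted <key:value> view of the result, as in step 3 of the text.
sorted_pairs = sorted(totals.items())
```

Counting inside each map task before merging mirrors the description that the map task already performs the counting, so the reduce side only has to add up partial counts.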
Four, implementation of the UI
After the user fills in "User name" and "Password" correctly and clicks the "Log in" button, the back end checks the information against the records in the user database; if it is correct, the user is logged in. To register a new user, click the "Register" button; when the registration dialog appears, enter the required information and click the "Confirm" button to complete the registration. The new user's information is stored in the user database, and the new user can then log in from the login page.
After login, two functions are common to all pages: scenic spot similarity comparison and scenic spot search. To look up how similar two scenic spots are, enter the names of the two scenic spots in the similarity comparison field. The back end calls the scenic spot similarity formula: from the feature values of the scenic spots and the users who have visited both, it computes a weighted Euclidean distance and takes the reciprocal of the result as the similarity of the two scenic spots, so the larger the distance, the smaller the similarity. Clicking the equals sign shows the similarity value of the two scenic spots and the comparison result. To search, enter a scenic spot name in the search field and click search; the back end looks up the scenic spot in the scenic spot database and returns part of its information to the page.
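The reciprocal-of-distance idea can be sketched as follows. This is a rough illustration only: the text does not list the features or the weights, so the numeric feature vectors, the weight vector, and the function name `inverse_distance_similarity` are all assumptions, as is returning infinity when the two vectors coincide.

```python
import math
from typing import Sequence


def inverse_distance_similarity(
    features_a: Sequence[float],
    features_b: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Reciprocal of the weighted Euclidean distance between two feature vectors."""
    distance = math.sqrt(
        sum(w * (a - b) ** 2 for w, a, b in zip(weights, features_a, features_b))
    )
    if distance == 0:
        return math.inf   # assumption: identical vectors count as maximally similar
    return 1.0 / distance


# Hypothetical two-feature vectors, e.g. (scenic spot level, shared-visitor count).
near = inverse_distance_similarity([5, 3], [2, 3], weights=[1, 1])       # distance 3
far = inverse_distance_similarity([4, 0], [0, 0], weights=[0.25, 1])     # distance 2
```

Because the reciprocal is unbounded as the distance approaches zero, a production version would probably normalize the result; the text does not say how that case is handled.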
1. The popular recommendation page has four sections: popular scenic spot recommendations, nationwide tourist numbers by province, the best travel season by province, and popular scenic spot descriptions.
Popular scenic spot recommendations: this section is derived from big data statistics. The number of visitors received by each scenic spot in the scenic spot database is computed and weighted by visitors' ratings of the scenic spot, and the ten top-ranked scenic spots are displayed as recommendations.
Nationwide tourist numbers by province: this is the map section, which shows the number of visitors each province received over the whole of the previous year. The map shows each province's number by color depth: the darker the color, the more people traveled to the province; the lighter the color, the fewer. Hovering the mouse over a province shows the exact number. The purpose of this section is to give travelers an intuitive picture of which provinces attracted more or fewer visitors over the previous year.
Best travel season by province: this section shows the best month to travel in each province. The horizontal axis represents the provinces and the vertical axis the twelve months; hovering the mouse over a province shows the province's name and its best month. The purpose of this section is to help the user choose the most suitable province to visit given the current season.
Popular scenic spot descriptions: based on big data statistics, the ten scenic spots with the most visitors among all scenic spots are shown as popular recommendations. The ten names are listed under the popular scenic spots heading on the left, and the scenic spot description panel on the right of the page shows the details, one scenic spot per page: name, type, level, address, and a brief introduction. The page-turning function shows the introduction of the next popular scenic spot; clicking a popular scenic spot's name jumps directly to its description page. The search function at the upper right runs a distributed query in the database for the entered scenic spot and displays the corresponding information in the scenic spot description.
2. The category recommendation page has four sections: scenic spot recommendations by category, nationwide tourist numbers by province, category word clouds, and popular scenic spot descriptions by category.
Scenic spot recommendations by category: this section lists the names of the 27 scenic spot types, which were themselves obtained from statistics over the scenic spot data. Hovering the mouse over a category name expands the ten popular scenic spots under that category; clicking a scenic spot name shows its details in the description panel on the right.
Nationwide tourist numbers by province: this section shows the number of visitors each province received over the whole of the previous year. The map shows each province's number by color depth: the darker the color, the more people traveled to the province; the lighter the color, the fewer. Hovering the mouse over a province shows the exact number. The purpose of this section is to give travelers an intuitive picture of which provinces attracted more or fewer visitors over the previous year.
Word cloud for each category: this column shows the feature words of each category, so that visitors can see intuitively what characterizes each category; hovering the mouse over a word displays the counted frequency of that word. The words are obtained by text processing of the detailed descriptions of the sight spots in each category. First, the descriptions of all sight spots in a category are combined into one long text; then the MR (MapReduce) computing framework is used to segment this text into words. A tourist-attraction dictionary is added to the segmentation algorithm so that a proper sight spot name is not split into several words. After segmentation, the result is cleaned by removing stop words, symbols, and English text. Removing symbols means removing punctuation marks and blanks within sentences; removing English means removing all English text that appears in the article; removing stop words means removing words such as auxiliary words and verbs, which do not need to be included in the final statistics. The stop-word step first requires building a stop-word dictionary that contains all unneeded words; the stop-word dictionary is then traversed against the segmentation result, and every stop word that occurs in the result is deleted. Besides displaying word clouds, this section can also predict the type of a sight spot: a classification algorithm from data mining is trained on the existing big data, after which the type of an unknown sight spot can be predicted.
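The cleaning steps described above (removing symbols, English text, and stop words, then counting word frequencies for the word cloud) can be sketched as follows. This is only an illustration: the stop-word set, the function names, and the use of regular expressions are assumptions that do not come from the patent, and the input is assumed to be already segmented into tokens.

```python
import re
from collections import Counter

# Hypothetical stop-word dictionary; the patent does not list its contents.
STOP_WORDS = {"的", "了", "是", "在", "和"}

def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Drop blanks, symbols, English text, and stop words from segmented tokens."""
    cleaned = []
    for tok in tokens:
        tok = tok.strip()
        if not tok:
            continue  # blank left over from segmentation
        if re.fullmatch(r"[\W_]+", tok):
            continue  # token made only of punctuation or other symbols
        if re.search(r"[A-Za-z]", tok):
            continue  # token containing English letters
        if tok in stop_words:
            continue  # token listed in the stop-word dictionary
        cleaned.append(tok)
    return cleaned

def word_frequencies(tokens):
    """Count how often each remaining word occurs (input for a word cloud)."""
    return Counter(clean_tokens(tokens))
```

For example, `word_frequencies(["故宫", "，", "的", "建筑", "Palace", "故宫"])` keeps only `故宫` and `建筑`. A set is used for the stop-word dictionary so that each lookup takes constant time, rather than looping over the dictionary for every token.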
Category hot spots: based on big data statistics, the ten sight spots with the most visitors in each category are shown as that category's hot spot recommendations. The category names are displayed under the sight spot type heading on the left; hovering the mouse over a category name shows the names of the top ten hot spots of that category below it. Clicking one of the sight spot names shows the details of that sight spot in the sight spot description column on the right of the page, one sight spot per page, including the sight spot name, type, rank, address, and a brief introduction. The page-turning function shows the introduction of the next hot spot in the same category; alternatively, clicking a hot spot name jumps directly to the corresponding introduction page. The search function at the upper right runs a distributed query in the database for the entered sight spot and shows the corresponding information in the sight spot description.
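The top-ten selection per category described above can be sketched as below. The record layout `(name, category, visitors)` and the function name are assumptions made for illustration; the patent computes these statistics on its big data platform rather than in a single process.

```python
from collections import defaultdict

def top_spots_by_category(spots, top_n=10):
    """Return, for each category, the names of the top_n most visited spots.

    `spots` is an iterable of (name, category, visitors) tuples.
    Ties are broken alphabetically by name so the result is deterministic.
    """
    by_category = defaultdict(list)
    for name, category, visitors in spots:
        by_category[category].append((visitors, name))
    return {
        category: [name for _, name in
                   sorted(rows, key=lambda r: (-r[0], r[1]))[:top_n]]
        for category, rows in by_category.items()
    }
```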
3. The personalized recommendation interface mainly includes four columns: personalized recommended sight spots, tourist arrival statistics for each province of the country, the user relationship network, and recommended sight spot descriptions.
Personalized recommended sight spots: after the user logs in, this section lists ten sight spot names, which are computed from the individual's travel information by item-based collaborative filtering combined with a content-based recommendation algorithm. First, the sight spots the individual has visited and the individual's ratings of them are collected to form the individual behavior matrix. Then the big data platform computes the co-occurrence matrix from the sight spot information of all users and merges the results. Finally, the co-occurrence matrix is multiplied by the individual behavior matrix to obtain the user's weighted score for every sight spot, and the ten highest-scoring sight spots are given as the personalized recommendation result. Each value in the individual behavior matrix is the user's rating of a sight spot, weighted by sight spot similarity and sight spot attribute values. Sight spot similarity can be understood as the proportion of users who like sight spot i that also like sight spot j. To avoid always recommending hot spots and to surface long-tail sight spots, the similarity formula penalizes the weight of popular sight spots, which reduces the chance that a hot spot appears similar to many other sight spots.
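The calculation above can be illustrated with a small single-machine sketch. It builds co-occurrence counts between sight spots, damps popular spots by dividing by the square root of the product of the two spots' popularity (a common penalty, assumed here because the patent does not give its exact formula), and multiplies the result by one user's ratings. The content-based weighting of the ratings is left out, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def item_similarity(user_ratings):
    """Co-occurrence counts normalised by sqrt(|N(i)| * |N(j)|).

    `user_ratings` maps user -> {spot: rating}. N(i) is the set of users
    who rated spot i; the normalisation is an assumed popularity penalty.
    """
    popularity = defaultdict(int)
    cooccur = defaultdict(lambda: defaultdict(int))
    for ratings in user_ratings.values():
        spots = sorted(ratings)
        for spot in spots:
            popularity[spot] += 1
        for i, j in combinations(spots, 2):
            cooccur[i][j] += 1
            cooccur[j][i] += 1
    return {
        i: {j: count / sqrt(popularity[i] * popularity[j])
            for j, count in row.items()}
        for i, row in cooccur.items()
    }

def recommend(user, user_ratings, top_n=10):
    """Multiply the similarity matrix by the user's ratings; return top_n unseen spots."""
    sim = item_similarity(user_ratings)
    seen = user_ratings[user]
    scores = defaultdict(float)
    for i, rating in seen.items():
        for j, weight in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += weight * rating
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_n]
```

On a Hadoop cluster the co-occurrence counting and the matrix multiplication would each be expressed as MapReduce jobs; the arithmetic stays the same.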
Tourist arrival statistics for each province of the country: this column counts the number of visitors each province received over the whole of last year. On the map, color shading shows the number of visitors each province received: the darker the color, the more people traveled to that province; the lighter the color, the fewer. The purpose of this column is to give travelers an intuitive impression of which provinces drew more or fewer tourists over the whole of last year.
User relationship network: this column represents how closely all visitors are related, and its purpose is to help users make more like-minded friends. The chart divides closeness according to the similarity between visitors. Similarity is calculated from the interests users have in common, that is, from statistics on whether the sight spots the users have visited are the same or similar; it can be understood as users A and B, who have both visited sight spot i, also having both visited sight spot j. The user database is traversed and the Euclidean distance formula is used to calculate the distance between two users: the larger the distance, the lower the similarity; the smaller the distance, the higher the similarity. Hovering the mouse over a user shows the similarity to that user; clicking the user shows which sight spots that user has visited; and clicking the button for recommending sight spots by similar users recommends the top ten sight spots visited by the users with the highest similarity.
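A minimal sketch of the Euclidean-distance similarity described above follows. The patent only states that a larger distance means a lower similarity; the mapping `1 / (1 + distance)` and the choice to compare only sight spots that both users rated are assumptions made for this example, as are the function names.

```python
from math import sqrt

def euclidean_distance(a, b):
    """Distance over the spots both users rated; None if they share none."""
    common = set(a) & set(b)
    if not common:
        return None
    return sqrt(sum((a[s] - b[s]) ** 2 for s in common))

def similarity(a, b):
    """Map distance to (0, 1]: smaller distance gives higher similarity."""
    distance = euclidean_distance(a, b)
    return 0.0 if distance is None else 1.0 / (1.0 + distance)

def most_similar(user, user_ratings, top_n=10):
    """Return up to top_n (other_user, similarity) pairs, most similar first."""
    scored = [(similarity(user_ratings[user], ratings), name)
              for name, ratings in user_ratings.items() if name != user]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(name, score) for score, name in scored[:top_n]]
```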
Personalized recommended sight spot descriptions: the ten highest-scoring sight spots in the result of the recommendation algorithm are recommended as personalized sight spots. The ten names are shown under the personalized recommended sight spots heading on the left, and the sight spot description column on the right of the page shows the details of each sight spot, one sight spot per page, including the sight spot name, type, rank, address, and a brief introduction. The page-turning function shows the introduction of the next sight spot; alternatively, clicking a sight spot name jumps directly to the corresponding introduction page. The search function at the upper right runs a distributed query in the database for the entered sight spot and shows the corresponding information in the sight spot description.
Claims (5)
1. A personalized tourism recommendation system based on Hadoop, characterized in that: the system uses Eclipse as the development tool, Hadoop as the big data processing platform, and Java as the programming language; the local Windows system and the CentOS server system are connected across platforms through JSch, so that the corresponding operation requests can be sent to the server from the local browser; through the interactive information of the page, the back end uses the MapReduce computing framework in Hadoop to search and compute step by step in the distributed file system, and the integrated result is returned to the front-end page;
The system has five modules that complement one another to realize the functions of the whole system: a web crawler module, a data module, a big data processing module, a recommendation computing module, and a UI interface module. Their connection relationships are as follows: the web crawler module is unidirectionally connected with the data module and also unidirectionally connected with the UI interface module; the data module is unidirectionally connected with the big data processing module and bidirectionally connected with the UI interface module; the big data processing module is unidirectionally connected with the recommendation computing module and bidirectionally connected with the UI interface module; the recommendation computing module is bidirectionally connected with the UI interface module. The specific connection process of each module is as follows:
1. The web crawler module mainly crawls sight spot information and user information. Sight spot information is crawled in order of province and city: the module first traverses the province and city data in the data module; the back end modifies the city name in the URL of the tourism website, obtaining the Cookie of the website at the same time, and retrieves in turn the list of sight spot names under each city of each province; then, following this list, it extracts the required fields of each sight spot in turn and records and stores them in the corresponding sight spot information table in the database. User information is obtained from the comment page of each sight spot: the comments on the sight spot are collected, the details of each commenter (that is, the user) are obtained from the comment information, and the user information and comment information are recorded and stored in the corresponding user information table and evaluation table in the database, respectively. The crawling flow is as follows:
List of countries → province list → city list → attraction list → sight spot field information → sight spot comment → commentator's information
The web crawler module triggers the crawling program in two ways. The first is a scheduled trigger: at a fixed time every day, sight spot data is read into the database and the corresponding programs for crawling sight spot and user information are triggered, with the results recorded and stored in the corresponding tables in the database. The second is triggered by the search function of the UI interface module: when a queried sight spot name has no matching result in the database, the crawler is triggered to query the tourism website and crawl the relevant information; if the sight spot is found, its fields are crawled, recorded, and stored in the corresponding sight spot information table in the database, and the result is also fed back to the corresponding position on the UI page.
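The crawling order and the search-triggered fallback can be sketched as follows. No real website is accessed: `site` is a hypothetical nested dictionary standing in for the pages of a tourism website, the country level is omitted, and `crawl_one` is a hypothetical single-spot crawler passed in by the caller. The field names are likewise invented for illustration.

```python
def crawl_hierarchy(site):
    """Walk province -> city -> sight spot -> comments -> commenters.

    `site` maps province -> city -> spot name -> page, where a page is
    {"fields": {...}, "comments": [{"user", "profile", "score"}, ...]}.
    Returns the three tables the text describes: spots, users, reviews.
    """
    spots, users, reviews = {}, {}, []
    for province, cities in site.items():
        for city, attractions in cities.items():
            for name, page in attractions.items():
                spots[name] = {"province": province, "city": city, **page["fields"]}
                for comment in page["comments"]:
                    users[comment["user"]] = comment["profile"]
                    reviews.append((comment["user"], name, comment["score"]))
    return spots, users, reviews

def search_spot(name, database, crawl_one):
    """Look a spot up in the database; crawl it on a miss (second trigger)."""
    if name in database:
        return database[name]
    record = crawl_one(name)      # hypothetical crawler for a single spot
    if record is not None:
        database[name] = record   # store it so later searches hit the database
    return record
```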
2. The data module mainly stores the basic data, in three broad classes: sight spot basic data, user basic data, and user-sight spot relation data. Sight spot basic data comprises the province list, the city list, the sight spot list, and the basic information table of each sight spot; user basic data comprises the basic information of users and the sight spots each user has visited; user-sight spot relation data comprises the evaluation data of users for sight spots. The data module on the one hand provides the underlying data for the big data processing module, and on the other hand can be queried for the required information through the search function of the UI interface.
3. The big data processing module runs on the MapReduce computing framework of the Hadoop platform. The framework is divided into two parts, Map and Reduce: the master node first splits the raw data and distributes the splits to the worker nodes that execute map tasks; the worker nodes execute their map tasks simultaneously; when a map task finishes, its output becomes the input of the reduce task and is sent to the worker node that executes the reduce task; reduce merges the results of the map-side processing and outputs the integrated final result;
The purpose of this module is to improve data processing speed. It can be triggered through the similarity search function, the category word cloud function, and the sight spot type prediction function of the UI interface, which call the data in the database for processing and calculation and return the results to the corresponding positions on the UI page. Parallel processing is applied mainly to the following four aspects. First, the web crawling process adopts the MR approach, which effectively increases crawling speed. Second, it is applied to computing user similarity and sight spot similarity, so that similarity calculations for users or sight spots complete in a short time; this uses the user basic data, sight spot basic data, and user-sight spot relation data in the data module. Third, it is applied to text mining: word segmentation statistics are computed over the sight spot information of each category, and the corresponding dynamic category word cloud is displayed. Fourth, category prediction is computed for sight spots of unknown type, mainly using the sight spot basic data as training data for classification.
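The split, map, and reduce flow described in this item can be illustrated with a single-process simulation of a word count, which is the kind of statistic the word cloud needs. This sketch does not use Hadoop: the function names are hypothetical, whitespace splitting stands in for real word segmentation, and on a cluster the map and reduce calls would run on separate worker nodes.

```python
from collections import defaultdict

def map_phase(split):
    """Map task: emit (word, 1) for every word in one input split."""
    for word in split.split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce task: merge all counts emitted for one word."""
    return key, sum(values)

def run_job(splits):
    """Run map over every split, shuffle, then reduce each key."""
    pairs = [pair for split in splits for pair in map_phase(split)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `run_job(["lake temple lake", "temple garden"])` treats each string as one split handled by its own map task and merges the per-word counts in the reduce step.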
4. The recommendation computing module performs targeted recommendation calculations on the result data of the big data processing module and feeds the recommendation results back to the corresponding fields of the UI page. The module provides three kinds of recommendation. The first recommends similar users to the logged-in user: a user-user similarity matrix is built from the user similarities computed by the big data processing module, and the ten users most similar to this user are returned as the recommendation result. The second is content-based recommendation: the sight spots the user has visited are analyzed and their features extracted as the preferences of the user, and similar sight spots are then looked up in the sight spot similarity matrix. The third is hybrid recommendation, that is, personalized recommendation, which merges the content-based recommendation method with the item-based collaborative filtering method to improve recommendation accuracy and give the user more suitable results. The hybrid method first builds the sight spot co-occurrence matrix from the user-sight spot relation data, then weights the sight spot content features into the sight spot ratings of the specific user to form the user behavior matrix; the co-occurrence matrix is multiplied by the behavior matrix to obtain the score of this user for every sight spot, and the ten sight spots with the highest scores are presented to the user as the final recommendation.
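The content-based recommendation in this item can be sketched as follows. The claim only says that features of visited sight spots are extracted as the user's preferences; representing each sight spot as a numeric feature vector, averaging the visited vectors into a profile, and ranking by cosine similarity are assumptions chosen for illustration, and all names are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def user_profile(visited, features):
    """Average the feature vectors of the spots the user has visited."""
    vectors = [features[spot] for spot in visited]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def content_recommend(visited, features, top_n=10):
    """Rank unvisited spots by similarity to the user's profile."""
    profile = user_profile(visited, features)
    scored = [(cosine(profile, vector), name)
              for name, vector in features.items() if name not in visited]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:top_n]]
```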
5. The UI interface module is related to all four modules above: its relation with the web crawler module is a one-way trigger, while its relations with the other three modules are bidirectional. On the one hand, page functions trigger the back-end programs in each associated module; on the other hand, each module feeds its calculation results back to the corresponding fields of the page for display;
The UI interface module has three main pages: the hot spot recommendation page, the category recommendation page, and the personalized recommendation page. Every page offers sight spot content retrieval: entering a sight spot name displays the basic information of that sight spot. Every page also offers a sight spot similarity query: entering two sight spot names returns the similarity value and similarity result for the two sight spots. The hot spot recommendation page is based on comprehensive statistics of the choices and evaluations of one million users in the database; it shows the Top 10 sight spots and their information, the number of tourists in each province of the country, and the best travel season for each province. The category recommendation page shows, for each category, the ten most popular sight spots obtained from the statistics of that category. All sight spots are divided into 27 categories; text data mining is performed for each category to extract its key feature words, and a word cloud is generated for each category according to the frequency of those words, so that the user can see the feature words of each category more intuitively. In addition, type prediction can be performed for a sight spot of unknown type, calculating the category to which the sight spot most probably belongs. On the personalized recommendation page, the user can choose to display recommendation results from the content-based method or from the hybrid method; in addition, the page shows a relationship graph between users according to user similarity and recommends to this user the ten users with the highest similarity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610258743.8A CN105930469A (en) | 2016-04-23 | 2016-04-23 | Hadoop-based individualized tourism recommendation system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105930469A (en) | 2016-09-07 |
Family ID: 56837157
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610258743.8A Pending CN105930469A (en) | 2016-04-23 | 2016-04-23 | Hadoop-based individualized tourism recommendation system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105930469A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | C06 | Publication |  |
|  | PB01 | Publication |  |
|  | C10 | Entry into substantive examination |  |
|  | SE01 | Entry into force of request for substantive examination |  |
|  | RJ01 | Rejection of invention patent application after publication |  |
|  | RJ01 | Rejection of invention patent application after publication | Application publication date: 20160907 |