CN113313307A - Tour route mining method based on signaling big data - Google Patents

Publication number
CN113313307A
Authority
CN
China
Prior art keywords
scenic spot
mobile phone
base station
scenic
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110597996.9A
Other languages
Chinese (zh)
Inventor
刘旭东
杨逸凡
叶强
刘小煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202110597996.9A
Publication of CN113313307A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/14 Travel agencies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/024 Guidance services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/20 Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/35 Services specially adapted for particular environments, situations or purposes for the management of goods or merchandise

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a tour route mining method based on signaling big data, comprising the following steps: acquiring original mobile phone signaling data for a preset area and removing the noise data generated by the ping-pong effect to obtain a global mobile phone signaling data table; crawling scenic spot POI data for the preset area, screening the base stations located within each scenic spot, constructing a base station-scenic spot table, and using it to remove non-scenic-spot base station connection data from the global mobile phone signaling data table to obtain a scenic spot mobile phone signaling data table; removing the interference data generated by passers-by, scenic spot workers and nearby residents to obtain a scenic spot visitor mobile phone signaling data table; constructing a time-ordered scenic spot sequence from the scenic spot visitor mobile phone signaling data table and aggregating the sequences to obtain a preliminary tourist route table; and merging adjacent scenic spots in the preliminary tourist routes using a hierarchical clustering method based on Euclidean distance to obtain an optimal tour route table. The mining process is highly stable, and the mined tour routes are more accurate.

Description

Tour route mining method based on signaling big data
Technical Field
The invention relates to the technical field of intelligent travel, in particular to a tour route mining method based on signaling big data.
Background
With social development and rising living standards, tourism has become an increasingly important part of everyday life. However, traditional tour route planning and scenic spot management suffer from several problems, such as data that cannot be shared, low data utilization, and the inability to accurately track visitor flow in scenic spots, which in turn lead to overloaded scenic spots, traffic congestion and low tourist satisfaction. Accurately identifying popular tour routes is therefore a problem that urgently needs to be solved. IBM's "Smarter Planet" initiative prompted a wave of research and applications in smart cities, smart tourism and related fields. Tour route mining is one of the main topics of smart tourism. It aims to analyze tourist big data, identify popular tour routes, provide a reference for tourists, scenic spots and tourism management departments, and provide precision marketing support for tourism-related industries.
With the large-scale adoption of the mobile internet and smart terminals, operators have accumulated large volumes of mobile phone signaling data. Mining tour routes from this signaling big data makes it possible to accurately capture tourists' travel behavior and scenic spot visitor flow. The main problem with traditional tour route mining is the lack of comprehensive, accurate and dynamic data on scenic spots, so that problems such as overloaded scenic spots, traffic congestion and low tourist satisfaction cannot be solved. Some tour route mining techniques based on signaling big data have appeared recently, but they do not consider factors such as scenic spot location, scenic spot size and adjacent scenic spots, so the mined tour routes are of limited reference value.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the invention aims to provide a tour route mining method based on signaling big data that improves the quality of the mined tour routes.
To achieve the above purpose, an embodiment of the present invention provides a tour route mining method based on signaling big data, including the following steps: step S1, acquiring original mobile phone signaling data for a preset area, calculating the duration of each connection between a mobile phone and a base station, and removing the short-time connection noise data generated by the ping-pong effect from the original mobile phone signaling data to obtain a global mobile phone signaling data table; step S2, crawling scenic spot POI data for the preset area to obtain a scenic spot list, screening the base stations within the range of each scenic spot according to the scenic spot list and the base station list in the global mobile phone signaling data table, constructing a base station-scenic spot table, and removing non-scenic-spot base station connection data from the global mobile phone signaling data table according to the base station-scenic spot table to obtain a scenic spot mobile phone signaling data table; step S3, removing the noise data generated by passers-by, scenic spot workers and nearby residents from the scenic spot mobile phone signaling data table to obtain a scenic spot visitor mobile phone signaling data table; step S4, constructing a time-ordered scenic spot sequence from the scenic spot visitor mobile phone signaling data table and performing aggregation to obtain a preliminary tourist route table; and step S5, merging adjacent scenic spots in the preliminary tourist routes using a hierarchical clustering method based on Euclidean distance to obtain an optimal tour route.
In the tour route mining method based on signaling big data according to the embodiment of the invention, to ensure the accuracy of the result, ping-pong effect data, non-scenic-spot base station connection data, and the base station connection data of passers-by, scenic spot workers and nearby residents are removed in turn. A preliminary tour route is then constructed from the cleaned data, and nearby scenic spots are merged by hierarchical clustering based on Euclidean distance, so that the finally mined tour routes are more accurate and provide a better experience for users.
In addition, the tour route mining method based on signaling big data according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the duration of a connection between the mobile phone and the base station is calculated as follows:
lasttime(x) = endtime(x) - starttime(x)
where lasttime(x) is the duration of the connection between the mobile phone and the base station, endtime(x) is the time at which the mobile phone disconnects from the base station, and starttime(x) is the time at which the mobile phone starts connecting to the base station.
Further, in an embodiment of the present invention, the global mobile phone signaling data table includes: date, user ID, base station ID, connection start time, connection end time and connection duration.
Further, in an embodiment of the present invention, the scenic spot list includes the name of each scenic spot, the longitude and latitude of its center, and its radius, and the base station list includes the base station IDs and the longitude and latitude of each base station.
Further, in an embodiment of the present invention, the scenic spot (visitor) mobile phone signaling data table includes: date, user ID, base station ID, connection start time, connection end time, connection duration and scenic spot name.
Further, in an embodiment of the present invention, in step S3, the scenic spot mobile phone signaling data table is aggregated with the user ID, scenic spot name and date as keys, and passers-by are removed by discarding users whose total daily connection time to the base stations is less than a preset time threshold; the table is then aggregated with the user ID, scenic spot name and week number as keys, and the signaling data of scenic spot workers and nearby residents, that is, users whose number of appearances within a preset period reaches a preset frequency, are removed to obtain the scenic spot visitor mobile phone signaling data table.
Further, in an embodiment of the present invention, step S4 specifically includes: aggregating the scenic spot visitor mobile phone signaling data table with the user ID as the key to generate a scenic spot list ordered by start time; concatenating the connection start time with the scenic spot name, sorting the concatenated strings, and then removing the time with a regular expression to obtain a visitor-scenic spot list, which includes the user ID and the time-ordered scenic spot sequence; and aggregating the visitor-scenic spot list with the scenic spot sequence as the key, counting the number of people who share the same scenic spot sequence, and sorting by that count to obtain the preliminary tourist route table, which includes the number of people on each route and the scenic spot sequences sorted by that number.
Further, in an embodiment of the present invention, step S5 specifically includes: obtaining the scenic spot sequences from the preliminary tourist route table; taking each scenic spot as an individual cluster and initializing a scenic spot distance matrix; traversing the scenic spot sequences to calculate the Euclidean distance between every two scenic spots; and, while the distance matrix is non-empty, selecting the minimum Euclidean distance and judging whether it is less than or equal to a preset threshold; if so, merging the two scenic spots at that distance and deleting the distance from the distance matrix. This process is iterated until the minimum Euclidean distance is greater than the preset threshold, at which point scenic spot clustering is complete and the optimal tour route is obtained.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a method for signaling big data based travel route mining according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating the implementation of a method for signaling big data based travel route mining in accordance with an embodiment of the present invention;
FIG. 3 is a flow chart of scenic spot clustering in accordance with an embodiment of the present invention;
FIG. 4 is a diagram showing base station information according to an embodiment of the present invention;
FIG. 5 is a diagram showing the residents near a scenic spot according to an embodiment of the present invention;
FIG. 6 is a diagram showing the number of people on each preliminarily obtained weekend tour route according to an embodiment of the present invention;
FIG. 7 is a diagram showing the scenic spot clustering result according to an embodiment of the present invention;
FIG. 8 is a diagram showing the optimal tour route table according to an embodiment of the present invention;
FIG. 9 is a diagram showing the TOP10 tour routes according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The proposed travel route mining method based on signaling big data according to the embodiment of the invention is described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of a method for signaling big data based travel route mining in accordance with one embodiment of the present invention.
FIG. 2 is a flow chart illustrating the implementation of a method for travel route mining based on big signaling data according to an embodiment of the present invention.
As shown in fig. 1 and 2, the travel route mining method based on signaling big data comprises the following steps:
In step S1, original mobile phone signaling data for a preset area is obtained, the duration of each connection between a mobile phone and a base station is calculated, and the short-time connection noise data generated by the ping-pong effect is removed from the original mobile phone signaling data to obtain a global mobile phone signaling data table.
It can be understood that, in a mobile communication system, a mobile phone located where the signals of two base stations overlap may switch back and forth between them. This so-called "ping-pong effect" causes the two base stations to generate a large number of short-time connection records, which introduce a large error into the result. To reduce the influence of these short-time connections on subsequent calculations and remove the noise data, the duration of each connection must be calculated, yielding a global mobile phone signaling data table that includes the user ID, the base station ID, the connection start time, the connection end time and the connection duration. The connection duration is calculated as follows:
lasttime(x) = endtime(x) - starttime(x)
where lasttime(x) is the duration of the connection between the mobile phone and the base station, endtime(x) is the time at which the mobile phone disconnects from the base station, and starttime(x) is the time at which the mobile phone starts connecting to the base station.
Note that records with a duration of less than 1 minute, i.e. a lasttime value below 60000 (timestamps are in milliseconds), can be regarded as noise data generated by the ping-pong effect and are removed.
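The duration formula and the 1-minute filter can be illustrated with a minimal Python sketch. The `Connection` record, its field names and the sample values are hypothetical and chosen only for illustration; timestamps are assumed to be in milliseconds, consistent with the 60000 threshold above.

```python
from dataclasses import dataclass

# 1 minute in milliseconds, the ping-pong threshold stated in the text.
PING_PONG_THRESHOLD_MS = 60_000


@dataclass(frozen=True)
class Connection:
    """One signaling record (hypothetical layout for illustration)."""
    user_id: str
    station_id: str
    starttime: int  # connection start, milliseconds
    endtime: int    # connection end, milliseconds


def lasttime(record: Connection) -> int:
    """lasttime(x) = endtime(x) - starttime(x)."""
    return record.endtime - record.starttime


def remove_ping_pong(records, threshold_ms=PING_PONG_THRESHOLD_MS):
    """Keep only connections that last at least threshold_ms."""
    return [r for r in records if lasttime(r) >= threshold_ms]


records = [
    Connection("u1", "bs1", 0, 30_000),           # 30 s: ping-pong noise
    Connection("u1", "bs2", 30_000, 45_000),      # 15 s: ping-pong noise
    Connection("u1", "bs1", 45_000, 2_000_000),   # about 32.6 min: kept
]
cleaned = remove_ping_pong(records)
```

In this sample only the third record survives, since the first two connections each last less than one minute.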
In step S2, scenic spot POI data for the preset area is crawled to obtain a scenic spot list; the base stations within the range of each scenic spot are screened according to the scenic spot list and the base station list in the global mobile phone signaling data table; a base station-scenic spot table is constructed; and non-scenic-spot base station connection data is removed from the global mobile phone signaling data table according to the base station-scenic spot table to obtain a scenic spot mobile phone signaling data table.
Specifically, step S2 may include the following steps:
When the scenic spot list is constructed, scenic spots can be divided into small urban scenic spots (such as churches and museums), medium urban scenic spots (such as blocks and parks) and large scenic spots outside the city. A small urban scenic spot can be covered by a single base station, so its radius is set to 500 m; the radius of a medium scenic spot is set to within 1 km. Natural scenic spots in the suburbs occupy a large area and have no interference nearby, so all tourists entering them can be counted, and their range is set to within 2 km of the scenic spot center. The scenic spot list thus includes the name of each scenic spot, the longitude and latitude of its center point, and its radius.
The base station list in the global mobile phone signaling data table is then traversed. For each base station, its longitude and latitude are used to judge whether it is covered by a scenic spot in the scenic spot list. The base stations within the range of each scenic spot are thereby screened and bound to the corresponding scenic spots, and the base station-scenic spot table is constructed. The base station list includes the base station IDs and their longitude and latitude, and the base station-scenic spot table includes the base station IDs and scenic spot names.
The algorithm may specifically be as follows:
Input: scenic spot list, base station list
Output: base station-scenic spot table
Step 1:
    For each base station in the base station list
        For each scenic spot in the scenic spot list
            If the longitude and latitude of the base station fall within the scenic spot range
                Bind the base station to the scenic spot and write the pair to the base station-scenic spot table
Step 2:
    Return the base station-scenic spot table
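As a hedged illustration of the binding loop, the sketch below tests whether a base station lies within a scenic spot by comparing the great-circle (haversine) distance to the spot radius. The text does not specify how the range test is performed, so the haversine test, the tuple layouts and the sample coordinates are all assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def build_station_spot_table(stations, spots):
    """Bind each base station to every scenic spot whose radius covers it.

    stations: iterable of (station_id, lat, lon)
    spots:    iterable of (spot_name, center_lat, center_lon, radius_m)
    """
    table = []
    for station_id, s_lat, s_lon in stations:
        for name, c_lat, c_lon, radius_m in spots:
            if haversine_m(s_lat, s_lon, c_lat, c_lon) <= radius_m:
                table.append((station_id, name))
    return table


# Hypothetical data: one small urban scenic spot with a 500 m radius.
spots = [("Spot A", 45.0, 126.0, 500.0)]
stations = [("bs1", 45.001, 126.0),   # roughly 111 m from the center
            ("bs2", 45.05, 126.0)]    # roughly 5.6 km from the center
station_spot_table = build_station_spot_table(stations, spots)
```

Here only `bs1` is bound to `Spot A`; `bs2` lies outside the 500 m radius and is dropped.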
Then, the base station-scenic spot table is inner-joined with the global mobile phone signaling data table (from which the ping-pong effect has been removed), using the base station ID as the key. This yields a signaling data table restricted to scenic spots, i.e. the scenic spot mobile phone signaling data table, which includes the date, user ID, base station ID, connection start time, connection end time, connection duration and scenic spot name, as shown in Table 1 below.
Table 1. Structure of the scenic spot visitor mobile phone signaling data table
[Table image BDA0003091834910000051 in the original publication]
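The inner join on base station ID can be sketched in plain Python as follows. The dictionary-based row layout and field names (`station_id`, `spot_name` and so on) are hypothetical and used only for illustration.

```python
def inner_join_on_station(signaling_rows, station_spot_table):
    """Inner-join signaling rows with (station_id, spot_name) pairs.

    Rows whose base station is not bound to any scenic spot are dropped,
    which is what removes the non-scenic-spot connection data.
    """
    spots_by_station = {}
    for station_id, spot_name in station_spot_table:
        spots_by_station.setdefault(station_id, []).append(spot_name)

    joined = []
    for row in signaling_rows:
        for spot_name in spots_by_station.get(row["station_id"], []):
            joined.append({**row, "spot_name": spot_name})
    return joined


# Hypothetical sample rows (timestamps in milliseconds).
global_rows = [
    {"date": "2020-01-11", "user_id": "u1", "station_id": "bs1",
     "starttime": 0, "endtime": 900_000, "lasttime": 900_000},
    {"date": "2020-01-11", "user_id": "u1", "station_id": "bs7",
     "starttime": 900_000, "endtime": 1_800_000, "lasttime": 900_000},
]
spot_rows = inner_join_on_station(global_rows, [("bs1", "Spot A")])
```

The row for `bs7` disappears from the result because that base station does not appear in the base station-scenic spot table.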
At this point the scenic spot mobile phone signaling data table still cannot be used directly for subsequent calculations. Passers-by, scenic spot workers and nearby residents must first be screened out so that non-visitor interference data is removed and the accuracy of the result is ensured.
In step S3, the noise data generated by passers-by, scenic spot workers and nearby residents is removed from the scenic spot mobile phone signaling data table to obtain a scenic spot visitor mobile phone signaling data table.
Further, in an embodiment of the present invention, in step S3, the scenic spot mobile phone signaling data table is aggregated with the user ID, scenic spot name and date as keys, and passers-by are removed by discarding users whose total daily connection time to the base stations is less than a preset time threshold; the table is then aggregated with the user ID, scenic spot name and week number as keys, and scenic spot workers and nearby residents who appear frequently within a preset time range are removed.
Specifically, the embodiment of the invention exploits the fact that passers-by spend only a short time passing through a scenic spot, which shows up in the mobile phone signaling as a short total connection time to the scenic spot's base stations.
The data is aggregated with the user ID, scenic spot name and date as keys, and the following filter is applied:
sum(lasttime)≥1800000
That is, only users whose mobile phones are connected to base stations in the scenic spot for a total of at least 30 minutes (1800000 ms) on a given day are kept, which removes passers-by with high probability.
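A minimal sketch of this aggregation and filter is shown below, assuming dictionary rows with hypothetical field names `user_id`, `spot_name`, `date` and `lasttime` (in milliseconds).

```python
from collections import defaultdict

# 30 minutes in milliseconds, the threshold used in the filter above.
MIN_DAILY_STAY_MS = 1_800_000


def remove_passers_by(rows, min_stay_ms=MIN_DAILY_STAY_MS):
    """Keep rows whose (user, scenic spot, date) group satisfies
    sum(lasttime) >= min_stay_ms."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r["user_id"], r["spot_name"], r["date"])] += r["lasttime"]
    return [
        r for r in rows
        if totals[(r["user_id"], r["spot_name"], r["date"])] >= min_stay_ms
    ]


# Hypothetical sample: u1 stays 40 minutes in total, u2 only 5 minutes.
rows = [
    {"user_id": "u1", "spot_name": "Spot A", "date": "2020-01-11", "lasttime": 1_200_000},
    {"user_id": "u1", "spot_name": "Spot A", "date": "2020-01-11", "lasttime": 1_200_000},
    {"user_id": "u2", "spot_name": "Spot A", "date": "2020-01-11", "lasttime": 300_000},
]
stayers = remove_passers_by(rows)
```

Both of `u1`'s records are kept because their daily total reaches the threshold, even though each individual connection is shorter than 30 minutes.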
Secondly, residents living near the scenic spot and scenic spot workers must be identified, and their mobile phone signaling data deleted from the scenic spot mobile phone signaling data table, to ensure the accuracy of tour route mining.
Residents and workers are identified by counting the number of days on which a user appears around the scenic spot within one week, since an ordinary tourist has only a very small probability of staying in the same scenic spot on several days of the same week.
The data is aggregated with the user ID, scenic spot name and week number (which can be derived from the date) as keys, and the following condition is applied:
count(day)≥3
If a user is observed within the scenic spot range on three or more days of a week in the time window, the user can be regarded as a nearby resident or a scenic spot worker. The selected time window avoids holidays, so that workers who are absent during holidays are not missed by the count.
The final scenic spot visitor mobile phone signaling data table is obtained through the above processing.
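The weekly day-count rule can be sketched as follows. The ISO week number is derived from the date with the standard library; the row layout is hypothetical, and removing a flagged user's records for that scenic spot across the whole time window is an assumption of this sketch rather than something the text states explicitly.

```python
from collections import defaultdict
from datetime import date

# count(day) >= 3 within one week marks a resident or worker.
RESIDENT_DAY_THRESHOLD = 3


def remove_residents_and_workers(rows, day_threshold=RESIDENT_DAY_THRESHOLD):
    """Drop (user, scenic spot) pairs seen on day_threshold or more distinct
    days within any single ISO week."""
    days_seen = defaultdict(set)
    for r in rows:
        iso = date.fromisoformat(r["date"]).isocalendar()
        week_key = (r["user_id"], r["spot_name"], iso[0], iso[1])
        days_seen[week_key].add(r["date"])

    frequent = {
        (user_id, spot_name)
        for (user_id, spot_name, _year, _week), days in days_seen.items()
        if len(days) >= day_threshold
    }
    return [r for r in rows if (r["user_id"], r["spot_name"]) not in frequent]


# Hypothetical sample: u1 appears Monday to Wednesday of the same week,
# u2 appears once on Saturday.
rows = [
    {"user_id": "u1", "spot_name": "Spot A", "date": "2020-01-06"},
    {"user_id": "u1", "spot_name": "Spot A", "date": "2020-01-07"},
    {"user_id": "u1", "spot_name": "Spot A", "date": "2020-01-08"},
    {"user_id": "u2", "spot_name": "Spot A", "date": "2020-01-11"},
]
visitors = remove_residents_and_workers(rows)
```

In this sample `u1` is treated as a resident or worker and removed, while `u2` is kept as a visitor.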
In step S4, a time-ordered scenic spot sequence is constructed from the scenic spot visitor mobile phone signaling data table, and aggregation is performed to obtain a preliminary tourist route table.
Further, in an embodiment of the present invention, step S4 specifically includes:
The scenic spot visitor mobile phone signaling data table is processed with the user ID as the key to generate a visitor-scenic spot list ordered by start time, which includes the user ID and the time-ordered scenic spot sequence. The visitor-scenic spot list is then aggregated with the scenic spot sequence as the key, the number of people sharing the same scenic spot sequence is counted, and the results are sorted by that count to obtain the preliminary tourist route table, which includes the number of people on each route and the corresponding scenic spot sequence.
It should be noted that the scenic spot sequences must be ordered by time. In the embodiment of the present invention, the records are therefore aggregated with the collect_list function of Hive SQL, and a field is concatenated and later removed with a regular expression to ensure that the scenic spot sequences are arranged in chronological order. Specifically, the start time is concatenated in front of the scenic spot name, and the resulting strings are sorted lexicographically; because the time is a number at the front of the string, lexicographic order coincides with chronological order (provided the timestamps have the same number of digits). Hive's regular expression replacement operation is then used to match the digits and replace the time part with an empty string, giving a scenic spot sequence in chronological order. As shown in Table 2 below, the visitor-scenic spot list includes the user ID and the chronologically ordered scenic spot sequence.
Table 2. Visitor-scenic spot list
[Table image BDA0003091834910000061 in the original publication]
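The concatenate, sort and strip technique described for Hive can be mirrored in Python as a rough sketch. This is not the Hive SQL used in the embodiment; the 13-digit zero padding, the `->` separator and the row field names are assumptions made so that the example is self-contained.

```python
import re
from collections import defaultdict

TIME_WIDTH = 13  # assumed width of a millisecond timestamp, zero-padded


def build_spot_sequences(rows, sep="->"):
    """Return {user_id: time-ordered scenic spot sequence} using the
    concatenate-sort-strip approach."""
    tagged = defaultdict(list)
    for r in rows:
        # Prefix the spot name with a fixed-width start time so that
        # lexicographic order equals chronological order.
        tagged[r["user_id"]].append(
            f"{r['starttime']:0{TIME_WIDTH}d}{r['spot_name']}"
        )

    sequences = {}
    for user_id, items in tagged.items():
        items.sort()  # lexicographic sort of "time + name" strings
        # Strip only the leading time prefix with a regular expression.
        names = [re.sub(rf"^\d{{{TIME_WIDTH}}}", "", item) for item in items]
        sequences[user_id] = sep.join(names)
    return sequences


# Hypothetical sample: records arrive out of order.
rows = [
    {"user_id": "u1", "starttime": 1_578_712_000_000, "spot_name": "Spot B"},
    {"user_id": "u1", "starttime": 1_578_708_000_000, "spot_name": "Spot A"},
    {"user_id": "u2", "starttime": 1_578_709_000_000, "spot_name": "Spot B"},
]
sequences = build_spot_sequences(rows)
```

Anchoring the pattern to a fixed-width prefix avoids stripping digits that happen to be part of a scenic spot name.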
Aggregation is then performed with the scenic spot sequence as the key: the number of people sharing the same scenic spot sequence is counted, and the results are sorted by that count to obtain the preliminary tourist route table, which contains the scenic spot sequence and the number of people on each route.
Table 3. Structure of the tour route table
[Table image BDA0003091834910000071 in the original publication]
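Counting the people on each route and sorting by that count can be sketched with the standard library `Counter`. The input mapping of user ID to route string, and the `->` separator inside it, are hypothetical.

```python
from collections import Counter


def count_routes(sequences):
    """Aggregate {user_id: route} into [(route, people)] sorted by
    people in descending order."""
    return Counter(sequences.values()).most_common()


# Hypothetical visitor-scenic spot list.
sequences = {
    "u1": "Spot A->Spot B",
    "u2": "Spot A->Spot B",
    "u3": "Spot B",
}
preliminary_routes = count_routes(sequences)
```

The result lists the most frequently travelled route first, which corresponds to the preliminary tourist route table.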
In step S5, adjacent scenic spots in the preliminary tourist routes are merged using a hierarchical clustering method based on Euclidean distance to obtain an optimal tour route.
It can be understood that the preliminary mining results contain a large number of routes that differ only in nearby scenic spots, and the order in which nearby scenic spots are visited has little practical significance. Nearby scenic spots are therefore clustered and merged to optimize the tour route mining results. The embodiment of the invention uses a hierarchical clustering algorithm based on Euclidean distance to cluster scenic spots whose longitude-latitude distance is within 0.015.
Step S5 specifically includes:
obtaining the scenic spot sequences from the preliminary tourist route table;
taking each scenic spot as an individual cluster, and initializing a scenic spot distance matrix;
traversing the scenic spot sequences to calculate the Euclidean distance between every two scenic spots;
while the distance matrix is not empty, selecting the minimum Euclidean distance and judging whether it is less than or equal to a preset threshold; if so, merging the two scenic spots at that distance and deleting the distance from the distance matrix; iterating this process until the minimum Euclidean distance is greater than the preset threshold, at which point scenic spot clustering is complete, as shown in FIG. 3.
The specific algorithm processing flow is as follows:
Input: scenic spot list x
Output: clusters of nearby scenic spots
Step 1:
    Obtain the scenic spot list
    Treat each scenic spot as an independent cluster whose parent node is itself
    Initialize the scenic spot distance matrix distance
Step 2:
    Initialize a min-heap q
    For each pair of scenic spots i and j in x
        Calculate the Euclidean distance distance(i, j)
        Push the triple (i, j, distance(i, j)) onto the min-heap q
Step 3:
    While the min-heap q is not empty
        p = top element of q
        If the distance of p exceeds the threshold, end the loop
        If the cluster of p.x is not equal to the cluster of p.y
            Merge the cluster of p.x with the cluster of p.y
        Pop the top element from q
    Output the scenic spot clustering result
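The heap-based flow above can be sketched in Python with `heapq` and a simple parent map (union-find) standing in for "the father node of each scenic spot". Coordinates are treated as plain (longitude, latitude) pairs in degrees and the threshold of 0.015 follows the text; the function name, the sample coordinates and the use of path compression are assumptions of this sketch.

```python
import heapq
import math
from itertools import combinations


def cluster_spots(spots, threshold=0.015):
    """Merge scenic spots whose Euclidean longitude-latitude distance is at
    most threshold. spots maps a name to a (lon, lat) pair."""
    # Step 1: every scenic spot starts as its own cluster (its own parent).
    parent = {name: name for name in spots}

    def find(name):
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Step 2: push every pair onto a min-heap keyed by distance.
    heap = []
    for a, b in combinations(spots, 2):
        heapq.heappush(heap, (math.dist(spots[a], spots[b]), a, b))

    # Step 3: pop pairs in increasing distance until the threshold is passed.
    while heap:
        d, a, b = heapq.heappop(heap)
        if d > threshold:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for name in spots:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


# Hypothetical coordinates: A and B are 0.01 degrees apart, C is far away.
spots = {"A": (126.60, 45.77), "B": (126.61, 45.77), "C": (126.70, 45.80)}
clusters = cluster_spots(spots)
```

Because pairs are merged whenever their own distance is within the threshold, this sketch behaves like single-linkage clustering: a chain of nearby spots ends up in one cluster even if its two ends are farther apart than the threshold.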
A scenic spot with a threshold value within 1.5 kilometers can be considered as a close scenic spot, that is, a scenic spot with a latitude and longitude distance difference within 0.015 is considered as a close scenic spot. .
Scenic spots assigned to the same cluster are regarded as the same scenic spot, and the tour route analysis result is recalculated to obtain the optimal tour route list.
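One way to picture this recalculation is the sketch below. It assumes the preliminary result is available as a dict from a route (a tuple of scenic spot names) to a visitor count, and that a mapping `labels` from each scenic spot to its cluster representative has already been obtained; both structures and the function name `merge_routes` are hypothetical. Collapsing consecutive visits that fall into the same cluster is also an assumption, since the text does not say how such repeats are handled.

```python
from collections import Counter


def merge_routes(route_counts, labels):
    """Re-aggregate route counts after mapping each spot to its cluster label."""
    merged = Counter()
    for route, visitors in route_counts.items():
        relabeled = []
        for spot in route:
            label = labels.get(spot, spot)
            # Collapse consecutive visits that now fall in the same cluster (assumption).
            if not relabeled or relabeled[-1] != label:
                relabeled.append(label)
        merged[tuple(relabeled)] += visitors
    # Most popular routes first.
    return merged.most_common()
```

With this sketch, two preliminary routes that differ only in the order of two merged scenic spots become a single route whose count is the sum of the two.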
The tour route mining method based on signaling big data according to the embodiment of the invention is verified below through a specific example.
To verify the effectiveness of the method, one week of signaling big data from Central Avenue in Harbin was selected.
After a designated scenic spot is selected, basic information on the base stations around the scenic spot can be viewed, including the total daily connection count of the base stations and the connection count of each base station, both before and after the ping-pong effect is removed. After the ping-pong effect is removed, the daily connection count of each base station drops substantially, which greatly reduces the negative influence of the ping-pong effect. FIG. 4 shows, as an example, the base station information of a certain scenic spot on January 11, 2020.
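The ping-pong removal referred to here rests on the connection duration lasttime(x) = endtime(x) - starttime(x) from step S1. A minimal sketch is given below, assuming each signaling record is a dict with `starttime` and `endtime` as `datetime` values; the function name `remove_ping_pong` and the 60-second default threshold are illustrative assumptions, not values taken from the embodiment.

```python
from datetime import datetime, timedelta


def remove_ping_pong(records, min_duration=timedelta(seconds=60)):
    """Drop connections shorter than `min_duration` (assumed threshold)."""
    kept = []
    for rec in records:
        # lasttime(x) = endtime(x) - starttime(x)
        lasttime = rec["endtime"] - rec["starttime"]
        if lasttime >= min_duration:
            # Store the duration alongside the original fields.
            kept.append({**rec, "lasttime": lasttime})
    return kept
```

Comparing `len(records)` with `len(remove_ping_pong(records))` per base station gives the before and after connection counts of the kind described for FIG. 4.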
For resident screening, a designated scenic spot is selected from a drop-down list, and the list of residents around that scenic spot is displayed. FIG. 5 shows, as an example, the list of residents near a certain scenic spot, in which the residents' mobile phone numbers are masked.
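The screening of step S3 can be sketched as two aggregations: daily stay time per user and scenic spot to drop passers-by, then days of appearance per week to drop workers and nearby residents. The record fields (`user_id`, `spot`, `date`, `lasttime`), the function name `filter_visitors`, and both default thresholds are assumptions chosen for illustration; the embodiment only states that preset thresholds are used.

```python
from collections import defaultdict
from datetime import date, timedelta


def filter_visitors(records, min_daily_stay=timedelta(minutes=30), max_days_per_week=3):
    """Keep records of likely tourists; both thresholds are assumed values."""
    # Aggregate connection time by (user, scenic spot, date).
    daily = defaultdict(timedelta)
    for rec in records:
        daily[(rec["user_id"], rec["spot"], rec["date"])] += rec["lasttime"]

    # Passers-by: daily connection time below the threshold.
    stays = {key for key, total in daily.items() if total >= min_daily_stay}

    # Count qualifying days per (user, scenic spot, ISO year, ISO week).
    days_per_week = defaultdict(int)
    for user_id, spot, day in stays:
        iso = day.isocalendar()
        days_per_week[(user_id, spot, iso[0], iso[1])] += 1

    def is_visitor(rec):
        key = (rec["user_id"], rec["spot"], rec["date"])
        iso = rec["date"].isocalendar()
        week_key = (rec["user_id"], rec["spot"], iso[0], iso[1])
        # Workers and residents: present on too many days of the same week.
        return key in stays and days_per_week[week_key] <= max_days_per_week

    return [rec for rec in records if is_visitor(rec)]
```

Counting only days that already passed the stay-time filter is a design choice of this sketch; counting every day of appearance would be an equally plausible reading.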
For tour route analysis, the tour route mining results are grouped by identical scenic spot sequence, the number of people in each group is counted, and the groups are sorted by number of people to obtain the preliminary tourist route table. The results for one week are shown in FIG. 6.
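The counting described here can be sketched as follows, assuming visitor records are dicts with `user_id`, `starttime` (any comparable timestamp) and `spot`. The function name `mine_routes` is hypothetical, and collapsing consecutive records at the same scenic spot into one visit is an assumption made so that several base station connections inside one scenic spot do not lengthen the route.

```python
from collections import Counter, defaultdict


def mine_routes(records):
    """Count how many users follow each time-ordered scenic spot sequence."""
    # Collect (start time, scenic spot) pairs per user.
    visits = defaultdict(list)
    for rec in records:
        visits[rec["user_id"]].append((rec["starttime"], rec["spot"]))

    routes = Counter()
    for user_visits in visits.values():
        sequence = []
        # Sort by start time, then drop the time and keep only the spot names.
        for _, spot in sorted(user_visits):
            if not sequence or sequence[-1] != spot:
                sequence.append(spot)
        routes[tuple(sequence)] += 1

    # Sort by number of people, most popular first.
    return routes.most_common()
```

The returned list of (route, number of people) pairs plays the role of the preliminary tourist route table.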
For scenic spot clustering analysis, scenic spots within 1.5 kilometers of each other, that is, scenic spots whose longitude-latitude distance is 0.015 or less, are treated as close scenic spots and merged by the hierarchical clustering algorithm based on Euclidean distance. The clustering result is shown in FIG. 7.
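The correspondence between 0.015 degrees and 1.5 kilometers is approximate, because a degree of longitude shrinks with latitude. The sketch below makes the conversion explicit under two stated assumptions: roughly 111 km per degree of latitude, and a latitude of about 45.75 degrees north for Harbin. The function name `degrees_to_km` is hypothetical.

```python
from math import cos, radians

KM_PER_DEGREE_LATITUDE = 111.0  # rough average value, an approximation


def degrees_to_km(delta_deg, latitude_deg):
    """Return (north-south km, east-west km) spanned by `delta_deg` degrees."""
    north_south = delta_deg * KM_PER_DEGREE_LATITUDE
    # A degree of longitude is shorter by a factor of cos(latitude).
    east_west = north_south * cos(radians(latitude_deg))
    return north_south, east_west
```

Under these assumptions, `degrees_to_km(0.015, 45.75)` gives about 1.7 km in the north-south direction and about 1.2 km in the east-west direction, so the 0.015 threshold brackets the stated 1.5 km rather than matching it exactly.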
The final tour route mining result is obtained by recalculating the preliminary tourist route table with the scenic spot clustering result. As shown in FIG. 8, the final result lists the tour routes and the number of people on each route over one week.
The top 10 tour routes and their visitor counts can be displayed as a horizontal bar chart, which presents the tour route mining result more intuitively, as shown in FIG. 9.
Therefore, the tour route mining method based on signaling big data provided by the embodiment of the invention establishes a tour route system and can rank the popularity of different routes, achieving the intended purpose.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples, and features of different embodiments or examples, described in this specification can be combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. A tourism route mining method based on signaling big data is characterized by comprising the following steps:
step S1, acquiring original mobile phone signaling data of a preset area, calculating the time length of each connection of a mobile phone with a base station, and removing short-time connection noise data generated by a ping-pong effect in the original mobile phone signaling data to obtain a global mobile phone signaling data table;
step S2, crawling scenic spot POI data of a preset area to obtain a scenic spot list, screening base stations in each scenic spot range according to the scenic spot list and a base station list in the global mobile phone signaling data table, constructing a base station-scenic spot list, and removing non-scenic spot base station connection data in the global mobile phone signaling data table according to the base station-scenic spot list to obtain a scenic spot mobile phone signaling data table;
step S3, removing noise data generated by passers-by, scenic spot workers and nearby residents from the scenic spot mobile phone signaling data table to obtain a scenic spot visitor mobile phone signaling data table;
step S4, constructing a scenic spot sequence for the mobile phone signaling data table of the scenic spot visitor according to the time sequence, and performing aggregation calculation to obtain a preliminary tourist route table;
and step S5, merging adjacent scenic spots in the preliminary tourist route list by adopting a hierarchical clustering method based on Euclidean distance to obtain an optimal tourist route.
2. The method for mining the tour route based on the big signaling data of claim 1, wherein the formula for calculating the duration of the mobile phone connecting with the base station is as follows:
lasttime(x)=endtime(x)-starttime(x)
wherein lasttime(x) is the duration of the connection between the mobile phone and the base station, endtime(x) is the time at which the mobile phone disconnects from the base station, and starttime(x) is the time at which the mobile phone starts connecting to the base station.
3. The tour route mining method based on signaling big data according to claim 1, wherein the global mobile phone signaling data table comprises: date, user ID, base station ID, the time at which the mobile phone starts connecting to the base station, the time at which the mobile phone disconnects from the base station, and the duration of the connection between the mobile phone and the base station.
4. The method as claimed in claim 1, wherein the scenic spot list includes names of the scenic spots, longitude and latitude of centers of the scenic spots and radii of the scenic spots, and the base station list includes ID of base stations and longitude and latitude of base stations.
5. The tour route mining method based on signaling big data according to claim 1, wherein the scenic spot visitor mobile phone signaling data table comprises: date, user ID, base station ID, the time at which the mobile phone starts connecting to the base station, the time at which the mobile phone disconnects from the base station, the duration of the connection between the mobile phone and the base station, and scenic spot name.
6. The travel route mining method based on signaling big data according to claim 1, wherein, in the step S3,
aggregating the scenic spot mobile phone signaling data table by user ID, scenic spot name and date, and removing passers-by, identified as users whose daily base station connection time is less than a preset time threshold;
and aggregating the scenic spot mobile phone signaling data table by user ID, scenic spot name and week number, and removing scenic spot workers and nearby residents, identified as users whose number of appearances within a preset period reaches a preset frequency, to obtain the scenic spot visitor mobile phone signaling data table.
7. The method for mining a travel route based on signaling big data as claimed in claim 1, wherein said step S4 specifically comprises:
aggregating the scenic spot visitor mobile phone signaling data table by user ID to generate a scenic spot sequence ordered by start time;
concatenating the base station connection start time and the scenic spot name in the scenic spot visitor mobile phone signaling data table, sorting the concatenated fields, and then stripping the time using a regular expression to obtain a visitor-scenic spot list, wherein the visitor-scenic spot list comprises the user ID and the time-ordered scenic spot sequence;
and aggregating the visitor-scenic spot list by scenic spot sequence, counting the number of people with the same scenic spot sequence, and sorting by number of people to obtain the preliminary tourist route table, wherein the preliminary tourist route table comprises the number of people and the time-ordered scenic spot sequence.
8. The method for mining a travel route based on signaling big data as claimed in claim 1, wherein said step S5 specifically comprises:
acquiring the scenic spot sequence according to the preliminary tourist visit route table;
taking each scenic spot as an individual cluster, and initializing a scenic spot distance matrix;
traversing the scenic spot sequence to calculate the Euclidean distance between any two scenic spots;
and while the distance matrix is not empty, selecting the minimum Euclidean distance and judging whether it is smaller than or equal to a preset threshold; if so, merging the two scenic spots at that distance and deleting that distance from the distance matrix; and iterating this process until the minimum Euclidean distance is larger than the preset threshold, thereby finishing the scenic spot clustering and obtaining the optimal tourist route.
CN202110597996.9A 2021-05-31 2021-05-31 Tour route mining method based on signaling big data Pending CN113313307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110597996.9A CN113313307A (en) 2021-05-31 2021-05-31 Tour route mining method based on signaling big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110597996.9A CN113313307A (en) 2021-05-31 2021-05-31 Tour route mining method based on signaling big data

Publications (1)

Publication Number Publication Date
CN113313307A true CN113313307A (en) 2021-08-27

Family

ID=77376208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110597996.9A Pending CN113313307A (en) 2021-05-31 2021-05-31 Tour route mining method based on signaling big data

Country Status (1)

Country Link
CN (1) CN113313307A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170131109A1 (en) * 2015-11-10 2017-05-11 Chiun Mai Communication Systems, Inc. Electronic device and method for planning tour route
CN105956951A (en) * 2016-05-05 2016-09-21 杭州诚智天扬科技有限公司 Method of identifying hot tourist route based on mobile signaling
CN106022993A (en) * 2016-05-05 2016-10-12 杭州诚智天扬科技有限公司 Traveling hot line identification method based on mobile signaling
CN111429220A (en) * 2020-03-25 2020-07-17 西安交通大学 Travel route recommendation system and method based on operator big data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
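刘奕杉: "Research on a user location prediction system based on operator data", China Master's Theses Full-text Database (Information Science and Technology), no. 2019, pages 1 - 48 *
杨东; 韩继国; 武平; 赵昕: "Tourist identification and travel trajectory matching method based on mobile phone signaling data", Transport Research (交通运输研究), no. 06 *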

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113923603A (en) * 2021-09-30 2022-01-11 京东城市(北京)数字科技有限公司 Tourist trajectory analysis method and device, computer equipment and storage medium
CN113923603B (en) * 2021-09-30 2023-11-07 京东城市(北京)数字科技有限公司 Tourist track analysis method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination