CN115577190A - Tourist behavior data extraction method - Google Patents


Info

Publication number
CN115577190A
Authority
CN
China
Prior art keywords
travel
time
tourist
check
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211270201.4A
Other languages
Chinese (zh)
Other versions
CN115577190B (en
Inventor
赵莹
杨羽菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202211270201.4A priority Critical patent/CN115577190B/en
Publication of CN115577190A publication Critical patent/CN115577190A/en
Application granted granted Critical
Publication of CN115577190B publication Critical patent/CN115577190B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9532Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/957Browsing optimisation, e.g. caching or content distillation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/117Tagging; Marking up; Designating a block; Setting of attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application belongs to the technical field of tourism data processing and discloses a method for extracting tourist behavior data. The method includes: obtaining check-in data of tourist attractions and structuring it to obtain a check-in spatio-temporal database; obtaining a first travel note sample from a travel website, labeling the time information and location information of the travel note texts in it to obtain labeled travel spatio-temporal paths, and forming a preliminary parsing module based on the labeling method; obtaining a second travel note sample, running the preliminary parsing module to obtain the parsed travel spatio-temporal paths of the second travel note sample, and refining the preliminary parsing module based on the parsed travel spatio-temporal paths to obtain a final parsing module; applying the final parsing module to travel note samples within a preset time window and a preset destination range to obtain a travel note spatio-temporal database; and obtaining a visualized tourist spatio-temporal behavior path map based on the check-in spatio-temporal database and the travel note spatio-temporal database. The method provides structured data for subsequent patent analysis in the field of tourism.

Description

A Method for Extracting Tourist Behavior Data

Technical Field

This application relates to the technical field of tourism data processing, and in particular to a method for extracting tourist behavior data.

Background

In recent years, with the rapid development of the economy and of transportation, the willingness of domestic tourists to travel has increased markedly, the number of tourists has continued to grow, and the income of tourism-related industries has risen accordingly. At the same time, with the development of Internet technology, tourists leave a large amount of tourism-related data on the Internet during their trips. This data can be used for analysis and research such as tourism marketing planning, tourist number prediction, route planning, and scenic spot evaluation, so as to provide tourists with better tourism services and to develop better tourism products. However, although the prior art draws on a wide range of information sources, its data collection concentrates on extracting static tourism information and lacks structured processing of dynamic tourism information between different scenic spots.

Summary of the Invention

To this end, the embodiments of the present application provide a method for extracting tourist behavior data, which achieves structured extraction and visualization of dynamic tourism information.

In a first aspect, the present application provides a method for extracting tourist behavior data.

The application is implemented through the following technical solutions:

A method for extracting tourist behavior data, the method comprising:

obtaining check-in data of tourist attractions, and structuring the check-in data to obtain a check-in spatio-temporal database based on the check-in data;

obtaining a first travel note sample from a travel website, labeling the time information and location information of each travel note text in the first travel note sample to obtain a labeled travel spatio-temporal path, and forming a preliminary parsing module based on the labeling method of the labeled travel spatio-temporal path;

obtaining a second travel note sample, running the preliminary parsing module to obtain the parsed travel spatio-temporal paths of all travel note texts in the second travel note sample, and refining the preliminary parsing module based on the parsed travel spatio-temporal paths to obtain a final parsing module;

applying the final parsing module to travel note samples within a preset time window and a preset destination range to obtain a travel note spatio-temporal database;

constructing a tourist flow behavior database based on the check-in spatio-temporal database and the travel note spatio-temporal database, and obtaining a visualized tourist spatio-temporal behavior path map based on the tourist flow behavior database.

In a preferred example of the present application, before the step of constructing the tourist flow behavior database based on the check-in spatio-temporal database and the travel note spatio-temporal database and obtaining the visualized tourist spatio-temporal behavior path map based on the tourist flow behavior database, the method further includes:

collecting review data of tourist attractions, calculating the proportion of a single attraction's review data among the review data of all tourist attractions in the local city, and obtaining a benchmark visitor count for the attraction based on that proportion;

obtaining a first visitor count from the check-in spatio-temporal database and a second visitor count from the travel note spatio-temporal database, and calculating the deviation ratio of each of the first and second visitor counts from the benchmark visitor count.

In a preferred example of the present application, after calculating the deviation ratios of the first visitor count and the second visitor count from the benchmark visitor count, the method further includes:

comparing each deviation ratio with a preset deviation ratio, and if the deviation ratio exceeds the preset deviation ratio, further refining the check-in spatio-temporal database and the travel note spatio-temporal database;

if the deviation ratio is within the preset deviation ratio, obtaining the tourist flow behavior database on the basis of the check-in spatio-temporal database and the travel note spatio-temporal database.
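The validation step above can be illustrated with a minimal Python sketch. The text does not give formulas, so two assumptions are made here: the benchmark for a database is the attraction's share of city-wide reviews scaled to that database's total visits, and the deviation ratio is the absolute difference divided by the benchmark. All function names, attraction names, and numbers are hypothetical.

```python
def review_share(review_counts, attraction):
    """Share of one attraction's reviews among all attractions in the city."""
    total = sum(review_counts.values())
    if total == 0:
        raise ValueError("no review data")
    return review_counts[attraction] / total


def deviation_ratio(observed, benchmark):
    """Relative deviation of an observed count from the benchmark (assumed formula)."""
    if benchmark == 0:
        raise ValueError("benchmark must be non-zero")
    return abs(observed - benchmark) / benchmark


def within_tolerance(review_counts, db_counts, attraction, max_deviation):
    """Check one database's count for an attraction against the review-based benchmark."""
    # Assumption: benchmark = review share scaled to the database's total visits.
    benchmark = review_share(review_counts, attraction) * sum(db_counts.values())
    return deviation_ratio(db_counts[attraction], benchmark) <= max_deviation


# Hypothetical numbers for illustration only.
reviews = {"attraction_a": 600, "attraction_b": 300, "attraction_c": 100}
checkins = {"attraction_a": 550, "attraction_b": 350, "attraction_c": 100}
for name in reviews:
    print(name, within_tolerance(reviews, checkins, name, max_deviation=0.10))
```

The same check would be run once for the check-in database and once for the travel note database; an attraction that fails the check signals that the corresponding database needs further refinement.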

In a preferred example of the present application, the step of obtaining check-in data of tourist attractions and structuring the check-in data to obtain the check-in spatio-temporal database includes:

obtaining, according to the names of the scenic spots in the target area, the scenic spot check-in ID of each scenic spot, to obtain a list of scenic spots, scenic spot check-in IDs, and scenic spot numbers;

obtaining, within the time window, the check-in data of the user IDs corresponding to all scenic spot check-in IDs of the scenic spot, to obtain an initial user database of the scenic spot, the initial user database containing user ID, check-in time, check-in location, and check-in content;

obtaining the personal information of all user IDs in the initial user database and adding it to the initial user database as a supplementary table, to obtain the check-in spatio-temporal database of the tourist attraction check-in data.

In a preferred example of the present application, the step of obtaining the scenic spot check-in ID of each scenic spot further includes:

establishing a main check-in ID and several subordinate check-in IDs for the scenic spot, aggregating the subordinate check-in IDs under the main check-in ID, and using the main check-in ID as the scenic spot check-in ID of each scenic spot.

In a preferred example of the present application, the step of obtaining, within the time window, the check-in data of the user IDs corresponding to all scenic spot check-in IDs of the scenic spot further includes:

if the same user ID corresponds to multiple different scenic spot check-in IDs, associating the user ID with those different scenic spot check-in IDs;

if the same user ID corresponds to multiple identical scenic spot check-in IDs, deduplicating those identical scenic spot check-in IDs.

In a preferred example of the present application, the step of obtaining the first travel note sample from a travel website, labeling the time information and location information of each travel note text in the first travel note sample to obtain the labeled travel spatio-temporal path, and forming the preliminary parsing module based on the labeling method of the labeled travel spatio-temporal path includes:

labeling the time keywords of each travel note text in the first travel note sample as precise date, precise time, fuzzy time, or relative time, placing the time keywords into a time lexicon, and using the precise dates and precise times as split points to divide the travel note text into text segments;

identifying the location keywords of the text segments as precise location, fuzzy location, or associated location, and placing the location keywords into a location lexicon;

constructing the preliminary parsing module based on the extraction method for the time keywords and location keywords.

In a preferred example of the present application, the step of obtaining the second travel note sample, running the preliminary parsing module in turn to obtain the parsed travel spatio-temporal paths of all travel note texts in the second travel note sample, and refining the preliminary parsing module based on the parsed travel spatio-temporal paths to obtain the final parsing module includes:

if the second travel note sample contains travel photos, extracting the time information and the latitude and longitude information from the travel photos, arranging the travel photos in chronological order, and generating an image spatio-temporal path of the travel photos.
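As a rough illustration of the photo step, the sketch below orders photos by capture time to form an image spatio-temporal path. It assumes the time and coordinates have already been read from each photo's metadata into a dictionary; the keys `taken_at`, `lat`, and `lon` are hypothetical, and the sample coordinates are made up for the example.

```python
from datetime import datetime


def image_spatiotemporal_path(photos):
    """Order photos by capture time and return (time, lat, lon) points.

    Photos missing a timestamp or coordinates are skipped.
    """
    points = []
    for photo in photos:
        taken_at = photo.get("taken_at")
        lat, lon = photo.get("lat"), photo.get("lon")
        if taken_at is None or lat is None or lon is None:
            continue
        points.append((datetime.fromisoformat(taken_at), lat, lon))
    points.sort(key=lambda point: point[0])
    return points


# Hypothetical metadata records, deliberately out of order.
photos = [
    {"taken_at": "2022-10-02T14:30:00", "lat": 23.11, "lon": 113.32},
    {"taken_at": "2022-10-02T09:15:00", "lat": 23.13, "lon": 113.26},
    {"taken_at": None, "lat": 23.10, "lon": 113.30},
]
for point in image_spatiotemporal_path(photos):
    print(point)
```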

In a preferred example of the present application, the step of refining the preliminary parsing module based on the parsed travel spatio-temporal paths to obtain the final parsing module includes:

selecting some of the travel note texts in the second travel note sample and labeling them to obtain labeled travel spatio-temporal paths, comparing the parsed travel spatio-temporal paths produced by the preliminary parsing module with the labeled travel spatio-temporal paths, and refining the preliminary parsing module based on the comparison results to obtain a final time lexicon and the final parsing module.
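One simple way to make this comparison concrete is sketched below. The text does not specify a comparison metric, so the exact-match agreement rate used here is an assumption, and the function names and sample points are hypothetical.

```python
def path_agreement(parsed, labeled):
    """Fraction of labeled (time, place) points that the parser reproduced exactly."""
    if not labeled:
        raise ValueError("labeled path is empty")
    parsed_points = set(parsed)
    hits = sum(1 for point in labeled if point in parsed_points)
    return hits / len(labeled)


def missed_points(parsed, labeled):
    """Labeled points the parser failed to produce; candidates for lexicon updates."""
    parsed_points = set(parsed)
    return [point for point in labeled if point not in parsed_points]


# Hypothetical paths for one travel note.
labeled = [("DAY1 10:00", "place_a"), ("DAY1 14:30", "place_b"), ("DAY2 09:00", "place_c")]
parsed = [("DAY1 10:00", "place_a"), ("DAY2 09:00", "place_c")]
print(path_agreement(parsed, labeled))
print(missed_points(parsed, labeled))
```

The missed points indicate which time or location expressions the lexicons do not yet cover.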

In a preferred example of the present application, the step of obtaining the visualized tourist spatio-temporal behavior path map based on the check-in spatio-temporal database and the travel note spatio-temporal database includes:

forming, based on the check-in spatio-temporal database and the travel note spatio-temporal database, a path map of one or more tourists using the ArcGIS software tool, and adding a time axis and a base map to the path map to form the visualized tourist spatio-temporal behavior path map.
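The text names ArcGIS as the visualization tool but does not describe how path data is handed to it. The sketch below shows one possible preparation step under the assumption that each tourist's path is exported as a GeoJSON `LineString` feature for import into a GIS tool; the export format, function name, and sample data are all hypothetical.

```python
import json


def path_to_geojson(user_id, points):
    """Build a GeoJSON Feature from (iso_time, lat, lon) tuples, ordered by time.

    ISO 8601 strings in the same format sort chronologically as plain text.
    A valid LineString needs at least two positions.
    """
    if len(points) < 2:
        raise ValueError("a path needs at least two points")
    ordered = sorted(points, key=lambda point: point[0])
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are [longitude, latitude].
            "coordinates": [[lon, lat] for _, lat, lon in ordered],
        },
        "properties": {"user_id": user_id, "times": [t for t, _, _ in ordered]},
    }


# Hypothetical path for one user.
feature = path_to_geojson(
    "user_001",
    [("2022-10-02T14:30:00", 23.11, 113.32), ("2022-10-02T09:15:00", 23.13, 113.26)],
)
print(json.dumps(feature, indent=2))
```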

In summary, compared with the prior art, the beneficial effects of the technical solutions provided by the embodiments of the present application include at least the following:

Check-in data of tourist attractions is obtained and structured to yield a check-in spatio-temporal database. The time information and location information of travel note texts obtained from travel websites are labeled, a parsing module is formed based on the labeling method, and the parsing module is then used to parse travel note samples to obtain a travel note spatio-temporal database. A tourist flow behavior database is built from the check-in spatio-temporal database and the travel note spatio-temporal database, and spatial analysis and visualization are performed on it. Building on objective spatio-temporal geographic information and on individuals' dynamic tourism data across different scenic spots, the method structures the extraction of subjective check-in and travel note content. This addresses the lack of attention to individual subjective information in other data collection methods and makes the spatio-temporal paths finer-grained and more concrete.

Description of the Drawings

Fig. 1 is a schematic flow chart of a method for extracting tourist behavior data provided by an exemplary embodiment of the present application;

Fig. 2 is a schematic flow chart of establishing a check-in spatio-temporal database provided by an exemplary embodiment of the present application;

Fig. 3 is a schematic flow chart of establishing a travel note spatio-temporal database provided by an exemplary embodiment of the present application.

Detailed Description

This specific embodiment is merely an explanation of the present application and does not limit it. After reading this specification, those skilled in the art may modify this embodiment as needed without making a creative contribution, and all such modifications are protected by patent law as long as they fall within the scope of the claims of the present application.

To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently some, not all, of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.

In addition, the term "and/or" in this application merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" in this application, unless otherwise specified, generally indicates an "or" relationship between the objects before and after it.

The terms "first", "second", and the like in this application are used to distinguish identical or similar items whose roles and functions are essentially the same. It should be understood that "first", "second", and "nth" imply no logical or temporal dependency and place no restriction on quantity or order of execution.

The embodiments of the present application are described in further detail below with reference to the accompanying drawings.

In one embodiment of the present application, a method for extracting tourist behavior data is provided. As shown in Figure 1, the main steps are as follows:

S10: Obtain check-in data of tourist attractions, and structure the check-in data to obtain a check-in spatio-temporal database based on the check-in data.

Specifically, obtaining tourist attraction check-in data from Weibo is used as an example. Python is used to obtain the check-in data of tourist attractions in the target area, and the check-in data is then structured, as shown in Figure 1. The specific steps are:

According to the names of the scenic spots in the target area, search for each scenic spot name in turn on the Weibo check-in page and obtain the scenic spot check-in ID and scenic spot number corresponding to each name. Create a list in which each row associates a scenic spot name with its scenic spot check-in ID and scenic spot number, yielding a list of scenic spot names, scenic spot check-in IDs, and scenic spot numbers.

Obtain the user check-in data corresponding to all scenic spot check-in IDs of all scenic spots within the time window, including the user ID, check-in time, check-in location, and check-in content associated with each scenic spot check-in ID, to obtain the initial user database of the scenic spot. It should be noted that social media data such as Weibo can only be crawled in real time, so a time window for data collection must be set. Depending on the purpose of the analysis, the window generally falls into one of three types: a year unit, covering a given year; a month unit, covering a given month of a given year; and a holiday unit, covering the 3 to 7 days of a given statutory holiday. The personal information corresponding to all user IDs in the initial user database is then obtained, including user gender, user place of origin, date of birth, and school of graduation, and is added to the initial user database as a supplementary table, yielding the check-in spatio-temporal database based on the tourist attraction check-in data.
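The three window types described above can be sketched in Python as follows. The function names are hypothetical, and treating a window as an inclusive pair of dates is an assumption; the holiday start date in the example is only illustrative.

```python
import calendar
from datetime import date, timedelta


def year_window(year):
    """Year unit: the whole of a given year."""
    return date(year, 1, 1), date(year, 12, 31)


def month_window(year, month):
    """Month unit: the whole of a given month in a given year."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def holiday_window(start, days):
    """Holiday unit: a 3 to 7 day span starting on `start`."""
    if not 3 <= days <= 7:
        raise ValueError("holiday windows span 3 to 7 days")
    return start, start + timedelta(days=days - 1)


def in_window(day, window):
    """True if `day` falls inside the inclusive window."""
    start, end = window
    return start <= day <= end


window = holiday_window(date(2022, 10, 1), 7)
print(window, in_window(date(2022, 10, 5), window))
```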

Preferably, a main check-in ID and several subordinate check-in IDs are established for the scenic spot, the subordinate check-in IDs are aggregated under the main check-in ID, and the main check-in ID is used as the scenic spot check-in ID of each scenic spot.

Specifically, because the Weibo check-in page may list several place names for the same location, or a scenic spot may contain several smaller scenic spots, some scenic spots have multiple scenic spot check-in IDs. A first-level main check-in ID and second-level subordinate check-in IDs can be established for the scenic spot. When several subordinate check-in IDs belong to the same main check-in ID, they are aggregated under it, and the main check-in ID is used as the scenic spot check-in ID of each scenic spot.
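The two-level ID scheme amounts to a lookup table from every check-in ID to its main ID. A minimal sketch follows; the ID strings and the `poi_id` field name are hypothetical and do not reflect Weibo's actual identifiers.

```python
def build_id_map(groups):
    """Map every check-in ID (main or subordinate) to its main check-in ID.

    `groups` has the form {main_id: [subordinate_id, ...]}.
    """
    id_map = {}
    for main_id, subordinate_ids in groups.items():
        id_map[main_id] = main_id
        for sub_id in subordinate_ids:
            id_map[sub_id] = main_id
    return id_map


def normalize_checkins(records, id_map):
    """Replace each record's check-in ID with its main ID; unknown IDs are kept as-is."""
    return [{**r, "poi_id": id_map.get(r["poi_id"], r["poi_id"])} for r in records]


# Hypothetical IDs for illustration.
id_map = build_id_map({"MAIN_001": ["SUB_001a", "SUB_001b"], "MAIN_002": []})
records = [
    {"user_id": "u1", "poi_id": "SUB_001a"},
    {"user_id": "u2", "poi_id": "MAIN_002"},
    {"user_id": "u3", "poi_id": "OTHER_009"},
]
print(normalize_checkins(records, id_map))
```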

Preferably, if the same user ID corresponds to multiple different scenic spot check-in IDs, the user ID is associated with those different scenic spot check-in IDs; if the same user ID corresponds to multiple identical scenic spot check-in IDs, those identical check-in IDs are deduplicated. In practice, Python can be used to traverse the data and filter out repeated records. Deduplication avoids saving repeated records to the database and creating a large amount of redundant data.

Because the same user may use Weibo to check in at several scenic spots in the target area, the initial user database may contain one user ID that corresponds to several different scenic spot check-in IDs; these different check-in IDs are associated with that user ID. If the same user ID corresponds to multiple identical scenic spot check-in IDs, the repeated IDs are deduplicated. The result is a behavior trajectory of each individual user's movement through the scenic spots, presented as a travel spatio-temporal data table. The table has one row per user ID, laid out in the order "time point 1, location 1, time point 2, location 2, ...", listing in detail the travel spatio-temporal information for each user ID.
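The association and deduplication steps, and the row layout of the travel spatio-temporal data table, can be sketched as below. The deduplication rule (keep only a user's earliest check-in at each scenic spot) is a simplifying assumption, as are the function names and sample records.

```python
from collections import defaultdict


def build_trajectories(records):
    """Group (user_id, iso_time, poi_id) records into per-user, time-ordered paths."""
    by_user = defaultdict(list)
    for user_id, iso_time, poi_id in records:
        by_user[user_id].append((iso_time, poi_id))

    trajectories = {}
    for user_id, visits in by_user.items():
        visits.sort()  # same-format time strings sort chronologically
        seen, path = set(), []
        for iso_time, poi_id in visits:
            # Assumption: keep only the earliest check-in per scenic spot per user.
            if poi_id in seen:
                continue
            seen.add(poi_id)
            path.append((iso_time, poi_id))
        trajectories[user_id] = path
    return trajectories


def to_wide_row(user_id, path):
    """Lay out one table row: user ID, time point 1, location 1, time point 2, ..."""
    row = [user_id]
    for iso_time, poi_id in path:
        row.extend([iso_time, poi_id])
    return row


# Hypothetical check-in records.
records = [
    ("u1", "2022-10-01 09:00", "A"),
    ("u1", "2022-10-01 15:00", "B"),
    ("u1", "2022-10-01 10:00", "A"),
    ("u2", "2022-10-02 08:00", "B"),
]
for user_id, path in build_trajectories(records).items():
    print(to_wide_row(user_id, path))
```

A time-limited rule, where a repeat visit after a chosen interval counts as a new visit, would replace the `seen` set with a comparison against the previous check-in time.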

Table 1 shows part of the data in the check-in spatio-temporal database, including fields such as "user ID", "user place of origin", "time point 1", and "check-in location 1".

Table 1

[Table 1: image not reproduced]

Notes: (1) The user's place of origin must not fully contain the check-in location; otherwise the user should be treated as a non-tourist and excluded. (2) Users of the method should set their own time limit for deduplication and should watch for cases like the last row of the table, where a tourist stays at a resort for an extended period.
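Note (1) can be read as a filter on the check-in table. The sketch below assumes places are stored as hierarchical tuples such as (province, city, attraction), so that an origin "contains" a check-in location when it is a prefix of it. This representation, the handling of unknown origins, and all names are assumptions made for illustration.

```python
def origin_contains(origin, place):
    """True if the origin region is a prefix of (i.e. contains) the check-in place."""
    if not origin:
        # Assumption: an unknown origin is not treated as containing anything.
        return False
    return len(origin) <= len(place) and place[:len(origin)] == origin


def keep_tourists(rows):
    """Drop rows whose origin contains the check-in location (treated as non-tourists)."""
    return [row for row in rows if not origin_contains(row["origin"], row["place"])]


# Hypothetical rows; places are (province, city, attraction) tuples.
spot = ("Guangdong", "Guangzhou", "attraction_a")
rows = [
    {"user_id": "u1", "origin": ("Guangdong", "Guangzhou"), "place": spot},
    {"user_id": "u2", "origin": ("Hunan", "Changsha"), "place": spot},
    {"user_id": "u3", "origin": (), "place": spot},
]
print([row["user_id"] for row in keep_tourists(rows)])
```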

As shown in Figure 2, the travel note data comes from the text content and image links of authoritative travel guide websites, including but not limited to Mafengwo, Ctrip, Qunar, and Qyer. The data is obtained as follows:

S20: Use Python to automatically extract travel notes of different types and randomly draw a first travel note sample from them. Label the time information and location information of each travel note text in the sample to obtain the labeled travel spatio-temporal path of that text, summarize the method used to label the time and location information in the labeled travel spatio-temporal path, and form the preliminary parsing module based on that labeling method.

Specifically, the time keywords and location keywords in the time lexicon and the location lexicon are linked together for each user ID in chronological order and by change of location. The time keywords in each travel note text are identified manually, labeled as "precise date", "precise time", "fuzzy time", or "relative time", and placed by category into the corresponding time lexicon. The "precise date" and "precise time" keywords are then used as split points to divide the travel note text into text segments ordered chronologically. By way of example, a "precise date" may appear as 九月六日 (September 6), 9/6, 第2天 (day 2), and/or DAY2; a "precise time" may appear as 十点半 (half past ten), 14:30, and/or 下午四点 (four in the afternoon); a "fuzzy time" may appear as 上午 (morning), 下午 (afternoon), 傍晚 (dusk), 清晨 (early morning), 早饭 (breakfast), 午餐 (lunch), and/or 夜景 (night view); a "relative time" may appear as 15分钟后 (15 minutes later), 大概走了2小时 (walked for about 2 hours), and/or 游玩了1.5小时左右 (spent about 1.5 hours sightseeing). The four categories are applied as follows: "precise date" and "precise time" serve as split points; a "fuzzy time" can be resolved to an estimated specific time according to whether it falls in the morning or the afternoon; a "relative time" is converted to an absolute time using the median method. The time lexicon is used by matching and calculation, with the ultimate goal of obtaining a specific point in time.
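The four time categories can be approximated with regular expressions, as in the sketch below. The patterns cover only the example expressions listed above and are an assumption about how a time lexicon might be matched; they are not the patent's lexicon. The median method is not specified in this section, so the sketch only extracts the duration from a relative time.

```python
import re

CN = "[零一二两三四五六七八九十]+"  # Chinese numerals used in dates and times

# Checked in order, so "下午四点" is classed as a precise time, not a fuzzy one.
PATTERNS = [
    ("precise_date", re.compile(
        r"\d{1,2}月\d{1,2}日|" + CN + "月" + CN + r"日|\d{1,2}/\d{1,2}|第\d+天|DAY\s?\d+",
        re.IGNORECASE)),
    ("relative_time", re.compile(r"\d+(?:\.\d+)?\s*(?:分钟|小时)")),
    ("precise_time", re.compile(r"\d{1,2}:\d{2}|(?:\d{1,2}|" + CN + ")点")),
    ("fuzzy_time", re.compile(r"上午|下午|傍晚|清晨|早饭|午餐|夜景")),
]


def classify_time_keyword(text):
    """Return the first matching category name, or None if nothing matches."""
    for name, pattern in PATTERNS:
        if pattern.search(text):
            return name
    return None


def duration_minutes(text):
    """Extract the duration of a relative-time expression in minutes, or None."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(分钟|小时)", text)
    if match is None:
        return None
    value = float(match.group(1))
    return value * 60 if match.group(2) == "小时" else value


for keyword in ["九月六日", "DAY2", "十点半", "下午四点", "傍晚", "游玩了1.5小时左右"]:
    print(keyword, classify_time_keyword(keyword), duration_minutes(keyword))
```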

The location keywords in each text segment of each travel note text are identified manually, marked as "precise location", "fuzzy location" or "associated location", and placed into the location lexicon. The location lexicon is used for association and comparison, with the ultimate goal of unifying every location to the same level and matching it to the time keywords. By way of example, a "precise location" may appear as Hong Kong, China (港), Fujian (福建省, 闽), Canton Tower Scenic Area (小蛮腰) and/or Huangguoshu Waterfall (黄果树景区); a "fuzzy location" may appear as the main entrance, the end point and/or the mountain top; an "associated location" may appear as arrived at a place, headed to a place, returned to a place, toured a place, climbed up to a place, went around to a place, stayed at a place, or went from place A to place B.

The three categories are applied by association and comparison: within the descriptive sentence group for a given time, the location at that time is extracted from the matches to form a spatio-temporal path. A fuzzy location must be resolved to a specific location from context; if a segment contains several locations, further analysis is needed to decide whether to supplement times or delete locations; and if the preceding text segment has a precise location while the current one has none, the current segment can be merged with the precise location information of the preceding segment.
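The unification of place names and the carry-forward rule for segments without a precise location can be sketched as follows. The `ALIASES` table, the `FUZZY_PLACES` set and the function names are hypothetical; resolving a fuzzy location simply to the last precise location is a simplification of the context analysis the text describes.

```python
# ASSUMPTION: a tiny alias table mapping variant names to one canonical name.
ALIASES = {
    "小蛮腰": "广州塔景区",
    "黄果树景区": "黄果树瀑布",
    "福建省": "福建",
    "闽": "福建",
}

# ASSUMPTION: fuzzy location keywords that carry no position on their own.
FUZZY_PLACES = {"正门", "终点", "山顶"}


def normalize_place(name):
    """Map a place keyword to its canonical name (unchanged if unknown)."""
    return ALIASES.get(name, name)


def resolve_places(segments):
    """Turn [(time_label, [place keywords]), ...] into [(time_label, place), ...].

    Segments that contain only fuzzy keywords, or no place keyword at all,
    inherit the last precise place seen in an earlier segment.
    """
    path = []
    last_precise = None
    for time_label, keywords in segments:
        precise = [normalize_place(k) for k in keywords if k not in FUZZY_PLACES]
        if precise:
            last_precise = precise[-1]
            path.extend((time_label, place) for place in precise)
        elif last_precise is not None:
            path.append((time_label, last_precise))
    return path
```

A segment that yields several precise places is emitted as several rows here; the text leaves open whether such rows need extra times or should be pruned, so that decision is not modeled.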

Based on the extraction methods for time keywords and location keywords, the preliminary analysis module used for machine learning is written according to the logic that travel notes are written in chronological order with changing locations. The specific steps are:

F1: Set up a preview area and screen for text segments that contain spatio-temporal information;

F2: Set up a text query input box for the time lexicon and automatically identify the time keywords in the text segments that contain spatio-temporal information;

F3: Link the output of the time lexicon's text query input box to the matching query input interface in the task template, and map the identified time keywords to the categories of the time lexicon;

F4: Link the output conversion interface to the corresponding conversion program, output the time information for each entered user ID to a preset table in date-then-time matching order, and form time-centered descriptive sentence groups;

F5: Set up a query input box for the location lexicon and automatically identify the location keywords in the text segments that contain spatio-temporal information;

F6: Link the output of the location lexicon's text query input box to the matching query input interface in the task template, and map the identified location keywords to the categories of the location lexicon;

F7: Link the output conversion interface to the corresponding conversion program, and output the location information for each entered user ID at the different times to the preset table, forming a spatio-temporal path.
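Steps F1 to F7 can be approximated in a few lines of Python: split the text at exact date and time markers, look up location keywords in each segment, and emit one table row per match. This is a rough sketch under stated assumptions; `SPLIT_PATTERN`, `PLACE_LEXICON` and `parse_travel_note` are illustrative names, and the patterns cover only a few of the date and time forms listed earlier.

```python
import re

# ASSUMPTION: a reduced set of exact date / exact time markers used as split points.
SPLIT_PATTERN = re.compile(r"(DAY\d+|\d{1,2}/\d{1,2}|\d{1,2}:\d{2})")
DATE_PATTERN = re.compile(r"DAY\d+|\d{1,2}/\d{1,2}")

# ASSUMPTION: hypothetical location lexicon entries.
PLACE_LEXICON = ["广州塔景区", "沙面", "黄果树瀑布"]


def parse_travel_note(user_id, text):
    """Return rows (user_id, date, time, place) in the order they occur in the text."""
    # With one capturing group, re.split yields
    # [prefix, marker1, segment1, marker2, segment2, ...].
    parts = SPLIT_PATTERN.split(text)
    rows = []
    current_date = None
    current_time = None
    for marker, segment in zip(parts[1::2], parts[2::2]):
        if DATE_PATTERN.fullmatch(marker):
            current_date, current_time = marker, None  # a new day resets the time
        else:
            current_time = marker
        # Keep places in the order they appear inside the segment.
        found = sorted((segment.find(p), p) for p in PLACE_LEXICON if p in segment)
        for _, place in found:
            rows.append((user_id, current_date, current_time, place))
    return rows
```

For example, `parse_travel_note("u1", "DAY1 9:00到了广州塔景区,14:30前往沙面。")` yields one row per visited place, each tagged with the date and time marker that opened its segment.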

S30: Obtain a second travel note sample, run the preliminary analysis module to obtain the parsed travel spatio-temporal paths of all travel note texts in the second travel note sample, and refine the preliminary analysis module based on the parsed travel spatio-temporal paths to obtain the final analysis module.

Specifically, the preliminary analysis module is run in a cyclic rolling manner to obtain the parsed travel spatio-temporal paths it produces.

Preferably, if the travel note texts of the second travel note sample contain travel photos, the time information and the latitude and longitude information in the travel photos are interpreted, the travel photos are arranged in chronological order, and an image spatio-temporal path of the travel photos is generated.
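Ordering photos into an image spatio-temporal path can be sketched as below. The sketch assumes the capture time and coordinates have already been read from each photo into a plain dictionary; the keys `taken_at`, `lat` and `lon` and the function name are hypothetical, and no particular metadata library is implied.

```python
from datetime import datetime


def image_spatiotemporal_path(photos):
    """Sort photo records by capture time and return (taken_at, lat, lon) tuples.

    Each record is a dict with hypothetical keys "taken_at" (ISO 8601 string),
    "lat" and "lon". Records missing any of the three are skipped.
    """
    usable = [
        p for p in photos
        if p.get("taken_at") and p.get("lat") is not None and p.get("lon") is not None
    ]
    usable.sort(key=lambda p: datetime.fromisoformat(p["taken_at"]))
    return [(p["taken_at"], p["lat"], p["lon"]) for p in usable]
```

Photos without a timestamp or coordinates cannot be placed on the path, so dropping them is the simplest choice; interpolating their position from neighboring photos would be an alternative.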

Further, some travel note texts are drawn from the second travel note sample and marked with time keywords and location keywords to obtain marked travel spatio-temporal paths. The marked travel spatio-temporal path of each such text is compared with the parsed travel spatio-temporal path produced by the preliminary analysis module, the consistency rate between the two is calculated from the differences found, and the preliminary analysis module is refined accordingly.

It is then judged whether the consistency rate reaches a preset threshold. If it does not, the preliminary analysis module continues to be refined; once the consistency rate reaches the preset threshold, the analysis module is fixed and becomes the final analysis module.
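The source does not give a formula for the consistency rate, so the sketch below uses one plausible definition as an assumption: the share of manually marked (time, place) points that the module reproduces exactly. The threshold value `0.9` and both function names are likewise hypothetical.

```python
# ASSUMPTION: illustrative threshold; the source only says "preset threshold".
CONSISTENCY_THRESHOLD = 0.9


def consistency_rate(marked_path, parsed_path):
    """Share of marked (time, place) points that also occur in the parsed path."""
    marked = set(marked_path)
    if not marked:
        raise ValueError("marked path must contain at least one point")
    return len(marked & set(parsed_path)) / len(marked)


def module_is_final(marked_path, parsed_path, threshold=CONSISTENCY_THRESHOLD):
    """True when the module's output agrees closely enough to be fixed."""
    return consistency_rate(marked_path, parsed_path) >= threshold
```

While `module_is_final` returns `False`, the lexicons and rules would be revised and the comparison repeated on the marked texts.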

S40: Apply the final analysis module to the travel note texts of travel note samples within a preset time window and a preset destination range to obtain parsed travel spatio-temporal paths, and build a travel-note-based travel note spatio-temporal database from those paths.

Preferably, review data for tourist attractions is collected, the share of a single tourist attraction's review data among the review data of all tourist attractions in the local city is calculated, and the number of tourist visitors to the attraction is obtained from that share;

The numbers of tourist visitors in the check-in spatio-temporal database and the travel note spatio-temporal database are counted, and their deviation ratios from the number of tourist visitors based on the review data are calculated.

Specifically, the Statistical Yearbook or tourism statistics published on official tourism data websites are consulted to obtain the number of tourists received by the target city within the time window; tourists' review data for tourist attractions is collected from an authoritative travel website to obtain each attraction's share of the review data of all attractions in the local city; and each attraction's review data share is multiplied by the city's number of tourists received to obtain the number of tourist visitors for that single attraction.
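The estimate described above is a simple proportional allocation: each attraction receives a share of the city's reception total equal to its share of reviews. A minimal sketch, with a hypothetical function name and made-up figures used only for illustration:

```python
def estimate_visitors(review_counts, city_reception_total):
    """Allocate the city's tourist reception total to attractions by review share.

    review_counts maps attraction name -> number of reviews collected for it.
    """
    total_reviews = sum(review_counts.values())
    if total_reviews == 0:
        raise ValueError("no review data collected")
    return {
        name: city_reception_total * count / total_reviews
        for name, count in review_counts.items()
    }
```

With made-up inputs such as 600, 300 and 100 reviews for three attractions and a city total of 2,000,000 tourists, the first attraction would be assigned 60% of the total.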

The data in the check-in spatio-temporal database and the travel note spatio-temporal database for the same time window and the same area is aggregated to obtain the numbers of tourist visitors based on those two databases; these are compared with the number of tourist visitors obtained from the review data as described above, and the deviation ratios are calculated.

Let x1 denote the number of tourist visitors obtained from the review data and x2 the number obtained from the check-in spatio-temporal database. The deviation ratio y1 between the two is given by the formula in

Figure BDA0003894852010000081

If the deviation ratio y1 is less than or equal to 10%, the data source of the check-in spatio-temporal database is deemed saturated. If y1 is greater than 10%, the data source is not saturated, and the amount of data collected for the check-in spatio-temporal database must be increased until y1 is less than or equal to 10%.

Let x3 denote the number of tourist visitors obtained from the travel note spatio-temporal database. Its deviation ratio y2 from x1, the number of tourist visitors obtained from the review data, is given by the formula in

Figure BDA0003894852010000082

If the deviation ratio y2 is less than or equal to 10%, the data source of the travel note spatio-temporal database is deemed saturated. If y2 is greater than 10%, the data source is not saturated, and the number of data sources for the travel note spatio-temporal database must be increased until y2 is less than or equal to 10%. Before the visual analysis is generated, the databases are evaluated for saturation and accuracy to determine the reliability of the individual-level data.
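The saturation decision can be sketched as a threshold check. The 10% threshold comes from the text; the deviation formula does not, because the source gives it only as an image. The `relative_deviation` function below is therefore a stand-in that assumes the common form |observed - reference| / reference, and it should be replaced by the formula from the figures if that differs.

```python
SATURATION_THRESHOLD = 0.10  # "less than or equal to 10%" in the text


def relative_deviation(reference, observed):
    """ASSUMPTION: stand-in for the formula shown only as an image in the source."""
    if reference == 0:
        raise ValueError("reference visitor number must be non-zero")
    return abs(observed - reference) / reference


def is_saturated(reference, observed, threshold=SATURATION_THRESHOLD):
    """True when the database's visitor number is close enough to the reference."""
    return relative_deviation(reference, observed) <= threshold
```

Here `reference` plays the role of x1 (from review data) and `observed` the role of x2 or x3; while `is_saturated` returns `False`, more check-in data or travel notes would be collected and the check repeated.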

In addition, the consistency between the marked travel spatio-temporal path and the parsed travel spatio-temporal path produced by the preliminary analysis module can be judged by comparing the time deviations and location deviations between them.

After the data saturation and accuracy evaluation, a structured database of tourists' spatio-temporal behavior for the specified time window and the specified area is obtained; the database contains the user ID, visit time, visited scenic spot, visiting order, and so on.

S50: Build a tourist flow behavior database based on the check-in spatio-temporal database and the travel note spatio-temporal database, and obtain a visualized tourist spatio-temporal behavior path map based on the tourist flow behavior database.

Specifically, with the check-in spatio-temporal database and the travel note spatio-temporal database as the basis, the "track intervals into lines" tool under "tracking analyst tools" in the ArcGIS software is used to connect points into lines, forming path maps of one or more tourists;

A time axis and a base map are displayed alongside, finally forming a visualized tourist spatio-temporal behavior path map for subsequent specialized analysis.
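The point-to-line step can also be illustrated outside ArcGIS. The sketch below is not the ArcGIS tool and does not use its API; it is a plain-Python stand-in that groups visit records by user, orders them by time, and emits one GeoJSON-style LineString feature per user. The record layout and the function name are assumptions.

```python
from collections import defaultdict


def build_path_features(records):
    """Connect each user's visit points into one LineString feature.

    records: iterable of (user_id, iso_timestamp, lon, lat). Timestamps are
    assumed to share one ISO 8601 format and time zone, so sorting the strings
    sorts them chronologically. Users with fewer than two points are skipped.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, lon, lat in records:
        by_user[user_id].append((timestamp, lon, lat))

    features = []
    for user_id, points in by_user.items():
        points.sort()
        if len(points) < 2:
            continue
        features.append({
            "type": "Feature",
            "properties": {"user_id": user_id,
                           "start": points[0][0], "end": points[-1][0]},
            "geometry": {"type": "LineString",
                         "coordinates": [[lon, lat] for _, lon, lat in points]},
        })
    return features
```

The resulting features can be loaded into a GIS or web map and drawn over a base map; the `start` and `end` properties give a simple hook for a time axis.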

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

Those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division into functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed; that is, the internal structure of the system described in this application can be divided into different functional units or modules to perform all or part of the functions described above.

Claims (10)

1. A tourist behavior data extraction method, the method comprising:
acquiring tourist attraction check-in data, and structuring the tourist attraction check-in data to obtain a check-in spatio-temporal database based on the tourist attraction check-in data;
acquiring a first travel note sample from a travel website, marking time information and location information of each travel note text in the first travel note sample to obtain a marked travel spatio-temporal path, and forming a preliminary analysis module based on the marking method of the marked travel spatio-temporal path;
acquiring a second travel note sample, running the preliminary analysis module to obtain parsed travel spatio-temporal paths of all travel note texts of the second travel note sample, and refining the preliminary analysis module based on the parsed travel spatio-temporal paths to obtain a final analysis module;
applying the final analysis module to travel note samples within a preset time window and a preset destination range to obtain a travel note spatio-temporal database based on travel notes;
and constructing a tourist flow behavior database based on the check-in spatio-temporal database and the travel note spatio-temporal database, and obtaining a visualized tourist spatio-temporal behavior path map based on the tourist flow behavior database.
2. The tourist behavior data extraction method according to claim 1, wherein the step of constructing the tourist flow behavior database based on the check-in spatio-temporal database and the travel note spatio-temporal database, and obtaining the visualized tourist spatio-temporal behavior path map based on the tourist flow behavior database, further comprises:
collecting review data of tourist attractions, calculating the share of a single tourist attraction's review data among the review data of all tourist attractions in the local city, and obtaining a reference number of tourist visitors for the tourist attraction based on the review data share;
and obtaining a first number of tourist visitors based on the check-in spatio-temporal database, obtaining a second number of tourist visitors based on the travel note spatio-temporal database, and calculating the deviation ratio of each of the first number and the second number of tourist visitors from the reference number of tourist visitors.
3. The tourist behavior data extraction method according to claim 2, wherein after calculating the deviation ratio of each of the first number and the second number of tourist visitors from the reference number of tourist visitors, the method further comprises:
comparing the deviation ratio with a preset deviation ratio, and if the deviation ratio exceeds the preset deviation ratio, further refining the check-in spatio-temporal database and the travel note spatio-temporal database;
and if the deviation ratio is within the preset deviation ratio, obtaining the tourist flow behavior database based on the check-in spatio-temporal database and the travel note spatio-temporal database.
4. The tourist behavior data extraction method according to claim 1, wherein the step of acquiring tourist attraction check-in data and structuring the tourist attraction check-in data to obtain the check-in spatio-temporal database based on the tourist attraction check-in data comprises:
acquiring a scenic spot check-in ID for each scenic spot name according to the scenic spot names in the target area to obtain a list of scenic spot names, scenic spot check-in IDs and scenic spot numbers;
acquiring check-in data of the user IDs corresponding to the scenic spot check-in IDs of all the scenic spots within a time window to obtain an initial user database of the scenic spots, wherein the initial user database comprises user IDs, check-in times, check-in locations and check-in contents;
and acquiring personal information of all user IDs in the initial user database, adding it to the initial user database as an attached table, and obtaining the check-in spatio-temporal database of the tourist attraction check-in data.
5. The tourist behavior data extraction method according to claim 4, wherein the step of acquiring a scenic spot check-in ID for each scenic spot name further comprises:
establishing a master check-in ID and a plurality of slave check-in IDs for a scenic spot, aggregating the plurality of slave check-in IDs under the master check-in ID, and using the master check-in ID as the scenic spot check-in ID of the scenic spot.
6. The tourist behavior data extraction method according to claim 4 or 5, wherein the step of acquiring check-in data of the user IDs corresponding to the scenic spot check-in IDs of all the scenic spots within the time window further comprises:
if the same user ID corresponds to a plurality of different scenic spot check-in IDs, associating the user ID with the plurality of different scenic spot check-in IDs;
and if the same user ID corresponds to a plurality of identical scenic spot check-in IDs, de-duplicating the identical scenic spot check-in IDs.
7. The tourist behavior data extraction method according to claim 1, wherein the step of acquiring a first travel note sample from a travel website, marking time information and location information of each travel note text in the first travel note sample to obtain a marked travel spatio-temporal path, and forming a preliminary analysis module based on the marking method of the marked travel spatio-temporal path comprises:
marking the time keywords of each travel note text in the first travel note sample as exact date, exact time, fuzzy time or relative time, placing the time keywords into a time lexicon, and dividing the travel note text into text segments using the exact dates and exact times as split points;
identifying location keywords in the text segments as precise location, fuzzy location or associated location, and placing the location keywords into a location lexicon;
and constructing the preliminary analysis module based on the extraction methods for the time keywords and the location keywords.
8. The tourist behavior data extraction method according to claim 1, wherein the step of acquiring a second travel note sample, running the preliminary analysis module to obtain parsed travel spatio-temporal paths of all travel note texts of the second travel note sample, and refining the preliminary analysis module based on the parsed travel spatio-temporal paths to obtain a final analysis module comprises:
if the second travel note sample contains travel photos, extracting the time information and the latitude and longitude information in the travel photos, arranging the travel photos in chronological order, and generating an image spatio-temporal path of the travel photos.
9. The tourist behavior data extraction method according to claim 1, wherein the step of refining the preliminary analysis module based on the parsed travel spatio-temporal paths to obtain the final analysis module comprises:
drawing some travel note texts from the second travel note sample, marking those travel note texts to obtain marked travel spatio-temporal paths, comparing the parsed travel spatio-temporal paths obtained by the preliminary analysis module with the marked travel spatio-temporal paths, and refining the preliminary analysis module based on the comparison result to obtain the final analysis module.
10. The tourist behavior data extraction method according to claim 1, wherein the step of obtaining a visualized tourist spatio-temporal behavior path map based on the tourist flow behavior database comprises:
forming a path map of one or more tourists with an ArcGIS software tool based on the tourist flow behavior database, and adding a time axis and a base map to the path map to form the visualized tourist spatio-temporal behavior path map.
CN202211270201.4A 2022-10-18 2022-10-18 Tourist behavior data extraction method Active CN115577190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211270201.4A CN115577190B (en) 2022-10-18 2022-10-18 Tourist behavior data extraction method

Publications (2)

Publication Number Publication Date
CN115577190A (en) 2023-01-06
CN115577190B CN115577190B (en) 2023-05-30

Family

ID=84585619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211270201.4A Active CN115577190B (en) 2022-10-18 2022-10-18 Tourist behavior data extraction method

Country Status (1)

Country Link
CN (1) CN115577190B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821692A (en) * 2023-08-28 2023-09-29 北京化工大学 Method, device and storage medium for constructing descriptive text and space scene sample set

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007241903A (en) * 2006-03-10 2007-09-20 Nagasaki Prefecture Dynamic recording method for tourists
US20120084000A1 (en) * 2010-10-01 2012-04-05 Microsoft Corporation Travel Route Planning Using Geo-Tagged Photographs
WO2016132189A1 (en) * 2015-02-21 2016-08-25 Malekzadeh Mohammadsharif Method for tourism management and quality control
CN105550951A (en) * 2015-12-30 2016-05-04 南京邮电大学 Decision assistant system and method of tour travel
CN106021618A (en) * 2016-07-13 2016-10-12 桂林电子科技大学 System and method for inquiring and managing touring information of scenic spot
JP2019023851A (en) * 2017-07-21 2019-02-14 株式会社エヌ・ティ・ティ・アド Data analysis system and data analysis method
CN109086919A (en) * 2018-07-17 2018-12-25 新华三云计算技术有限公司 A kind of sight spot route planning method, device, system and electronic equipment
CN110544115A (en) * 2019-08-16 2019-12-06 北京慧辰资道资讯股份有限公司 Method and device for analyzing characteristics of tourists from scenic spot tourism big data
CN113742481A (en) * 2021-07-14 2021-12-03 安徽师范大学 Research method of spatial and temporal change characteristics of tourism flow emotion based on social media big data
CN113609842A (en) * 2021-08-17 2021-11-05 四川轻化工大学 A method for obtaining scenic review data and travel experience evaluation
CN115330221A (en) * 2022-08-18 2022-11-11 湖州师范学院 A system and method for data analysis and feedback of rural tourism information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
邵隽;常雪松;赵雅敏;: "基于游记大数据的华山景区游客行为模式研究", 中国园林 *
陈子微;姚建盛;: "基于旅游数字足迹的游客时空行为研究――以南京市玄武区为例" *

Also Published As

Publication number Publication date
CN115577190B (en) 2023-05-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant