WO2015074477A1 - Path analysis method and apparatus - Google Patents

Path analysis method and apparatus

Info

Publication number
WO2015074477A1
Authority
WO
WIPO (PCT)
Prior art keywords
path
entry
analysis
accessed
user
Application number
PCT/CN2014/089936
Other languages
English (en)
French (fr)
Inventor
洪超
杨基彬
Original Assignee
北京国双科技有限公司
Application filed by 北京国双科技有限公司
Priority to US15/037,783 (published as US20160299903A1)
Publication of WO2015074477A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Definitions

  • The present invention relates to the field of data analysis, and in particular to a path analysis method and apparatus.
  • On-Line Analytical Processing (OLAP) is a software technology for fast online access to, and analysis of, shared multidimensional information for specific problems. It provides fast, consistent, and interactive access to the many possible views of that information, allowing management decision makers to examine the data in depth.
  • Decision data is multidimensional data, and multidimensional data is the main content of decision making.
  • OLAP focuses on decision support for decision makers and senior management. It can quickly and flexibly run complex queries over large data volumes according to the analyst's requirements, and it presents the query results in an intuitive, easy-to-understand form so that decision makers can accurately grasp the business status of the enterprise (company), understand the needs of its targets, and formulate the correct plan.
  • Path navigation: a path is the chain of pages a user visits on a website. For example, if the user goes from page A to page B, returns to page A, and then leaves, the path is A->B->A. Path navigation displays the user's access path through an interface.
  • In OLAP, path navigation mainly includes:
  • Pre-page analysis: select a Uniform Resource Locator (URL) path and view the distribution of the page visited immediately before it by all users who accessed that page.
  • Post-page analysis: select a URL path and view the distribution of the page visited immediately after it by all users who accessed that page.
  • In the related art, however, OLAP is not used. Instead, the pre- and post-pages of a specific URL are obtained by querying the data warehouse, and statistical analysis is performed on the indicators of the related pages (such as the number of visits and the length of stay).
  • With that approach, multi-level profiling (that is, analyzing the distribution of the pages that follow a specific post-page of a given page) requires table join operations, and as many join operations are needed as there are levels to profile.
  • The present invention provides a path analysis method and apparatus to at least solve the above problems in the related art.
  • A path analysis method, including: establishing an access table, wherein each entry in the access table stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths; finding a first entry in the access table, wherein the first entry is an entry containing a predetermined path; and performing path analysis related to the predetermined path according to the first entry.
  • Establishing the access table comprises: acquiring an original access table saved in a data warehouse, wherein each entry in the original access table stores information of one path accessed by one user; and establishing the access table according to the original access table.
  • Where the path analysis related to the predetermined path is pre-analysis of the predetermined path, performing the path analysis according to the first entry comprises: determining information of the path accessed by the user in the entry before accessing the predetermined path; and determining, from that information, the distribution of the path accessed by the user before accessing the predetermined path.
  • Where the pre-analysis is N-level pre-analysis, performing the path analysis according to the first entry comprises: determining information of the N paths accessed by the user in the entry before accessing the predetermined path; and determining, from that information, the distribution of the N paths accessed by the user before accessing the predetermined path, where N is a positive integer.
  • Where the path analysis related to the predetermined path is post-analysis of the predetermined path, performing the path analysis according to the first entry comprises: determining information of the path accessed by the user in the entry after accessing the predetermined path; and determining, from that information, the distribution of the path accessed by the user after accessing the predetermined path.
  • Where the post-analysis is M-level post-analysis, performing the path analysis according to the first entry comprises: determining information of the M paths accessed by the user in the entry after accessing the predetermined path; and determining, from that information, the distribution of the M paths accessed by the user after accessing the predetermined path, where M is a positive integer.
  • A path analysis apparatus, comprising: an establishing module, configured to establish an access table, wherein each entry in the access table stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths; a search module, configured to search for a first entry in the access table, wherein the first entry is an entry containing a predetermined path; and an analysis module, configured to perform path analysis related to the predetermined path according to the first entry.
  • The establishing module includes: an obtaining unit, configured to acquire an original access table saved in the data warehouse, wherein each entry in the original access table stores information of one path accessed by one user; and an establishing unit, configured to establish the access table according to the original access table.
  • The analysis module includes: a first determining unit, configured to determine information of the path accessed by the user in the first entry before accessing the predetermined path; and a second determining unit, configured to determine, from that information, the distribution of the path accessed by the user before accessing the predetermined path.
  • The analysis module includes: a third determining unit, configured to determine information of the path accessed by the user in the first entry after accessing the predetermined path; and a fourth determining unit, configured to determine, from that information, the distribution of the path accessed by the user after accessing the predetermined path.
  • A path analysis system, comprising a data warehouse and a path analysis device, wherein the data warehouse is configured to establish an access table in which each entry stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths; and the path analysis device is configured to search for a first entry in the access table, wherein the first entry is an entry containing a predetermined path, and to perform path analysis related to the predetermined path according to the first entry.
  • By establishing an access table in which each entry stores information of a plurality of paths accessed by one user in access order, searching the access table for an entry that includes a predetermined path, and performing path analysis related to the predetermined path according to that entry, the invention solves the problem in the related art of low execution efficiency caused by performing path analysis through self-joins of the path access table in the data warehouse, thereby improving the efficiency of path analysis.
  • FIG. 1 is a schematic flow chart of a path analysis method according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a path analysis apparatus according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a path analysis system according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of path navigation results in accordance with a preferred embodiment of the present invention.
  • FIG. 5 is a schematic diagram of the operation of a path analysis system according to a preferred embodiment of the present invention.
  • FIG. 6 is a schematic diagram of the results of path navigation analysis in accordance with a preferred embodiment of the present invention.
  • This embodiment provides a path analysis method. As shown in FIG. 1, the process includes the following steps:
  • Step S102: establish an access table, wherein each entry in the access table stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths.
  • Step S104: search the access table for an entry that includes a predetermined path.
  • The predetermined path is the path on which path analysis is to be performed, and may be preset according to the needs of the analysis.
  • Step S106: perform path analysis related to the predetermined path according to the entry.
  • Preferably, the access table established in step S102 is generated from the original access table saved in the data warehouse. It may be generated by the data warehouse or by another device, and it may be generated during periods when the system is idle, subject to the minimum requirement that the path data be updated in a timely manner.
  • For example, the original access table saved in the data warehouse is obtained, and the access table is established according to it, wherein each entry in the original access table stores information of one path accessed by one user.
  • Shifting this processing to system idle time improves efficiency at the time of path analysis.
  • Where the path analysis is pre-analysis of the predetermined path, the information of the path accessed by the user in the entry before accessing the predetermined path is first determined. The distribution of that path is then determined from the information, for example the overall distribution of page view counts, the distribution of page view counts over time, the distribution of page viewing durations, and the distribution of page viewing durations over time.
  • Where the pre-analysis is N-level pre-analysis, the information of the N paths accessed by the user before accessing the predetermined path is first determined, and the distribution of those N paths is then determined from that information, where N is a positive integer.
  • Where the path analysis is post-analysis of the predetermined path, the information of the path accessed by the user in the entry after accessing the predetermined path is first determined, and the distribution of that path is then determined from the information.
  • Where the post-analysis is M-level post-analysis, the information of the M paths accessed by the user after accessing the predetermined path is first determined, and the distribution of those M paths is then determined from that information, where M is a positive integer.
  • This embodiment further provides a path analysis apparatus for implementing the above path analysis method. Its functions have been described in detail in the method embodiment and are not repeated here.
  • As shown in FIG. 2, the apparatus includes an establishing module 22, a search module 24, and an analysis module 26. The establishing module 22 is configured to establish an access table in which each entry stores information of a plurality of paths accessed by one user in access order. The search module 24 may be coupled to the establishing module 22 and is configured to search the access table for an entry containing the predetermined path. The analysis module 26 may be coupled to the search module 24 and is configured to perform path analysis related to the predetermined path according to the entry.
  • The modules and units involved in the embodiments of the present invention may be implemented in software or in hardware.
  • The modules and units described in this embodiment may also be disposed in a processor. For example, a processor may be described as including an establishing module 22, a search module 24, and an analysis module 26.
  • In some cases the names of these modules do not limit the modules themselves. For example, the establishing module may also be described as "a module for establishing an access table".
  • Preferably, the establishing module 22 includes: an obtaining unit 222, coupled to the data warehouse and configured to acquire the original access table saved in the data warehouse, wherein each entry in the original access table stores information of one path accessed by one user; and an establishing unit 224, coupled to the obtaining unit 222 and configured to establish the access table according to the original access table.
  • Preferably, the analysis module 26 includes: a first determining unit 262, configured to determine information of the path accessed by the user in the entry before accessing the predetermined path; and a second determining unit 264, which may be coupled to the first determining unit 262 and is configured to determine, from that information, the distribution of the path accessed by the user before accessing the predetermined path.
  • Preferably, the first determining unit 262 is further configured to determine information of the N paths accessed by the user in the entry before accessing the predetermined path, and the second determining unit is further configured to determine, from that information, the distribution of the N paths accessed by the user before accessing the predetermined path.
  • Preferably, the analysis module 26 includes: a third determining unit 266, configured to determine information of the path accessed by the user in the entry after accessing the predetermined path; and a fourth determining unit 268, coupled to the third determining unit 266 and configured to determine, from that information, the distribution of the path accessed by the user after accessing the predetermined path.
  • Preferably, the third determining unit 266 is further configured to determine information of the M paths accessed by the user in the entry after accessing the predetermined path, and the fourth determining unit 268 is further configured to determine, from that information, the distribution of the M paths accessed by the user after accessing the predetermined path.
  • This embodiment also provides a path analysis system for implementing the above path analysis method. Its functions have been described in detail in the method embodiment; the system embodiment may be read in conjunction with that description, which is not repeated here.
  • FIG. 3 is a schematic structural diagram of a path analysis system according to an embodiment of the present invention. As shown in FIG. 3, the system includes a data warehouse 32 and a path analysis device 34. The data warehouse 32 is configured to establish an access table in which each entry stores information of a plurality of paths accessed by one user in access order. The path analysis device is configured to search the access table for an entry that includes the predetermined path and to perform path analysis related to the predetermined path according to the entry.
  • The preferred embodiment provides an efficient OLAP path navigation analysis solution. It addresses the problem that the related art cannot run the analysis query in OLAP and can only run it in the data warehouse, where performance is relatively low and every navigation step requires a self-join of the page table. Because the solution in the preferred embodiment involves no table self-join, it performs efficiently.
  • The preferred embodiment adopts an N-level approach, where N is any positive integer. With N equal to 1 it degenerates into the traditional implementation. The purpose of this design is to avoid, in OLAP, table self-join queries like those of the traditional approach, trading storage space for query time.
  • Step S11: establish an access table in the data warehouse containing the columns VisitorKey (unique visitor identifier), SessionID (unique session identifier), Page1Key (the first path on the path chain), Page2Key, ..., PageNKey. Each row records one access path of the user, and the N extended columns represent the subsequent N paths of that path;
  • Step S12: define an exit default value for each PageKey; the default value indicates that the user has left the website;
  • Step S13: assign values to the path columns Page2Key through PageNKey, forming the subsequent N paths starting from each path point, and set a column to the defined default value if the user has left by that point;
  • Step S14: when designing the OLAP model, add N page dimensions, Page1Key through PageNKey, each associated through its corresponding key with Page1Key through PageNKey of the access path table;
  • Step S15: with the above settings in place, the following analyses can be performed conveniently:
  • Pre-analysis: find the distribution of Page1Key, the previous page path, for rows whose Page2Key is a specific page;
  • Post-analysis: find the distribution of Page2Key, the next page path, for rows whose Page1Key is a specific page;
  • Multi-level pre-analysis: within N levels, the extended columns PageNKey back to Page1Key allow the preceding N levels to be profiled directly, without table joins. Beyond N levels, pre-path analysis falls back to the table joins of the traditional implementation;
  • Multi-level post-analysis: within N levels, the extended columns Page1Key through PageNKey allow the following N levels to be profiled directly, without table joins. Beyond N levels, post-path analysis falls back to the table joins of the traditional implementation.
  • As shown in FIG. 4, any page can be selected to view its pre- and post-pages, that is, where users came from and where they went;
  • Multi-level profiling (multi-level pre-analysis or multi-level post-analysis) refers to questions such as where users went next after the page they went to.
  • The scheme of the preferred embodiment can support N-level profiling or unlimited-level profiling.
  • FIG. 5 is a schematic diagram of the operation of a path analysis system according to a preferred embodiment of the present invention. As shown in FIG. 5, path navigation analysis is carried out by a data warehouse device, an OLAP device, and a query device.
  • In step S11, the data warehouse creates a table containing the columns VisitorKey (unique visitor identifier), SessionID (unique session identifier), Page1Key (the first path on the path chain), Page2Key, ..., PageNKey. For example, the original page path order p1->p2->p1 is stored as:
      VisitorKey  SessionID  PageKey  AccessOrder
      Vid1        Sid1       p1       1
      Vid1        Sid1       p2       2
      Vid1        Sid1       p1       3
  • In the OLAP device, N page dimensions Page1Key through PageNKey are added at design time and associated through their corresponding keys with Page1Key through PageNKey of the access path table, wherein each Page dimension is associated with the indicator group through its corresponding PageXKey (X from 1 to N).
  • Post-analysis is taken as the example here; pre-analysis and multi-level analysis can be understood by reference to it.
  • FIG. 6 is a schematic diagram of the result of path navigation analysis according to a preferred embodiment of the present invention.
  • Because a single page dimension is used throughout the OLAP model, the member dimension titles (Title) are the same here.
  • The first Title is the one corresponding to Page1Key and the second is the one corresponding to Page2Key, so all the pages reached from a given page can be seen clearly, together with their other indicators (for example, the number of visits).
  • The data warehouse uses N derived columns to represent the subsequent paths of each path, which avoids table self-joins during N-level path navigation and many-to-many association operations in OLAP, and thereby improves performance.
  • In OLAP the same dimension is added multiple times, and Page1 through PageN are each associated with the PageKey of the corresponding data warehouse table.
  • For multi-level (M-level, M from 1 to N) pre-analysis, it is only necessary to query the Page1 values for which PageM back through Page2 satisfy the conditions on the selected path.
  • For multi-level (M-level, M from 1 to N) post-analysis, it is only necessary to query the PageM values for which Page1 through PageM-1 satisfy given conditions on the path.
  • Each of the above analyses needs only one query, that query incurs only one input/output (IO) operation, and there are no many-to-many operations like those of data warehouse table joins, so execution efficiency is improved.
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device, and can be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device, or they may each be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.

Abstract

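The present invention discloses a path analysis method and apparatus. The method includes: establishing an access table, wherein each entry in the access table stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths; searching the access table for an entry containing a predetermined path; and performing path analysis related to the predetermined path according to that entry. The invention solves the problem in the related art of low execution efficiency caused by performing path analysis through self-joins of the path access table in the data warehouse, and improves the efficiency of path analysis.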

Description

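Path Analysis Method and Apparatus
Technical Field
The present invention relates to the field of data analysis, and in particular to a path analysis method and apparatus.
Background
On-Line Analytical Processing (OLAP) is a software technology for fast online access to, and analysis of, shared multidimensional information for specific problems. By providing fast, stable, consistent, and interactive access to the many possible views of the information, it allows management decision makers to examine the data in depth. Decision data is multidimensional data, and multidimensional data is the main content of decision making. OLAP is designed specifically to support complex analytical operations and focuses on decision support for decision makers and senior management. It can quickly and flexibly run complex queries over large volumes of data as analysts require, and it presents the query results to decision makers in an intuitive, easy-to-understand form so that they can accurately grasp the operating status of the enterprise (company), understand the needs of its targets, and formulate the correct plan. Further information on OLAP can be found in the Baidu Baike entry at http://baike.baidu.com/view/22068.htm?fromId=57810 and is not repeated here.
Path navigation: a path is the chain of pages that a user visits on a website. For example, if the user goes from page A to page B, returns to page A, and then leaves, the path is A->B->A. Path navigation presents the user's access path through an interface.
In OLAP, path navigation mainly includes:
Pre-page analysis: select a Uniform Resource Locator (URL) path and view the distribution of the page visited immediately before it by all users who accessed that page;
Post-page analysis: select a URL path and view the distribution of the page visited immediately after it by all users who accessed that page.
In the related art, however, OLAP is not used. Instead, the pre- and post-pages of a specific URL are obtained by querying the data warehouse, and statistical analysis is performed on the indicators of the related pages (for example, number of visits and length of stay).
The path navigation analysis method based on a traditional data warehouse in the related art adopts the following technical solution:
A path access table is established, containing the columns user name (VisitorKey), session ID (SessionID), currently accessed page (PageKey), and next accessed page (NextPageKey);
Taking the pre-page analysis method in the related art as an example: filter on PageKey = the selected page, and use NextPageKey to find the distribution of the next page.
With the above solution, multi-level profiling (that is, analyzing the distribution of the pages that follow a specific post-page of a given page) requires table join operations, and as many join operations are needed as there are levels to profile. In the course of their research, the inventors found that profiling multi-level paths is very slow because of the large number of self-join operations required.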
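The cost described above can be made concrete with a small sketch. The following is an illustration only, not the related art's actual implementation: it uses Python's built-in `sqlite3` module, a hypothetical table name `path_access`, and an extra `Seq` column (an assumption that is not in the table described above) so that the self-join stays unambiguous when a page repeats within a session.

```python
import sqlite3

# In-memory stand-in for the traditional path access table.
# Seq is an assumed helper column; the described table has only
# VisitorKey, SessionID, PageKey and NextPageKey.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE path_access ("
    "VisitorKey TEXT, SessionID TEXT, Seq INTEGER, "
    "PageKey TEXT, NextPageKey TEXT)"
)
conn.executemany(
    "INSERT INTO path_access VALUES (?, ?, ?, ?, ?)",
    [
        ("Vid1", "Sid1", 1, "A", "B"),
        ("Vid1", "Sid1", 2, "B", "A"),
        ("Vid1", "Sid1", 3, "A", "-"),  # "-" marks leaving the site
    ],
)

def two_level_post(conn, page):
    """Two-level post-analysis: needs one self-join of the table."""
    sql = """
        SELECT a.NextPageKey, b.NextPageKey, COUNT(*)
        FROM path_access AS a
        JOIN path_access AS b
          ON b.SessionID = a.SessionID AND b.Seq = a.Seq + 1
        WHERE a.PageKey = ?
        GROUP BY a.NextPageKey, b.NextPageKey
        ORDER BY a.NextPageKey, b.NextPageKey
    """
    return conn.execute(sql, (page,)).fetchall()

result = two_level_post(conn, "A")
```

Each additional level of profiling would add one more `JOIN path_access AS ...` clause to the query, which is the repeated self-join that the text identifies as the bottleneck.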
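No effective solution has yet been proposed for the problem in the related art of low execution efficiency caused by performing path analysis through self-joins of the path access table in the data warehouse.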
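Summary of the Invention
The present invention provides a path analysis method and apparatus to at least solve the above problems in the related art.
According to one aspect of the present invention, a path analysis method is provided, including: establishing an access table, wherein each entry in the access table stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths; finding a first entry in the access table, wherein the first entry is an entry containing a predetermined path; and performing path analysis related to the predetermined path according to the first entry.
Preferably, establishing the access table includes: acquiring an original access table saved in a data warehouse, wherein each entry in the original access table stores information of one path accessed by one user; and establishing the access table according to the original access table.
Preferably, where the path analysis related to the predetermined path is pre-analysis of the predetermined path, performing the path analysis according to the first entry includes: determining information of the path accessed by the user in the entry before accessing the predetermined path; and determining, from that information, the distribution of the path accessed by the user before accessing the predetermined path.
Preferably, where the pre-analysis is N-level pre-analysis, performing the path analysis according to the first entry includes: determining information of the N paths accessed by the user in the entry before accessing the predetermined path; and determining, from that information, the distribution of the N paths accessed by the user before accessing the predetermined path, where N is a positive integer.
Preferably, where the path analysis related to the predetermined path is post-analysis of the predetermined path, performing the path analysis according to the first entry includes: determining information of the path accessed by the user in the entry after accessing the predetermined path; and determining, from that information, the distribution of the path accessed by the user after accessing the predetermined path.
Preferably, where the post-analysis is M-level post-analysis, performing the path analysis according to the first entry includes: determining information of the M paths accessed by the user in the entry after accessing the predetermined path; and determining, from that information, the distribution of the M paths accessed by the user after accessing the predetermined path, where M is a positive integer.
According to another aspect of the present invention, a path analysis apparatus is provided, including: an establishing module, configured to establish an access table, wherein each entry in the access table stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths; a search module, configured to search for a first entry in the access table, wherein the first entry is an entry containing a predetermined path; and an analysis module, configured to perform path analysis related to the predetermined path according to the first entry.
Preferably, the establishing module includes: an obtaining unit, configured to acquire an original access table saved in a data warehouse, wherein each entry in the original access table stores information of one path accessed by one user; and an establishing unit, configured to establish the access table according to the original access table.
Preferably, the analysis module includes: a first determining unit, configured to determine information of the path accessed by the user in the first entry before accessing the predetermined path; and a second determining unit, configured to determine, from that information, the distribution of the path accessed by the user before accessing the predetermined path.
Preferably, the analysis module includes: a third determining unit, configured to determine information of the path accessed by the user in the first entry after accessing the predetermined path; and a fourth determining unit, configured to determine, from that information, the distribution of the path accessed by the user after accessing the predetermined path.
According to another aspect of the present invention, a path analysis system is provided, including a data warehouse and a path analysis device, wherein the data warehouse is configured to establish an access table in which each entry stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths; and the path analysis device is configured to search for a first entry in the access table, wherein the first entry is an entry containing a predetermined path, and to perform path analysis related to the predetermined path according to the first entry.
By establishing an access table in which each entry stores information of a plurality of paths accessed by one user in access order, searching the access table for an entry containing a predetermined path, and performing path analysis related to the predetermined path according to that entry, the present invention solves the problem in the related art of low execution efficiency caused by performing path analysis through self-joins of the path access table in the data warehouse, and improves the efficiency of path analysis.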
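Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of a path analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a path analysis apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a path analysis system according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of path navigation results according to a preferred embodiment of the present invention;
FIG. 5 is a schematic diagram of the operation of a path analysis system according to a preferred embodiment of the present invention;
FIG. 6 is a schematic diagram of path navigation analysis results according to a preferred embodiment of the present invention.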
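Detailed Description
It should be noted that the embodiments in this application and the features in the embodiments may be combined with one another provided they do not conflict. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
The steps shown in the flowcharts of the drawings may be executed in a computer system, such as one running a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
This embodiment provides a path analysis method. FIG. 1 is a schematic flowchart of a path analysis method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
Step S102: establish an access table, wherein each entry in the access table stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths.
Step S104: search the access table for an entry containing a predetermined path.
In this embodiment of the present invention, the predetermined path is the path on which path analysis is to be performed, and may be preset according to the needs of the analysis.
Step S106: perform path analysis related to the predetermined path according to the entry.
Through the above steps, an access table is established in which each entry stores information of a plurality of paths accessed by one user in access order, so that path analysis related to a specific path only requires looking up the entries for the predetermined path in the established access table, and no table self-join is needed. Compared with the related art, in which self-joins on the data in the data warehouse at analysis time reduce execution efficiency, the solution provided by this embodiment solves the problem of low execution efficiency caused by performing path analysis through self-joins of the path access table in the data warehouse, and improves the efficiency of path analysis.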
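Steps S102 to S106 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the entry layout (a dict with a `paths` list), the function names, and the sample visitors are all hypothetical, and `"-"` is an assumed marker for leaving the site.

```python
from collections import Counter

# Step S102 (assumed shape): one entry per user visit, with the
# accessed paths kept in access order.
access_table = [
    {"visitor": "Vid1", "paths": ["A", "B", "A"]},
    {"visitor": "Vid2", "paths": ["B", "C"]},
    {"visitor": "Vid3", "paths": ["C", "D"]},
]

def find_entries(table, target):
    """Step S104: keep only the entries whose path list contains target."""
    return [entry for entry in table if target in entry["paths"]]

def next_page_distribution(entries, target, exit_marker="-"):
    """Step S106: count which path follows each occurrence of target."""
    counts = Counter()
    for entry in entries:
        paths = entry["paths"]
        for i, path in enumerate(paths):
            if path == target:
                nxt = paths[i + 1] if i + 1 < len(paths) else exit_marker
                counts[nxt] += 1
    return counts

entries = find_entries(access_table, "B")
distribution = next_page_distribution(entries, "B")
```

Because each entry already carries the ordered paths of one visit, the analysis is a single pass over the matching entries and involves no join.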
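Preferably, the access table established in step S102 is generated from the original access table saved in the data warehouse. It may be generated by the data warehouse or by another device, and it may be generated during periods when the system is idle, subject to the minimum requirement that the path data be updated in a timely manner. For example, the original access table saved in the data warehouse is obtained, and the access table is established according to it, wherein each entry in the original access table stores information of one path accessed by one user. Shifting this processing to system idle time improves efficiency at the time of path analysis.
Preferably, where the path analysis related to the predetermined path is pre-analysis of the predetermined path, the analysis first determines information of the path accessed by the user in the entry before accessing the predetermined path, and then determines from that information the distribution of the path accessed by the user before accessing the predetermined path, for example the overall distribution of page view counts, the distribution of page view counts over time, the distribution of page viewing durations, and the distribution of page viewing durations over time.
Preferably, where the pre-analysis is N-level pre-analysis, the analysis first determines information of the N paths accessed by the user in the entry before accessing the predetermined path, and then determines from that information the distribution of the N paths accessed by the user before accessing the predetermined path, where N is a positive integer.
Preferably, where the path analysis related to the predetermined path is post-analysis of the predetermined path, the analysis first determines information of the path accessed by the user in the entry after accessing the predetermined path, and then determines from that information the distribution of the path accessed by the user after accessing the predetermined path.
Preferably, where the post-analysis is M-level post-analysis, the analysis first determines information of the M paths accessed by the user in the entry after accessing the predetermined path, and then determines from that information the distribution of the M paths accessed by the user after accessing the predetermined path, where M is a positive integer.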
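This embodiment further provides a path analysis apparatus for implementing the above path analysis method. The functions of the apparatus embodiment have been described in detail in the method embodiment and are not repeated here.
FIG. 2 is a schematic structural diagram of a path analysis apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes an establishing module 22, a search module 24, and an analysis module 26. The establishing module 22 is configured to establish an access table, wherein each entry in the access table stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths. The search module 24 may be coupled to the establishing module 22 and is configured to search the access table for an entry containing the predetermined path. The analysis module 26 may be coupled to the search module 24 and is configured to perform path analysis related to the predetermined path according to the entry.
The modules and units involved in the embodiments of the present invention may be implemented in software or in hardware. The modules and units described in this embodiment may also be disposed in a processor. For example, a processor may be described as including an establishing module 22, a search module 24, and an analysis module 26. In some cases the names of these modules do not limit the modules themselves. For example, the establishing module may also be described as "a module for establishing an access table".
Preferably, the establishing module 22 includes: an obtaining unit 222, coupled to the data warehouse and configured to acquire the original access table saved in the data warehouse, wherein each entry in the original access table stores information of one path accessed by one user; and an establishing unit 224, coupled to the obtaining unit 222 and configured to establish the access table according to the original access table.
Preferably, the analysis module 26 includes: a first determining unit 262, configured to determine information of the path accessed by the user in the entry before accessing the predetermined path; and a second determining unit 264, which may be coupled to the first determining unit 262 and is configured to determine, from that information, the distribution of the path accessed by the user before accessing the predetermined path.
Preferably, the first determining unit 262 is further configured to determine information of the N paths accessed by the user in the entry before accessing the predetermined path, and the second determining unit is further configured to determine, from that information, the distribution of the N paths accessed by the user before accessing the predetermined path.
Preferably, the analysis module 26 includes: a third determining unit 266, configured to determine information of the path accessed by the user in the entry after accessing the predetermined path; and a fourth determining unit 268, coupled to the third determining unit 266 and configured to determine, from that information, the distribution of the path accessed by the user after accessing the predetermined path.
Preferably, the third determining unit 266 is further configured to determine information of the M paths accessed by the user in the entry after accessing the predetermined path, and the fourth determining unit 268 is further configured to determine, from that information, the distribution of the M paths accessed by the user after accessing the predetermined path.
This embodiment also provides a path analysis system for implementing the above path analysis method. The functions of the system embodiment have been described in detail in the method embodiment; the system embodiment may be read in conjunction with that description, which is not repeated here.
FIG. 3 is a schematic structural diagram of a path analysis system according to an embodiment of the present invention. As shown in FIG. 3, the system includes a data warehouse 32 and a path analysis device 34. The data warehouse 32 is configured to establish an access table, wherein each entry in the access table stores information of a plurality of paths accessed by one user, saved in the order in which the user accessed the paths. The path analysis device is configured to search the access table for an entry containing the predetermined path and to perform path analysis related to the predetermined path according to the entry.
As the above description shows, in this system embodiment the process of establishing the access table is moved into the data warehouse. It will be understood that the beneficial effects of the present invention can be achieved whether this processing takes place in the data warehouse or in the path analysis device, and both fall within the scope of protection of the present invention.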
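A preferred embodiment is described and explained below.
The preferred embodiment provides an efficient OLAP path navigation analysis solution. It addresses the problem that the related art cannot run the analysis query in OLAP and can only run it in the data warehouse, where performance is relatively low and every navigation step requires a self-join of the page table. Because the OLAP path navigation analysis device provided in the preferred embodiment involves no table self-join, it performs efficiently.
The preferred embodiment adopts an N-level approach, where N is any positive integer. With N equal to 1 it degenerates into the traditional implementation. The purpose of this design is to avoid, in OLAP, table self-join queries like those of the traditional approach above, trading storage space for query time.
The preferred embodiment includes the following steps:
Step S11: establish an access table in the data warehouse containing the columns VisitorKey (unique visitor identifier), SessionID (unique session identifier), Page1Key (the first path on the path chain), Page2Key, ..., PageNKey. Each row records one access path of the user, and the N extended columns represent the subsequent N paths of that path;
Step S12: define an exit default value for each PageKey; the default value indicates that the user has left the website;
Step S13: assign values to the path columns Page2Key through PageNKey, forming the subsequent N paths starting from each path point, and set a column to the defined default value if the user has left by that point;
Step S14: when designing the OLAP model, add N page dimensions, Page1Key through PageNKey, each associated through its corresponding key with Page1Key through PageNKey of the access path table;
Step S15: with the above settings in place, the following analyses can be performed conveniently:
Pre-analysis: find the distribution of Page1Key, the previous page path, for rows whose Page2Key is a specific page;
Post-analysis: find the distribution of Page2Key, the next page path, for rows whose Page1Key is a specific page;
Multi-level pre-analysis: within N levels, the extended columns PageNKey back to Page1Key allow the preceding N levels to be profiled directly, without table joins. Beyond N levels, pre-path analysis falls back to the table joins of the traditional implementation;
Multi-level post-analysis: within N levels, the extended columns Page1Key through PageNKey allow the following N levels to be profiled directly, without table joins. Beyond N levels, post-path analysis falls back to the table joins of the traditional implementation.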
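The pre-analysis cases of step S15 can be illustrated with a short Python sketch. It assumes a three-column table (N = 3) with the visitor and session columns omitted, an exit default value of `"-"`, and a hypothetical second session p3->p2->p1; the function name `pre_analysis` is likewise an assumption made for illustration.

```python
from collections import Counter

EXIT = "-"  # assumed exit default value (step S12)

# Rows of an access table with N = 3 page columns
# (Page1Key, Page2Key, Page3Key), built from two sessions:
# p1->p2->p1 and p3->p2->p1.
rows = [
    ("p1", "p2", "p1"),
    ("p2", "p1", EXIT),
    ("p1", EXIT, EXIT),
    ("p3", "p2", "p1"),
    ("p2", "p1", EXIT),
    ("p1", EXIT, EXIT),
]

def pre_analysis(rows, target, levels=1):
    """Distribution of the `levels` pages visited just before `target`.

    Filters on the column at position `levels` (for example Page2Key when
    levels == 1) and groups by the columns before it. No join is involved
    as long as `levels` is smaller than the number of page columns.
    """
    counts = Counter()
    for row in rows:
        if levels < len(row) and row[levels] == target:
            counts[row[:levels]] += 1
    return counts

one_level = pre_analysis(rows, "p1", levels=1)
two_level = pre_analysis(rows, "p1", levels=2)
```

With these sample rows, one-level pre-analysis of p1 reads Page1Key from rows whose Page2Key is p1, and two-level pre-analysis reads (Page1Key, Page2Key) from rows whose Page3Key is p1. A request for more levels than the table stores returns an empty result here, which corresponds to the point at which the text falls back to table joins.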
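The preferred embodiment is explained below with a concrete example.
FIG. 4 is a schematic diagram of path navigation results according to a preferred embodiment of the present invention. As shown in FIG. 4, any page can be selected to view its pre- and post-pages, that is, where users came from and where they went. Multi-level profiling (multi-level pre-analysis or multi-level post-analysis) refers to questions such as where users went next after the page they went to. The scheme of the preferred embodiment can support N-level profiling or unlimited-level profiling.
For example, FIG. 5 is a schematic diagram of the operation of a path analysis system according to a preferred embodiment of the present invention. As shown in FIG. 5, in one solution adopting the preferred embodiment, path navigation analysis is carried out by a data warehouse device, an OLAP device, and a query device.
In the data warehouse device:
In step S11 above, the data warehouse creates a table containing the columns VisitorKey (unique visitor identifier), SessionID (unique session identifier), Page1Key (the first path on the path chain), Page2Key, ..., PageNKey. For example:
The original page path order is obtained, assumed to be p1->p2->p1 as shown in Table 1:
Table 1
VisitorKey  SessionID  PageKey  AccessOrder
Vid1        Sid1       p1       1
Vid1        Sid1       p2       2
Vid1        Sid1       p1       3
Following the access order in the source table data (the original page path order), the subsequent n levels of each path are assigned. Assuming the exit default value is "-", the result is as shown in Table 2:
Table 2
VisitorKey  SessionID  Page1Key  Page2Key  Page3Key  …  PageNKey
Vid1        Sid1       p1        p2        p1        -  -
Vid1        Sid1       p2        p1        -         -  -
Vid1        Sid1       p1        -         -         -  -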
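The derivation of Table 2 from Table 1 can be sketched as follows. This is one possible reading of steps S11 to S13, not the patent's own code: the function name `build_access_table` is hypothetical, rows are plain tuples, and N is set to 5 so that the output has the same number of page columns as the rows shown in Table 2.

```python
from itertools import groupby
from operator import itemgetter

EXIT = "-"  # exit default value assumed in the example

def build_access_table(raw_rows, n):
    """Widen (VisitorKey, SessionID, PageKey, AccessOrder) rows.

    For every page visit, emit one row holding that page and the pages
    that followed it in the same session, padded with EXIT to n columns.
    """
    wide = []
    ordered = sorted(raw_rows, key=lambda r: (r[0], r[1], r[3]))
    for (visitor, session), group in groupby(ordered, key=itemgetter(0, 1)):
        pages = [r[2] for r in group]
        for start in range(len(pages)):
            window = pages[start:start + n]
            window += [EXIT] * (n - len(window))
            wide.append((visitor, session, *window))
    return wide

table1 = [
    ("Vid1", "Sid1", "p1", 1),
    ("Vid1", "Sid1", "p2", 2),
    ("Vid1", "Sid1", "p1", 3),
]
table2 = build_access_table(table1, n=5)
```

The widening is done once, ahead of query time, which is where the scheme spends extra storage to avoid joins later.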
在OLAP装置中:
设计时添加Page1Key一直到PageNKey的n个页面维度,分别与访问路径表的Page1Key到PageNKey通过对应的键关联,其中,对应的各Page维度分别通过其对应的PageXKey(X代表1到N)与指标组关联;
在查询装置中:
在本优选实施例中以后置分析为例,前置分析以及多级分析可以参照本实例进行说明。
分析P2页面的后置页面:
通过对Page1Key=P2的条件过滤数据行,剩下的结果集仅为一行(在另一些实施例中可能为多行,在此仅用最简单的示例进行说明),如下表3所示。再选出所有Page2Key的值即为所有的Page1Key=P2的页面的后置页面,即p1。
表3
Figure PCTCN2014089936-appb-000001
Figure PCTCN2014089936-appb-000002
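The filter-then-select step of this example can be sketched as follows, using the rows of Table 2 as in-memory tuples. The function names subsequent_pages and preceding_pages and the column index constant are hypothetical, chosen only for this illustration.

```python
from collections import Counter

# Rows of Table 2: (VisitorKey, SessionID, Page1Key, ..., PageNKey).
visit_table = [
    ("Vid1", "Sid1", "p1", "p2", "p1", "-", "-"),
    ("Vid1", "Sid1", "p2", "p1", "-", "-", "-"),
    ("Vid1", "Sid1", "p1", "-", "-", "-", "-"),
]
PAGE1 = 2  # tuple index of the Page1Key column


def subsequent_pages(table, page):
    """Subsequent analysis: distribution of Page2Key where Page1Key == page."""
    return Counter(row[PAGE1 + 1] for row in table if row[PAGE1] == page)


def preceding_pages(table, page):
    """Preceding analysis: distribution of Page1Key where Page2Key == page."""
    return Counter(row[PAGE1] for row in table if row[PAGE1 + 1] == page)


print(subsequent_pages(visit_table, "p2"))
print(preceding_pages(visit_table, "p1"))
```

Both functions scan the widened table once and never join it to itself. For page p1, the subsequent distribution contains both p2 and the exit default "-", since one visit to p1 was the last page of the session.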
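FIG. 6 is a schematic diagram of a path navigation analysis result according to a preferred embodiment of the present invention. As shown in FIG. 6, because the OLAP model uses a single page dimension throughout, the member dimension titles (Title) shown here are identical: the first Title corresponds to Page1Key and the second to Page2Key. This makes it easy to see all subsequent pages reached from a given page, along with their other measures (for example, visits).
As described above, this preferred embodiment derives N columns in the data warehouse to represent the subsequent paths of each path, which avoids table self-joins, or many-to-many association operations in OLAP, during N-level path navigation and thereby improves performance. In OLAP, the same dimension is added multiple times, and Page1 through PageN are each associated with the corresponding PageKey column of the data warehouse table. For preceding analysis, it is only necessary to query Page1 where Page2 satisfies a given condition. For subsequent analysis, it is only necessary to query Page2 where Page1 satisfies a given condition. For multi-level preceding analysis (M levels, with M ranging from 1 to N), it is only necessary to query Page1 where PageM down to Page2 satisfy the conditions of the selected path. For multi-level subsequent analysis (M levels, with M ranging from 1 to N), it is only necessary to query PageM where Page1 through PageM-1 satisfy the conditions of the path. Each of these analyses requires only a single query, and that query incurs only one input/output (I/O) operation, with no many-to-many operation of the kind caused by table joins in the data warehouse, so execution efficiency is improved.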
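The multi-level case can be sketched in the same style. This is an illustration under assumptions: the function names page_after_prefix and page_before_suffix are hypothetical, and raising an error beyond the stored levels merely stands in for the fallback to table joins described above.

```python
from collections import Counter

# Rows of Table 2: (VisitorKey, SessionID, Page1Key, ..., PageNKey).
visit_table = [
    ("Vid1", "Sid1", "p1", "p2", "p1", "-", "-"),
    ("Vid1", "Sid1", "p2", "p1", "-", "-", "-"),
    ("Vid1", "Sid1", "p1", "-", "-", "-", "-"),
]
PAGE1 = 2  # tuple index of the Page1Key column


def page_after_prefix(table, prefix):
    """Multi-level subsequent analysis: rows whose first pages equal
    prefix, grouped by the page that follows the prefix."""
    prefix = tuple(prefix)
    m = len(prefix)
    counts = Counter()
    for row in table:
        pages = row[PAGE1:]
        if m >= len(pages):
            # Beyond the stored N levels a join would be needed instead.
            raise ValueError("path exceeds the N stored levels")
        if pages[:m] == prefix:
            counts[pages[m]] += 1
    return counts


def page_before_suffix(table, suffix):
    """Multi-level preceding analysis: rows whose Page2Key onward equal
    suffix, grouped by Page1Key."""
    suffix = tuple(suffix)
    m = len(suffix)
    counts = Counter()
    for row in table:
        pages = row[PAGE1:]
        if m >= len(pages):
            raise ValueError("path exceeds the N stored levels")
        if pages[1:m + 1] == suffix:
            counts[pages[0]] += 1
    return counts


print(page_after_prefix(visit_table, ["p1", "p2"]))
print(page_before_suffix(visit_table, ["p2", "p1"]))
```

In both directions the condition is evaluated against columns of a single row, so the number of levels changes only which columns are compared, not the number of table scans.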
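Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing apparatus. They can be concentrated on a single computing apparatus or distributed over a network composed of multiple computing apparatuses. Optionally, they can be implemented as program code executable by a computing apparatus, so that they can be stored in a storage apparatus and executed by a computing apparatus; alternatively, they can each be made into an individual integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.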

Claims (11)

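  1. A path analysis method, characterized by comprising:
    establishing an access table, wherein each entry in the access table stores information about multiple paths accessed by one user, saved in the order of the user's access paths;
    looking up a first entry in the access table, wherein the first entry is an entry containing a predetermined path;
    performing path analysis related to the predetermined path according to the first entry.
  2. The method according to claim 1, characterized in that establishing the access table comprises:
    obtaining an original access table stored in a data warehouse, wherein each entry in the original access table stores information about one path accessed by one user;
    establishing the access table according to the original access table.
  3. The method according to claim 1 or 2, characterized in that, in a case where the path analysis related to the predetermined path is preceding analysis of the predetermined path, performing the path analysis related to the predetermined path according to the first entry comprises:
    determining, in the first entry, information about the path accessed by the user before accessing the predetermined path;
    determining, according to the information about the path accessed by the user before accessing the predetermined path, the distribution of the paths accessed by the user before accessing the predetermined path.
  4. The method according to claim 3, characterized in that, in a case where the preceding analysis is N-level preceding analysis, performing the path analysis related to the predetermined path according to the first entry comprises:
    determining, in the first entry, information about the N paths accessed by the user before accessing the predetermined path;
    determining, according to the information about the N paths accessed by the user before accessing the predetermined path, the distribution of the N paths accessed by the user before accessing the predetermined path, wherein N is a positive integer.
  5. The method according to claim 1 or 2, characterized in that, in a case where the path analysis related to the predetermined path is subsequent analysis of the predetermined path, performing the path analysis related to the predetermined path according to the first entry comprises:
    determining, in the first entry, information about the path accessed by the user after accessing the predetermined path;
    determining, according to the information about the path accessed by the user after accessing the predetermined path, the distribution of the paths accessed by the user after accessing the predetermined path.
  6. The method according to claim 5, characterized in that, in a case where the subsequent analysis is M-level subsequent analysis, performing the path analysis related to the predetermined path according to the first entry comprises:
    determining, in the first entry, information about the M paths accessed by the user after accessing the predetermined path;
    determining, according to the information about the M paths accessed by the user after accessing the predetermined path, the distribution of the M paths accessed by the user after accessing the predetermined path, wherein M is a positive integer.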
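  7. A path analysis apparatus, characterized by comprising:
    an establishing module, configured to establish an access table, wherein each entry in the access table stores information about multiple paths accessed by one user, saved in the order of the user's access paths;
    a lookup module, configured to look up a first entry in the access table, wherein the first entry is an entry containing a predetermined path;
    an analysis module, configured to perform path analysis related to the predetermined path according to the first entry.
  8. The apparatus according to claim 7, characterized in that the establishing module comprises:
    an obtaining unit, configured to obtain an original access table stored in a data warehouse, wherein each entry in the original access table stores information about one path accessed by one user;
    an establishing unit, configured to establish the access table according to the original access table.
  9. The apparatus according to claim 7 or 8, characterized in that the analysis module comprises:
    a first determining unit, configured to determine, in the first entry, information about the path accessed by the user before accessing the predetermined path;
    a second determining unit, configured to determine, according to the information about the path accessed by the user before accessing the predetermined path, the distribution of the paths accessed by the user before accessing the predetermined path.
  10. The apparatus according to claim 7 or 8, characterized in that the analysis module comprises:
    a third determining unit, configured to determine, in the entry, information about the path accessed by the user after accessing the predetermined path;
    a fourth determining unit, configured to determine, according to the information about the path accessed by the user after accessing the predetermined path, the distribution of the paths accessed by the user after accessing the predetermined path.
  11. A path analysis system, characterized by comprising a data warehouse and a path analysis apparatus, wherein
    the data warehouse is configured to establish an access table, wherein each entry in the access table stores information about multiple paths accessed by one user, saved in the order of the user's access paths;
    the path analysis apparatus is configured to look up a first entry in the access table, wherein the first entry is an entry containing a predetermined path, and to perform path analysis related to the predetermined path according to the first entry.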
PCT/CN2014/089936 2013-11-19 2014-10-30 Path analysis method and apparatus WO2015074477A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/037,783 US20160299903A1 (en) 2013-11-19 2014-10-30 Path analysis method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310585827.9A CN103605848A (zh) 2013-11-19 2013-11-19 Path analysis method and apparatus
CN201310585827.9 2013-11-19

Publications (1)

Publication Number Publication Date
WO2015074477A1 (zh) 2015-05-28

Family

ID=50124069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/089936 WO2015074477A1 (zh) 2013-11-19 2014-10-30 Path analysis method and apparatus

Country Status (3)

Country Link
US (1) US20160299903A1 (zh)
CN (1) CN103605848A (zh)
WO (1) WO2015074477A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
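CN101344881A (zh) * 2007-07-09 2009-01-14 Institute of Atmospheric Physics, Chinese Academy of Sciences (中国科学院大气物理研究所) Index generation method and apparatus for massive file-type data, and search system
US20130166498A1 (en) * 2011-12-25 2013-06-27 Microsoft Corporation Model Based OLAP Cube Framework
CN103605848A (zh) * 2013-11-19 2014-02-26 Beijing Gridsum Technology Co., Ltd. (北京国双科技有限公司) Path analysis method and apparatus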

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7194454B2 (en) * 2001-03-12 2007-03-20 Lucent Technologies Method for organizing records of database search activity by topical relevance
US7107285B2 (en) * 2002-03-16 2006-09-12 Questerra Corporation Method, system, and program for an improved enterprise spatial system
US7792844B2 (en) * 2002-06-28 2010-09-07 Adobe Systems Incorporated Capturing and presenting site visitation path data
JP3982623B2 (ja) * 2003-03-25 2007-09-26 インターナショナル・ビジネス・マシーンズ・コーポレーション 情報処理装置、データベース検索システム及びプログラム
US20050119999A1 (en) * 2003-09-06 2005-06-02 Oracle International Corporation Automatic learning optimizer
US20070271230A1 (en) * 2006-05-19 2007-11-22 Hart Matt E Method and apparatus for accessing history trails for previous search sessions

Also Published As

Publication number Publication date
US20160299903A1 (en) 2016-10-13
CN103605848A (zh) 2014-02-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14864472

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 15037783

Country of ref document: US

122 Ep: pct application non-entry in european phase

Ref document number: 14864472

Country of ref document: EP

Kind code of ref document: A1