CN101799830A - Flow data processing method capable of realizing multi-dimensional free analysis - Google Patents

Info

Publication number
CN101799830A
Authority
CN
China
Prior art keywords
data
dimension
flow data
olap
session
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010131551
Other languages
Chinese (zh)
Inventor
黄勇坚
吴充
杨基彬
钟志龙
祁国晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN 201010131551
Publication of CN101799830A
Legal status: Pending

Abstract

The invention provides a flow data processing method capable of realizing multi-dimensional free analysis. OLAP (online analytical processing) technology is adopted to classify the raw flow data and to establish associations among the resulting categories. Associations are also established between each category of flow data and all dimensions, and all categorized data share the same dimension data, so the flow data can be analyzed layer by layer along multiple dimensions. A user can take any part of the flow data out of a report, analyze it step by step using several dimensions, and keep filtering until the desired data report is obtained.

Description

Flow data processing method capable of realizing multi-dimensional free analysis
Technical field
The present invention relates to web traffic analysis, and in particular to a flow data processing method that realizes multi-dimensional free analysis.
Background art
Most web traffic analysis systems in common use today ship with a set of built-in standard reports, such as a province report, a search engine report, and a keyword report; these reports are the dimensions through which the flow data is viewed. In the data processing method these systems adopt, each report simply performs an aggregation over its own database table, and the reports are not associated with one another. The reports are therefore "static": once the customer has received them, no further operations are possible. Even if the customer has questions about part of the data in a report, nothing can be done, because every analysis dimension is applied to the entire flow data (more precisely, the entire flow data for a given period) rather than to the portion of flow data behind a particular report. Many analysis dimensions appear to be on offer, but they are isolated from one another, so the customer sees only something like the three isolated reports shown in Fig. 1.
With such reports the customer gains only a superficial understanding of the site's traffic. A customer with more advanced needs may want to cross-analyze reports or to analyze part of the data in a report separately. For example, the customer may want to examine the traffic that came from "Beijing" and "Guangdong" via Google search, see which keywords those visitors searched for, and see which goods were ultimately purchased for each keyword. Tools of this kind cannot answer such a question.
Summary of the invention
In view of the above defect, the purpose of the present invention is to provide a flow data processing method that realizes multi-dimensional free analysis of flow data.
To achieve this purpose, the present invention adopts the following technical solution:
A flow data processing method that realizes multi-dimensional free analysis adopts OLAP (online analytical processing) technology and comprises the following steps:
(1) The raw flow data is normalized and divided into six categories: page access data, session access data, visitor access data, advertisement data, e-commerce data, and click data. An OLAP fact table (FactTable) is established for each category, and associations between the different categories of flow data are established through a session identifier and a visitor identifier;
(2) Suitable dimensions are established for each data category. Similar dimensions are grouped into one class and a corresponding dimension table is established, with a unique primary key generated for each row in the table; the dimension tables are then associated with the flow data of each of the six categories;
(3) An OLAP data cube is built from the above fact tables and dimension tables, and the final report is generated through the MDX (Multidimensional Expressions) language.
By classifying the raw flow data and establishing associations among the categories, and by also establishing associations between each category of flow data and all dimensions, the present invention lets all categorized data share the same dimension data. The data can therefore be analyzed layer by layer along multiple dimensions, which realizes multi-dimensional free analysis of flow data: the user can take any part of the flow data out of a report, analyze it step by step using several dimensions, and keep filtering until the desired data report is obtained.
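As a rough illustration of step (1), the following sketch routes normalized raw records into one fact list per category. The record layout, the `kind` field, and the short category keys are assumptions made for this example; the patent does not specify a record format.

```python
from collections import defaultdict

# The six categories named in step (1); the short keys are illustrative.
CATEGORIES = ("page_view", "session", "visitor", "ad", "ecommerce", "click")

def classify(raw_records):
    """Group normalized records into one fact list per category.

    Every record is assumed to carry session_id and visitor_id so that
    records in different categories can later be associated.
    """
    facts = defaultdict(list)
    for rec in raw_records:
        if rec["kind"] not in CATEGORIES:
            raise ValueError(f"unknown category: {rec['kind']}")
        if "session_id" not in rec or "visitor_id" not in rec:
            raise ValueError("record lacks session_id or visitor_id")
        facts[rec["kind"]].append(rec)
    return facts

# Hypothetical sample records, invented for the example.
raw = [
    {"kind": "page_view", "session_id": "s1", "visitor_id": "v1", "url": "/home"},
    {"kind": "click", "session_id": "s1", "visitor_id": "v1", "target": "buy"},
    {"kind": "ecommerce", "session_id": "s1", "visitor_id": "v1", "item": "book"},
]
facts = classify(raw)
```

Because each record keeps both identifiers, a click or purchase can later be traced back to its session and visitor regardless of which fact list it landed in.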
Description of drawings
The present invention is described in further detail below with reference to the embodiments and the accompanying drawings.
Fig. 1 is a schematic diagram of the mutually isolated reports generated by an existing web traffic analysis system;
Fig. 2 is a diagram of the relationships among the data categories after classification by the method of the invention;
Fig. 3 is a diagram of the relationship between the session flow data and the dimensions in the example;
Fig. 4 to Fig. 6 are schematic diagrams of establishing dimension tables from raw data and generating fact tables from the dimension tables;
Fig. 7 is the OLAP data cube built from the fact table and the dimension tables.
Embodiment
To enable multi-dimensional free analysis of flow data, our WebDissector back-end system adopts OLAP (online analytical processing) technology. The raw flow data is normalized and divided into six categories: page access data, session access data, visitor access data, advertisement data, e-commerce data, and click data. The categorized data is not isolated; the categories are associated with one another, as shown in Fig. 2. The associations are realized through two core identifiers. One is the session identifier, which strings together the data of one session (a visitor's page accesses that are close together in time are assigned to the same session), including advertisement data, click data, and so on. The other is the visitor identifier, which uniquely identifies a visitor and strings together that visitor's access data over a long period. Every category of flow data contains both identifiers, which is what makes it possible to establish the various relationships shown in Fig. 2.
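The role of the two identifiers can be sketched as follows. The example assigns page accesses to sessions using a 30-minute inactivity gap and then uses the visitor identifier to collect each visitor's sessions. The gap value, the field names, and the session ID format are all assumptions for illustration; the patent only says that accesses "close together" belong to one session.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed timeout; the patent gives no value

def assign_sessions(page_views):
    """Attach a session_id to each page view.

    page_views: list of dicts with visitor_id and ts (seconds).
    A new session starts when the gap to the same visitor's previous
    page view exceeds SESSION_GAP_SECONDS.
    """
    last_seen = {}  # visitor_id -> (last timestamp, session counter)
    out = []
    for pv in sorted(page_views, key=lambda p: (p["visitor_id"], p["ts"])):
        vid = pv["visitor_id"]
        prev = last_seen.get(vid)
        if prev is None:
            count = 1
        elif pv["ts"] - prev[0] > SESSION_GAP_SECONDS:
            count = prev[1] + 1
        else:
            count = prev[1]
        last_seen[vid] = (pv["ts"], count)
        out.append({**pv, "session_id": f"{vid}-{count}"})
    return out

def sessions_by_visitor(rows):
    """Use the visitor identifier to collect each visitor's sessions."""
    result = defaultdict(set)
    for row in rows:
        result[row["visitor_id"]].add(row["session_id"])
    return result

# Hypothetical page views, invented for the example.
views = [
    {"visitor_id": "v1", "ts": 0},
    {"visitor_id": "v1", "ts": 600},
    {"visitor_id": "v1", "ts": 10_000},
    {"visitor_id": "v2", "ts": 50},
]
tagged = assign_sessions(views)
history = sessions_by_visitor(tagged)
```

Advertisement, click, or e-commerce records that carry the same `session_id` can be joined to these sessions in the same way.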
Suitable dimensions are established for each data category, and the flow data of each category is likewise associated with all dimensions. Session data is taken as an example, as shown in Fig. 3.
Fig. 4 shows the raw data of a session. Province and city belong to the same class, so they are grouped into a "geographical environment table" as one dimension, and a corresponding dimension table is established. Search engine and keyword form another class, so they are grouped into a "session source table" as one dimension, and a corresponding dimension table is established. Each row in a dimension table is assigned a primary key. The corresponding OLAP fact table (FactTable) is then built by referencing the primary keys of the dimension tables. In other words, the relationships among the data are organized through fact tables and dimension tables, which produces the structure shown in Fig. 5.
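A minimal sketch of this step, assuming in-memory tables: each dimension table hands out a unique integer primary key per distinct attribute combination, and the fact rows store only those keys. The class name, field names, and sample sessions are hypothetical and are not taken from Fig. 4.

```python
class DimensionTable:
    """Maps each distinct attribute tuple to a unique integer primary key."""

    def __init__(self):
        self._keys = {}  # attribute tuple -> primary key
        self.rows = {}   # primary key -> attribute tuple

    def key_for(self, *attrs):
        """Return the primary key for attrs, creating a row if needed."""
        if attrs not in self._keys:
            pk = len(self._keys) + 1
            self._keys[attrs] = pk
            self.rows[pk] = attrs
        return self._keys[attrs]

geo_dim = DimensionTable()     # "geographical environment": province, city
source_dim = DimensionTable()  # "session source": search engine, keyword

# Hypothetical raw sessions, invented for the example.
raw_sessions = [
    ("s1", "Beijing", "Beijing", "Sogou", "popularization"),
    ("s2", "Guangdong", "Guangzhou", "Google", "analytics"),
    ("s3", "Beijing", "Beijing", "Google", "analytics"),
]

# The fact table references dimension rows by primary key only.
session_facts = [
    {
        "session_id": sid,
        "geo_key": geo_dim.key_for(province, city),
        "source_key": source_dim.key_for(engine, keyword),
    }
    for sid, province, city, engine, keyword in raw_sessions
]
```

Sessions `s1` and `s3` end up sharing one geography row, so the province and city strings are stored once rather than once per session.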
In this way, the storage complexity of the back-end system is greatly reduced and data consistency is guaranteed. Flow data is retrieved and the corresponding reports are generated by referencing the primary keys in the dimension tables. Different fact tables can reuse the same dimension tables; for example, e-commerce data can reference the same geography dimension table and source dimension table, as shown in Fig. 6.
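Dimension reuse across fact tables can be illustrated with a small star schema in SQLite. The table and column names below are assumptions for this sketch and do not come from Fig. 6; the point is only that a session fact table and an e-commerce fact table both reference the same geography dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geo_dim (geo_key INTEGER PRIMARY KEY, province TEXT, city TEXT);
CREATE TABLE session_fact (
    session_id TEXT,
    geo_key INTEGER REFERENCES geo_dim(geo_key)
);
CREATE TABLE order_fact (
    order_id TEXT,
    session_id TEXT,
    amount REAL,
    geo_key INTEGER REFERENCES geo_dim(geo_key)
);
-- Hypothetical sample rows, invented for the example.
INSERT INTO geo_dim VALUES (1, 'Beijing', 'Beijing'), (2, 'Guangdong', 'Guangzhou');
INSERT INTO session_fact VALUES ('s1', 1), ('s2', 1), ('s3', 2);
INSERT INTO order_fact VALUES ('o1', 's1', 20.0, 1), ('o2', 's3', 35.0, 2);
""")

def count_by_province(fact_table):
    """Count fact rows per province using the shared geography dimension."""
    # The table name is interpolated into SQL, so accept only known names.
    if fact_table not in ("session_fact", "order_fact"):
        raise ValueError(fact_table)
    sql = f"""
        SELECT g.province, COUNT(*)
        FROM {fact_table} AS f
        JOIN geo_dim AS g ON f.geo_key = g.geo_key
        GROUP BY g.province
        ORDER BY g.province
    """
    return conn.execute(sql).fetchall()

sessions_per_province = count_by_province("session_fact")
orders_per_province = count_by_province("order_fact")
```

The same join works for both fact tables because they agree on the meaning of `geo_key`, which is the consistency benefit the paragraph above describes.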
Because one fact table is directly associated with several dimension tables, we can build the data warehouse storage structure shown in Fig. 7. It is a typical three-dimensional data cube in which each cell corresponds to a batch of sessions. The customer can pinpoint exactly the traffic of interest. For example, the dark cell in Fig. 7 is the traffic "from Beijing" that reached the website by searching for the keyword "popularization" on the Sogou search engine, 37 visits in total. Once this OLAP data cube has been built, the final report can be generated with the MDX (Multidimensional Expressions) language.
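The cube can be approximated in a few lines as a mapping from dimension coordinates to session counts. This is a sketch with invented data, not the warehouse structure of Fig. 7, and it uses plain Python in place of an OLAP server and MDX; the function name `slice_cube` is hypothetical.

```python
from collections import Counter

# One (province, search engine, keyword) tuple per session.
# Hypothetical data, invented for the example.
sessions = [
    ("Beijing", "Sogou", "popularization"),
    ("Beijing", "Sogou", "popularization"),
    ("Beijing", "Google", "popularization"),
    ("Guangdong", "Google", "analytics"),
]

# Each cube cell holds the number of sessions at that coordinate.
cube = Counter(sessions)

def slice_cube(cube, province=None, engine=None, keyword=None):
    """Sum the cells matching the given coordinates; None means all members."""
    wanted = (province, engine, keyword)
    return sum(
        n
        for cell, n in cube.items()
        if all(w is None or w == c for w, c in zip(wanted, cell))
    )

one_cell = slice_cube(cube, "Beijing", "Sogou", "popularization")
beijing_total = slice_cube(cube, province="Beijing")
```

Fixing all three coordinates addresses a single cell, as with the dark cell in Fig. 7, while leaving a coordinate as `None` aggregates along that dimension.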
The example above shows analysis along three dimensions. We can go further and take out one of these "flow cells", treat it as a new data cube, and analyze it with new dimensions. In this way the data can be analyzed layer by layer along multiple dimensions and filtered repeatedly until the desired data report is obtained.
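The layer-by-layer analysis described here can be sketched as two small functions: one that takes out the sessions behind a chosen cell, and one that builds a new cube over those sessions using other dimensions. The `browser` and `landing` dimensions and all sample rows are assumptions for illustration.

```python
from collections import Counter

# Hypothetical session rows, invented for the example.
sessions = [
    {"province": "Beijing", "engine": "Sogou", "keyword": "popularization",
     "browser": "IE", "landing": "/home"},
    {"province": "Beijing", "engine": "Sogou", "keyword": "popularization",
     "browser": "IE", "landing": "/pricing"},
    {"province": "Beijing", "engine": "Sogou", "keyword": "popularization",
     "browser": "Firefox", "landing": "/home"},
    {"province": "Guangdong", "engine": "Google", "keyword": "popularization",
     "browser": "IE", "landing": "/home"},
]

def take_cell(rows, **coords):
    """Return the rows that fall inside the cell given by coords."""
    return [r for r in rows if all(r[k] == v for k, v in coords.items())]

def build_cube(rows, dims):
    """Count rows per coordinate along the chosen dimensions."""
    return Counter(tuple(r[d] for d in dims) for r in rows)

# Step 1: take out one "flow cell" from the original three dimensions.
cell = take_cell(sessions, province="Beijing", engine="Sogou",
                 keyword="popularization")

# Step 2: treat that cell as a new cube and analyze it with new dimensions.
sub_cube = build_cube(cell, ("browser", "landing"))
by_browser = build_cube(cell, ("browser",))
```

The two steps can be repeated on any cell of `sub_cube`, which corresponds to the repeated filtering the paragraph describes.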

Claims (1)

1. A flow data processing method capable of realizing multi-dimensional free analysis, adopting OLAP (online analytical processing) technology, characterized in that it comprises the following steps:
(1) normalizing the raw flow data and dividing it into six categories, namely page access data, session access data, visitor access data, advertisement data, e-commerce data, and click data; establishing an OLAP fact table (FactTable) for each category; and establishing associations between the different categories of flow data through a session identifier and a visitor identifier;
(2) establishing suitable dimensions for each data category; grouping similar dimensions into one class and establishing a corresponding dimension table, with a unique primary key generated for each row in the table; and then associating the dimension tables with the flow data of each of the six categories;
(3) building an OLAP data cube from the above fact tables and dimension tables, and generating the final report through the MDX (Multidimensional Expressions) language.
CN 201010131551 2010-03-25 2010-03-25 Flow data processing method capable of realizing multi-dimensional free analysis Pending CN101799830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010131551 CN101799830A (en) 2010-03-25 2010-03-25 Flow data processing method capable of realizing multi-dimensional free analysis

Publications (1)

Publication Number Publication Date
CN101799830A (en) 2010-08-11

Family

ID=42595506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010131551 Pending CN101799830A (en) 2010-03-25 2010-03-25 Flow data processing method capable of realizing multi-dimensional free analysis

Country Status (1)

Country Link
CN (1) CN101799830A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1564160A (en) * 2004-04-22 2005-01-12 重庆市弘越科技有限公司 Method of seting up and inquirying multiple-demensional data cube
CN101197876A (en) * 2006-12-06 2008-06-11 中兴通讯股份有限公司 Method and system for multi-dimensional analysis of message service data
CN101599088A (en) * 2008-11-18 2009-12-09 北京美智医疗科技有限公司 The mining multi-dimensional data system and method for medical information system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
《现代计算机》 20080430 黄志成 基于数据仓库的网络流量OLAP设计与实现 第85-86页 1 , 第281期 2 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999506A (en) * 2011-09-13 2013-03-27 阿里巴巴集团控股有限公司 Method and device for obtaining unique visitor (UV)
CN103294687A (en) * 2012-02-24 2013-09-11 腾讯科技(深圳)有限公司 Method and system for counting visitors of personal page
CN103294687B (en) * 2012-02-24 2016-06-08 腾讯科技(深圳)有限公司 The method of statistics private page visitor and system
CN103020146A (en) * 2012-11-22 2013-04-03 华为技术有限公司 Data processing method and equipment
CN104426713A (en) * 2013-08-28 2015-03-18 腾讯科技(北京)有限公司 Method and device for monitoring network site access effect data
CN104657370A (en) * 2013-11-19 2015-05-27 中国移动通信集团天津有限公司 Method and device for achieving multi-dimensional cube association
CN104657370B (en) * 2013-11-19 2018-09-04 中国移动通信集团天津有限公司 A kind of associated method and apparatus of realization multi-dimension data cube
CN106484715A (en) * 2015-08-27 2017-03-08 北京国双科技有限公司 Data for path conversion dissects method and apparatus
CN106933905A (en) * 2015-12-31 2017-07-07 北京国双科技有限公司 The monitoring method and device of web page access data
CN105873119A (en) * 2016-05-26 2016-08-17 重庆大学 Method for classifying flow use behaviors of mobile network user groups
CN106095859A (en) * 2016-06-02 2016-11-09 成都淞幸科技有限责任公司 Various dimensions Chinese medicine acupuncture association rule mining method based on OLAM
CN107729500A (en) * 2017-10-20 2018-02-23 锐捷网络股份有限公司 A kind of data processing method of on-line analytical processing, device and background devices
CN107729500B (en) * 2017-10-20 2021-01-05 锐捷网络股份有限公司 Data processing method and device for online analysis processing and background equipment
CN110020364A (en) * 2017-11-27 2019-07-16 北京京东尚科信息技术有限公司 The method and apparatus for determining the traffic source of page access
CN110020364B (en) * 2017-11-27 2021-11-30 北京京东尚科信息技术有限公司 Method and device for determining flow source of page access
CN110677310A (en) * 2018-07-03 2020-01-10 百度在线网络技术(北京)有限公司 Traffic attribution method, device and terminal
CN112069021A (en) * 2020-08-21 2020-12-11 北京五八信息技术有限公司 Flow data storage method and device, electronic equipment and storage medium
CN112069021B (en) * 2020-08-21 2024-02-20 北京五八信息技术有限公司 Flow data storage method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20100811