CN101799830A - Flow data processing method capable of realizing multi-dimensional free analysis - Google Patents
- Publication number: CN101799830A
- Application number: CN 201010131551
- Authority: CN (China)
- Prior art keywords: data, dimension, flow data, OLAP, session
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention provides a flow data processing method capable of multi-dimensional free analysis. OLAP (online analytical processing) technology is used to classify the raw flow data and to establish associations among the resulting categories. Associations are also established between the flow data of each category and all dimensions, and all categorized data share the same dimension data, so the flow data can be dissected layer by layer along multiple dimensions. A user can take any part of the flow data out of a report, analyze it step by step using several dimensions, and keep filtering until the desired data report is obtained.
Description
Technical field
The present invention relates to web traffic analysis, and in particular to a traffic (flow) data processing method that enables free multi-dimensional analysis.
Background art
Most web traffic analysis systems in common use today ship with a set of built-in standard reports, for example a province report, a search engine report, and a keyword report. These are the dimensions that make up the traffic data. In the data processing method these systems adopt, each report simply runs an aggregation over its own database table, and there is no association between one report and another. The reports are therefore "static": once the customer has them, no further operation is possible. Even if the customer has questions about part of the data in a report, nothing can be done, because every analysis dimension applies to the whole of the traffic data rather than to the portion of traffic shown in a particular report. Many analysis dimensions appear to be provided, but they are isolated from one another, and all of them are computed over the entire traffic data (strictly speaking, the entire traffic data of a given period). The customer sees only something like the three isolated reports shown in Fig. 1, and can gain no more than a superficial understanding of the site's traffic. Suppose the customer has a more advanced need, such as cross-analyzing reports or analyzing part of the data in one report separately. For example, the customer may want to know, for the traffic that came from "Beijing" and "Guangdong" via a Google search, which keywords those visitors searched for and which products were eventually purchased for each keyword. Tools of this class cannot answer such a question.
Summary of the invention
In view of the above defects, the purpose of the present invention is to provide a traffic data processing method that enables free multi-dimensional analysis of traffic data.
To achieve this purpose, the present invention adopts the following technical solution:
A traffic data processing method enabling free multi-dimensional analysis, which adopts OLAP (online analytical processing) technology and comprises the following steps:
(1) normalize the raw traffic data and divide it into 6 categories, namely page view data, session data, visitor data, advertisement data, e-commerce data, and click data; create an OLAP fact table (FactTable) for each of these categories; and establish associations between the different kinds of traffic data through a session identifier and a visitor identifier;
(2) establish suitable dimensions for each data category, group similar dimensions into one class, create a corresponding dimension table, generate a unique primary key for the data in each table, and then associate the dimension tables with the traffic data of each of the 6 categories above;
(3) build an OLAP data cube from the above fact tables and dimension tables, and generate the final reports using the MDX (Multidimensional Expressions) language.
The present invention classifies the raw traffic data and establishes associations among the categories; it also establishes associations between the traffic data of each category and all dimensions. Because all categorized data share the same dimension data, the data can be dissected layer by layer along multiple dimensions, which achieves free multi-dimensional analysis of traffic data. The user can take any part of the traffic data out of a report, analyze it step by step using several dimensions, and keep filtering until the desired data report is obtained.
Description of drawings
The present invention is described in further detail below with reference to an embodiment and the accompanying drawings.
Fig. 1 is a schematic diagram of the mutually isolated reports generated by an existing web traffic analysis system;
Fig. 2 is a diagram of the relationships between the data categories obtained with the method of the invention;
Fig. 3 is a diagram of the relationship between the traffic data and the dimensions, using session data as the example;
Fig. 4 to Fig. 6 are schematic diagrams of creating dimension tables from the raw data and generating fact tables from the dimension tables;
Fig. 7 is the OLAP data cube built from the fact tables and dimension tables.
Embodiment
To analyze traffic data freely along multiple dimensions, our WebDissector back-end system adopts OLAP (online analytical processing) technology. The raw traffic data is normalized and divided into 6 categories: page view data, session data, visitor data, advertisement data, e-commerce data, and click data. The categorized data are not isolated from one another but are associated, as shown in the relationship diagram of Fig. 2. The associations are realized through two core identifiers. The first is the session identifier, which strings together the data of one session, including advertisement data, click data, and so on (we group a visitor's page views that are close together in time into one session). The second is the visitor identifier, which uniquely identifies a visitor; we use it to string together a visitor's access data over a long period. All categorized traffic data carry these two identifiers, which is what makes it possible to establish the various complex relationships shown in Fig. 2.
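The linking role of the two identifiers can be illustrated with a minimal Python sketch. The record layouts, field names (`session_id`, `visitor_id`, `url`, `target`, `product`), and the helper `link_by` are assumptions made for illustration; the patent does not specify a schema.

```python
from collections import defaultdict

# Hypothetical records for three of the six categories.
page_views = [
    {"session_id": "s1", "visitor_id": "v1", "url": "/home"},
    {"session_id": "s1", "visitor_id": "v1", "url": "/item/42"},
    {"session_id": "s2", "visitor_id": "v1", "url": "/home"},
]
clicks = [
    {"session_id": "s1", "visitor_id": "v1", "target": "buy-button"},
]
orders = [
    {"session_id": "s1", "visitor_id": "v1", "product": "item-42"},
]


def link_by(key, **categories):
    """Group the records of every category under the shared identifier `key`."""
    linked = defaultdict(lambda: defaultdict(list))
    for category, records in categories.items():
        for record in records:
            linked[record[key]][category].append(record)
    return linked


# The session identifier strings together everything that happened in one session.
by_session = link_by("session_id", page_views=page_views, clicks=clicks, orders=orders)
# The visitor identifier strings together one visitor's data across sessions.
by_visitor = link_by("visitor_id", page_views=page_views, clicks=clicks, orders=orders)

print(dict(by_session["s1"]))
print(sorted({pv["session_id"] for pv in by_visitor["v1"]["page_views"]}))
```

Here session `s1` collects two page views, one click, and one order, while visitor `v1` spans both sessions.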
Suitable dimensions are established for each data category, and the traffic data of each category is likewise associated with all of its dimensions. Fig. 3 shows this for session data as an example.
Fig. 4 shows the raw data of a session. Province and city belong to the same class, so they are grouped into a "geographic environment table" as one dimension, and the corresponding dimension table is created. Search engine and keyword form another class, so they are grouped into a "session source table" as one dimension, and the corresponding dimension table is created. A primary key is assigned to each row of data in each dimension table. The corresponding OLAP fact table (FactTable) is then created by referencing the primary keys of these dimension tables. In other words, the relationships in the data are organized by creating fact tables and dimension tables, which yields the structure shown in Fig. 5.
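The following Python sketch shows one way this step could look: two dimension tables that hand out unique primary keys, and a session fact table that stores only those keys. The class `DimensionTable`, the column names, and the sample rows are hypothetical; the contents of Fig. 4 and Fig. 5 are not reproduced here.

```python
# Hypothetical raw session rows (province, city, search engine, keyword).
raw_sessions = [
    {"province": "Beijing", "city": "Beijing", "engine": "Sogou", "keyword": "popularization"},
    {"province": "Guangdong", "city": "Shenzhen", "engine": "Google", "keyword": "analytics"},
    {"province": "Beijing", "city": "Beijing", "engine": "Google", "keyword": "analytics"},
]


class DimensionTable:
    """Maps each distinct combination of attribute values to a unique primary key."""

    def __init__(self, columns):
        self.columns = columns
        self._keys = {}  # attribute tuple -> primary key
        self.rows = {}   # primary key -> attribute dict

    def key_for(self, record):
        """Return the primary key for this record's attributes, creating a row if needed."""
        values = tuple(record[c] for c in self.columns)
        if values not in self._keys:
            pk = len(self._keys) + 1
            self._keys[values] = pk
            self.rows[pk] = dict(zip(self.columns, values))
        return self._keys[values]


geo_dim = DimensionTable(["province", "city"])      # "geographic environment table"
source_dim = DimensionTable(["engine", "keyword"])  # "session source table"

# The fact table holds one row per session and references dimensions by key only.
session_fact = [
    {"geo_id": geo_dim.key_for(r), "source_id": source_dim.key_for(r)}
    for r in raw_sessions
]

print(session_fact)
print(geo_dim.rows)
```

Repeated attribute values (the two Beijing sessions) map to the same dimension row, so the fact table stores a small integer key instead of the repeated text.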
In this way the storage complexity of the back-end system is greatly reduced and the consistency of the data is guaranteed. Traffic data is retrieved, and the corresponding reports are generated, by referencing the primary keys in the dimension tables. Different fact tables can also reuse the same dimension tables; for example, the e-commerce data can reference the same geographic dimension table and source dimension table, as shown in Fig. 6.
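Dimension reuse can be sketched as follows, assuming (hypothetically) a session fact table and an e-commerce fact table that both carry a `geo_id` foreign key into one shared geographic dimension table. The table contents, the `amount` measure, and the function `report_by_province` are illustrative only.

```python
# Shared dimension table: primary key -> attributes (hypothetical sample data).
geo_dim = {
    1: {"province": "Beijing", "city": "Beijing"},
    2: {"province": "Guangdong", "city": "Shenzhen"},
}

# Two different fact tables referencing the same dimension keys.
session_fact = [
    {"session_id": "s1", "geo_id": 1},
    {"session_id": "s2", "geo_id": 2},
]
ecommerce_fact = [
    {"session_id": "s1", "geo_id": 1, "amount": 30.0},
    {"session_id": "s2", "geo_id": 2, "amount": 12.5},
    {"session_id": "s2", "geo_id": 2, "amount": 7.5},
]


def report_by_province(fact_rows, measure=None):
    """Aggregate a fact table by province; count rows when no measure is given."""
    totals = {}
    for row in fact_rows:
        province = geo_dim[row["geo_id"]]["province"]
        value = row[measure] if measure else 1
        totals[province] = totals.get(province, 0) + value
    return totals


print(report_by_province(session_fact))              # sessions per province
print(report_by_province(ecommerce_fact, "amount"))  # order amount per province
```

Because both reports resolve `geo_id` through the same table, a province name is stored once and the two reports cannot disagree about it.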
Because one fact table is directly associated with several dimension tables, we can build the data warehouse storage structure shown in Fig. 7. It is a typical 3-dimensional data cube in which each cell corresponds to a batch of sessions. The customer can pinpoint exactly the traffic of interest. For example, the dark cell in Fig. 7 represents the traffic that came "from Beijing" and reached the website by searching for the keyword "popularization" on the "Sogou search engine", 37 visits in total. Once this OLAP data cube has been built, the final reports can be generated with the MDX (Multidimensional Expressions) language.
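A 3-dimensional cube of session counts can be approximated in plain Python, as in the sketch below. It is not MDX and does not use an OLAP server; the data is synthetic and merely arranged so that the (Beijing, Sogou, popularization) cell holds 37 sessions as in the text. The names `DIMENSIONS` and `slice_cube` are assumptions.

```python
from collections import Counter

# Synthetic session rows as (province, search engine, keyword) coordinates.
sessions = (
    [("Beijing", "Sogou", "popularization")] * 37
    + [("Beijing", "Google", "popularization")] * 5
    + [("Guangdong", "Google", "analytics")] * 12
)

DIMENSIONS = ("province", "engine", "keyword")

# Each cube cell is addressed by one value per dimension and holds a session count.
cube = Counter(sessions)


def slice_cube(cube, **fixed):
    """Sum the cells whose coordinates match every fixed dimension value."""
    total = 0
    for coords, count in cube.items():
        cell = dict(zip(DIMENSIONS, coords))
        if all(cell[dim] == value for dim, value in fixed.items()):
            total += count
    return total


print(cube[("Beijing", "Sogou", "popularization")])  # 37
print(slice_cube(cube, province="Beijing"))           # 42
print(slice_cube(cube, engine="Google"))              # 17
```

Fixing all three dimensions addresses a single cell, while fixing fewer dimensions aggregates over the remaining ones, which is the kind of result an MDX query against a real cube would return.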
The example above shows an analysis over 3 dimensions. We can go further and take out one of these "traffic cells", treat it as a new data cube, and analyze it with new dimensions. In this way the data can be dissected layer by layer along multiple dimensions and filtered repeatedly until the desired data report is obtained.
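The layer-by-layer analysis can be sketched as repeated select-then-regroup steps. In the Python sketch below, the extra dimensions `browser` and `landing`, the sample rows, and the helpers `select` and `build_cube` are hypothetical and are not taken from the patent.

```python
from collections import Counter

# Hypothetical sessions carrying more dimensions than the original 3-D cube used.
sessions = [
    {"province": "Beijing", "engine": "Sogou", "keyword": "popularization",
     "browser": "IE", "landing": "/home"},
    {"province": "Beijing", "engine": "Sogou", "keyword": "popularization",
     "browser": "Firefox", "landing": "/home"},
    {"province": "Beijing", "engine": "Sogou", "keyword": "popularization",
     "browser": "IE", "landing": "/sale"},
    {"province": "Guangdong", "engine": "Google", "keyword": "analytics",
     "browser": "IE", "landing": "/home"},
]


def select(rows, **allowed):
    """Keep the rows whose value for each named dimension is in the allowed set."""
    return [r for r in rows if all(r[d] in values for d, values in allowed.items())]


def build_cube(rows, dimensions):
    """Count sessions per combination of the given dimensions."""
    return Counter(tuple(r[d] for d in dimensions) for r in rows)


# Step 1: take one "traffic cell" out of the original cube.
cell = select(sessions, province={"Beijing"}, engine={"Sogou"}, keyword={"popularization"})
# Step 2: treat that cell as a new cube over new dimensions.
sub_cube = build_cube(cell, ("browser", "landing"))
# Step 3: keep filtering inside the cell.
ie_only = select(cell, browser={"IE"})

print(sub_cube)
print(len(ie_only))
```

Passing a set of allowed values also covers selections such as province in {"Beijing", "Guangdong"}, the kind of cross-report question raised in the background section.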
Claims (1)
1. A traffic data processing method enabling free multi-dimensional analysis, which adopts OLAP (online analytical processing) technology, characterized in that it comprises the following steps:
(1) normalizing the raw traffic data and dividing it into 6 categories, namely page view data, session data, visitor data, advertisement data, e-commerce data, and click data; creating an OLAP fact table (FactTable) for each of these categories; and establishing associations between the different kinds of traffic data through a session identifier and a visitor identifier;
(2) establishing suitable dimensions for each data category, grouping similar dimensions into one class, creating a corresponding dimension table, generating a unique primary key for the data in each table, and then associating the dimension tables with the traffic data of each of the 6 categories above;
(3) building an OLAP data cube from the above fact tables and dimension tables, and generating the final reports using the MDX (Multidimensional Expressions) language.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010131551 CN101799830A (en) | 2010-03-25 | 2010-03-25 | Flow data processing method capable of realizing multi-dimensional free analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010131551 CN101799830A (en) | 2010-03-25 | 2010-03-25 | Flow data processing method capable of realizing multi-dimensional free analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101799830A true CN101799830A (en) | 2010-08-11 |
Family
ID=42595506
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201010131551 Pending CN101799830A (en) | 2010-03-25 | 2010-03-25 | Flow data processing method capable of realizing multi-dimensional free analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101799830A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102999506A (en) * | 2011-09-13 | 2013-03-27 | 阿里巴巴集团控股有限公司 | Method and device for obtaining unique visitor (UV) |
CN103020146A (en) * | 2012-11-22 | 2013-04-03 | 华为技术有限公司 | Data processing method and equipment |
CN103294687A (en) * | 2012-02-24 | 2013-09-11 | 腾讯科技(深圳)有限公司 | Method and system for counting visitors of personal page |
CN104426713A (en) * | 2013-08-28 | 2015-03-18 | 腾讯科技(北京)有限公司 | Method and device for monitoring network site access effect data |
CN104657370A (en) * | 2013-11-19 | 2015-05-27 | 中国移动通信集团天津有限公司 | Method and device for achieving multi-dimensional cube association |
CN105873119A (en) * | 2016-05-26 | 2016-08-17 | 重庆大学 | Method for classifying flow use behaviors of mobile network user groups |
CN106095859A (en) * | 2016-06-02 | 2016-11-09 | 成都淞幸科技有限责任公司 | Various dimensions Chinese medicine acupuncture association rule mining method based on OLAM |
CN106484715A (en) * | 2015-08-27 | 2017-03-08 | 北京国双科技有限公司 | Data for path conversion dissects method and apparatus |
CN106933905A (en) * | 2015-12-31 | 2017-07-07 | 北京国双科技有限公司 | The monitoring method and device of web page access data |
CN107729500A (en) * | 2017-10-20 | 2018-02-23 | 锐捷网络股份有限公司 | A kind of data processing method of on-line analytical processing, device and background devices |
CN110020364A (en) * | 2017-11-27 | 2019-07-16 | 北京京东尚科信息技术有限公司 | The method and apparatus for determining the traffic source of page access |
CN110677310A (en) * | 2018-07-03 | 2020-01-10 | 百度在线网络技术(北京)有限公司 | Traffic attribution method, device and terminal |
CN112069021A (en) * | 2020-08-21 | 2020-12-11 | 北京五八信息技术有限公司 | Flow data storage method and device, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1564160A (en) * | 2004-04-22 | 2005-01-12 | 重庆市弘越科技有限公司 | Method of seting up and inquirying multiple-demensional data cube |
CN101197876A (en) * | 2006-12-06 | 2008-06-11 | 中兴通讯股份有限公司 | Method and system for multi-dimensional analysis of message service data |
CN101599088A (en) * | 2008-11-18 | 2009-12-09 | 北京美智医疗科技有限公司 | The mining multi-dimensional data system and method for medical information system |
Non-Patent Citations (1)
Title |
---|
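《现代计算机》 (Modern Computer), 2008-04-30, Huang Zhicheng (黄志成), "Design and Implementation of Network Traffic OLAP Based on a Data Warehouse" (基于数据仓库的网络流量OLAP设计与实现), pp. 85-86, 1, No. 281, 2 * |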
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102999506A (en) * | 2011-09-13 | 2013-03-27 | 阿里巴巴集团控股有限公司 | Method and device for obtaining unique visitor (UV) |
CN103294687A (en) * | 2012-02-24 | 2013-09-11 | 腾讯科技(深圳)有限公司 | Method and system for counting visitors of personal page |
CN103294687B (en) * | 2012-02-24 | 2016-06-08 | 腾讯科技(深圳)有限公司 | The method of statistics private page visitor and system |
CN103020146A (en) * | 2012-11-22 | 2013-04-03 | 华为技术有限公司 | Data processing method and equipment |
CN104426713A (en) * | 2013-08-28 | 2015-03-18 | 腾讯科技(北京)有限公司 | Method and device for monitoring network site access effect data |
CN104657370A (en) * | 2013-11-19 | 2015-05-27 | 中国移动通信集团天津有限公司 | Method and device for achieving multi-dimensional cube association |
CN104657370B (en) * | 2013-11-19 | 2018-09-04 | 中国移动通信集团天津有限公司 | A kind of associated method and apparatus of realization multi-dimension data cube |
CN106484715A (en) * | 2015-08-27 | 2017-03-08 | 北京国双科技有限公司 | Data for path conversion dissects method and apparatus |
CN106933905A (en) * | 2015-12-31 | 2017-07-07 | 北京国双科技有限公司 | The monitoring method and device of web page access data |
CN105873119A (en) * | 2016-05-26 | 2016-08-17 | 重庆大学 | Method for classifying flow use behaviors of mobile network user groups |
CN106095859A (en) * | 2016-06-02 | 2016-11-09 | 成都淞幸科技有限责任公司 | Various dimensions Chinese medicine acupuncture association rule mining method based on OLAM |
CN107729500A (en) * | 2017-10-20 | 2018-02-23 | 锐捷网络股份有限公司 | A kind of data processing method of on-line analytical processing, device and background devices |
CN107729500B (en) * | 2017-10-20 | 2021-01-05 | 锐捷网络股份有限公司 | Data processing method and device for online analysis processing and background equipment |
CN110020364A (en) * | 2017-11-27 | 2019-07-16 | 北京京东尚科信息技术有限公司 | The method and apparatus for determining the traffic source of page access |
CN110020364B (en) * | 2017-11-27 | 2021-11-30 | 北京京东尚科信息技术有限公司 | Method and device for determining flow source of page access |
CN110677310A (en) * | 2018-07-03 | 2020-01-10 | 百度在线网络技术(北京)有限公司 | Traffic attribution method, device and terminal |
CN112069021A (en) * | 2020-08-21 | 2020-12-11 | 北京五八信息技术有限公司 | Flow data storage method and device, electronic equipment and storage medium |
CN112069021B (en) * | 2020-08-21 | 2024-02-20 | 北京五八信息技术有限公司 | Flow data storage method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20100811 |