CN104182506A - Log management method - Google Patents

Info

Publication number
CN104182506A
Authority
CN
China
Application number
CN201410409927.0A
Other languages
Chinese (zh)
Inventor
刘璧怡
郭美思
吴楠
Original Assignee
浪潮(北京)电子信息产业有限公司
Application filed by 浪潮(北京)电子信息产业有限公司
Priority to CN201410409927.0A
Publication of CN104182506A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/13: File access structures, e.g. distributed indices
    • G06F16/137: Hash-based
    • G06F16/17: Details of further file system functions
    • G06F16/172: Caching, prefetching or hoarding of files
    • G06F16/1734: Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G06F16/18: File system types
    • G06F16/1805: Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815: Journaling file systems

Abstract

The invention provides a log management method, and relates to the field of computer application. By the aid of the log management method, log data can be stored and managed in a distributed manner. The log management method includes collecting traffic log and click log; preprocessing the acquired traffic log and the acquired click log; storing the preprocessed traffic log and the preprocessed click log in a distributed manner. According to the technical scheme, the log management method has the advantages that the log management method is applicable to data mining, and log files can be stored in the distributed manner on the basis of HDFS (Hadoop distributed file system) architectures.

Description

Log management method

Technical field

The present invention relates to the field of computer applications, and in particular to a log management method.

Background

With the rapid development of the Internet, the number of Internet users has grown sharply, and user access logs have expanded just as quickly. For the Internet, web logs are very important information. For large e-commerce or social networking sites in particular, mining web logs can reveal users' latent access patterns, which makes it possible to design a site organization that is more convenient for users to navigate. Mining information useful to the enterprise from these massive logs, and making correct decisions on that basis, is therefore very important work. Because web logs are generated by numerous users, they are characterized by diverse data sources, massive volume, and uncertain transmission conditions, and complete logs are the guarantee for later analysis. The path from log collection to log analysis is very complex; it requires not only high reliability but also timeliness. A single host, however high its hardware configuration, has limited processing capacity for both log storage and computation. Distributed storage and computation have therefore become an inevitable trend.

There are hundreds of distributed computing schemes worldwide. Hadoop is a widely used distributed storage and distributed computing framework. It is suited to large-scale distributed computation, has received growing attention, and is widely applied in advertising computation, log analysis, web search, and data mining. In recent years hard disk capacity has grown rapidly, but disk access speed has not kept pace. When the data volume is very large, read operations take a long time.

Summary of the invention

The invention provides a log management method that solves the problem of storing and managing log data in a distributed manner.

A log management method, comprising:

Collecting traffic logs and click logs;

Preprocessing the collected logs;

Storing the preprocessed logs in a distributed manner.

Preferably, collecting traffic logs and click logs comprises:

When a user opens a web page, combining the required information into a character string and sending it to a front-end server, the information including any one or more of the following:

Time, client IP, user information, accessed address, referrer address.

Preferably, preprocessing the collected logs comprises:

Calculating the dwell time on each visited page, the level of each visited page, and the number of pages browsed in one continuous visit;

Joining the collected logs with a user information database according to the user's UID to obtain the user information of the visiting user, the user information database storing the details of all users;

Converting the collected traffic logs into a traffic log standard format, and converting the collected click logs into a click log standard format.

Preferably, storing the preprocessed logs in a distributed manner comprises:

Configuring the block size and the number of replicas for the preprocessed logs;

Uploading the preprocessed logs to an HDFS file system, where they are stored in LZO format.

Preferably, the method further comprises:

Reading the configuration information of the data tables and importing it into different files by log type;

Performing a Map operation on the logs, processing them into key-value form, to obtain Map results;

Performing a merge operation on the Map results to obtain merged results;

Performing a Reduce operation on the merged results to obtain data analysis results, and storing the data analysis results in the HDFS file system;

Importing the data analysis results from the HDFS file system into a database for users to query.

Preferably, reading the configuration information of the data tables and importing it into different files by log type comprises:

Reading the configuration information of the data tables from a database, the configuration information including column information, the configuration information of the metrics, and dimensions and dimension values;

Importing the configuration information into different files by log type and uploading the files to HDFS.

Preferably, performing a Map operation on the logs, processing them into key-value form, to obtain Map results comprises:

Looping over every metric of every log type;

According to the preset calculation rule of each metric, selecting the applicable calculation type to process the logs into key-value form, and using the logs in key-value form as the Map results.

Preferably, the calculation types comprise:

Counting type: the log records that satisfy the current calculation rule are formatted as key = date + metric ID, value = 1;

Accumulation type: the log records that satisfy the current calculation rule are processed into key = date + metric ID, value = the value of the calculated column;

Grouped counting type: the log records that satisfy the current calculation rule are formatted as key = date + metric ID + group ID, value = 1;

Grouped accumulation type: the log records that satisfy the current calculation rule are processed into key = date + metric ID + group ID, value = the value of the calculated column.

Preferably, performing a merge operation on the Map results to obtain merged results comprises:

Merging the Map results so that each key corresponds to only one log record.

The invention provides a log management method that collects traffic logs and click logs, preprocesses the collected logs, and stores the preprocessed logs in a distributed manner. This realizes distributed storage of log files based on the HDFS architecture and solves the problem of storing and managing log data in a distributed manner.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of a log management system provided by Embodiment 1 of the present invention.

Detailed description

There are hundreds of distributed computing schemes worldwide. Hadoop is a widely used distributed storage and distributed computing framework. It is suited to large-scale distributed computation, has received growing attention, and is widely applied in advertising computation, log analysis, web search, and data mining. In recent years hard disk capacity has grown rapidly, but disk access speed has not kept pace. When the data volume is very large, read operations take a long time. Reading and writing data from multiple disks in parallel, however, saves a great deal of time.

Therefore, to speed up log processing and address some of the problems enterprises currently face in data processing, a process-oriented, unified log analysis system needs to be designed on the basis of distributed storage and computation. The system takes the collected web logs as its data basis. Data analysts or other staff configure metrics and dimensions, the system runs the background computation programs on a schedule according to the user's configuration, and finally presents the data directly to the people who need it. The system saves communication costs between the requesting party and the technical staff and further improves the efficiency of log analysis work.

To solve the above problem, embodiments of the present invention provide a log management method. The embodiments are described in detail below with reference to the accompanying drawings. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another arbitrarily.

First, Embodiment 1 of the present invention is described with reference to the accompanying drawing.

This embodiment provides a log management system that uses the two core technologies provided by Hadoop, HDFS and Map/Reduce. In the system, users can define the configuration of the data tables themselves, which increases the customizability of the reports. As shown in Fig. 1, the system comprises: a log collection server 101, a log preprocessing server 102, an HDFS file system 103, a statistical analysis module 104, and a user interface module 105.

As shown in Fig. 1, when a user browses web pages, some of the user's operations are recorded, and the front-end log collection server is responsible for collecting these records. The front-end log collection server does not store all the logs; it periodically pushes the collected logs to the intermediate log processing server. Log preprocessing is carried out on the intermediate server and includes cleaning, classifying, and unifying the format of the logs. The preprocessed logs are periodically stored in the HDFS file system of the Hadoop distributed cluster, where they serve as input data for the subsequent data statistics applications.

Next, Embodiment 2 of the present invention is described.

This embodiment provides a log management method. For the two log types mentioned above, the embodiment collects logs by page tagging, that is, by embedding tracking code in the page. Page tagging requires a JS file, which mainly implements three functions.

1) When a user opens a web page, the required information, including the time, client IP, user information, accessed address, and referrer address, is combined into a character string and sent to the front-end log server.

2) A method that mainly serves click log collection. When a user clicks a link, the method is triggered and sends the identifier of the clicked short link, the time, the user information, and so on to the front-end log server.

3) A custom HTML tag attribute whose function is similar to that of the method above; which of the two is used depends on the actual situation.

To collect traffic logs, the JS file written in advance only needs to be added to the HTML page, where it is loaded dynamically. To tag a page for click logs, first confirm that the JS file above has already been loaded, then add a custom attribute to each HTML tag or link whose clicks are to be counted, with the value key={key}&value={value}. The key and the value can be defined freely when the front end is deployed. The key distinguishes a requirement or module, and the value identifies the clicked tag or link within that module; all values in the same module share the same key. Each key is unique within the whole system, and each value must be unique under its key.
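The attribute convention above can be checked mechanically before deployment. The following is a minimal sketch, assuming the attribute value is a plain `key=...&value=...` string; the function names `parse_click_attribute` and `find_duplicates` are hypothetical and do not come from the patent.

```python
from urllib.parse import parse_qs


def parse_click_attribute(raw):
    """Parse a 'key=...&value=...' attribute string into a (key, value) pair."""
    fields = parse_qs(raw, strict_parsing=True)
    return fields["key"][0], fields["value"][0]


def find_duplicates(attributes):
    """Return the (key, value) pairs used by more than one tagged element.

    The text requires each value to be unique under its key, so any pair
    reported here marks a tagging conflict.
    """
    seen, duplicates = set(), set()
    for raw in attributes:
        pair = parse_click_attribute(raw)
        if pair in seen:
            duplicates.add(pair)
        seen.add(pair)
    return duplicates


tagged = ["key=nav&value=home", "key=nav&value=cart", "key=nav&value=home"]
print(find_duplicates(tagged))
```

Elements in the same module share a key, so only a repeated (key, value) pair is treated as a conflict here.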

After the page tagging is complete, the log collection server collects traffic logs and click logs. When a user opens a web page, the required information is combined into a character string and sent to the front-end server. The information includes any one or more of the following:

Time, client IP, user information, accessed address, referrer address.
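As an illustration of combining these fields into one string, here is a minimal sketch. The patent does not specify the delimiter or the field order, so the tab separator, the `TrafficRecord` class, and both function names are assumptions made for the example.

```python
from dataclasses import dataclass, astuple, fields

FIELD_SEPARATOR = "\t"  # assumed delimiter; the source does not specify one


@dataclass
class TrafficRecord:
    time: str
    client_ip: str
    user_info: str
    access_url: str
    referrer_url: str


def to_log_line(record):
    """Combine the fields into the single string sent to the front-end server."""
    return FIELD_SEPARATOR.join(astuple(record))


def from_log_line(line):
    """Split a received string back into its fields on the server side."""
    parts = line.split(FIELD_SEPARATOR)
    if len(parts) != len(fields(TrafficRecord)):
        raise ValueError(f"expected {len(fields(TrafficRecord))} fields, got {len(parts)}")
    return TrafficRecord(*parts)
```

In the described system the string is built by the JS file in the browser; Python is used here only to show the round trip between the combined string and its fields.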

Once the logs have been collected, they can be preprocessed. Preprocessing of traffic logs mainly includes the following.

First, from the user's access records, calculate data such as the dwell time on each visited page, the level of each visited page, and the number of pages browsed in one continuous visit. (These data are the output of the log preprocessing server; they are stored in the HDFS file system and serve as input data for the statistical analysis module.)
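The patent does not give formulas for these quantities. One plausible reading, sketched below, takes the dwell time of a page as the gap until the next page in the same session, the level as the page's position in the session, and the page count as the number of records in the session. The record layout and the function name `summarize_visits` are assumptions.

```python
from collections import defaultdict


def summarize_visits(records):
    """Summarize visits per session.

    records: iterable of (session_id, access_time_seconds, url) tuples.
    The dwell time of the last page in a session is unknown and left as None.
    """
    by_session = defaultdict(list)
    for session_id, access_time, url in records:
        by_session[session_id].append((access_time, url))

    result = {}
    for session_id, visits in by_session.items():
        visits.sort()  # order pages by access time
        pages = []
        for depth, (access_time, url) in enumerate(visits, start=1):
            # depth is 1-based, so visits[depth] is the next page, if any
            dwell = visits[depth][0] - access_time if depth < len(visits) else None
            pages.append({"url": url, "depth": depth, "dwell_seconds": dwell})
        result[session_id] = {"page_count": len(visits), "pages": pages}
    return result
```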

Second, join the logs with the user information database according to the user's UID to obtain the user information of the visiting user. The user information database stores the details of all users. A log contains only the user's UID, while the user information database contains the user's details; for a microblog user, for example, it holds the registration details, location, birthday, posted microblog records, and so on. Preprocessing of click logs likewise requires associating the click logs with the user information database and unifying the data format. Specifically, traffic logs are unified into the traffic log standard format, and click logs are unified into the click log standard format.
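The UID join can be pictured as a lookup keyed on the UID. The sketch below stands in for the user information database with an in-memory dictionary; the field names and the function `enrich_with_user_info` are hypothetical.

```python
def enrich_with_user_info(log_records, user_db):
    """Attach user details to each log record by looking up its UID.

    log_records: list of dicts, each with a "uid" entry.
    user_db: dict mapping UID to a dict of user details (a stand-in for
    the user information database).
    Records whose UID is not found are kept unchanged.
    """
    enriched = []
    for record in log_records:
        details = user_db.get(record["uid"], {})
        enriched.append({**record, **details})
    return enriched


users = {"42": {"location": "Beijing", "gender": "F"}}
logs = [{"uid": "42", "url": "/home"}, {"uid": "7", "url": "/cart"}]
print(enrich_with_user_info(logs, users))
```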

The traffic log standard format is as follows:

Column  Meaning
0       Session ID
1       Session index
2       Access level
3       Access time
4       Session_id
5       Cookie_id
6       Login user type
7       Login user ID
8       Source URL
9       Current URL
10      Source domain name
11      Last access time
12      Total count
13      Operating system version
14      Browser version
15      Flash version
16      Language
17      Access duration

The click log standard format is as follows:

Column  Meaning
0       Time
1       IP
2       Current URL
3       Session_id
4       Cookie_id
5       User ID
6       Guideline code
7       Rank
8       Gender
9       Nickname
10      Activation status
11      Email
12      User type
13      Real name
14      Tag
15      Registration source
16      ?
17      ?

The preprocessed logs can then be stored in a distributed manner. The preprocessed log files are stored in the HDFS file system. Before the logs are uploaded from the log preprocessing server to the HDFS file system, the block size and the number of replicas for the preprocessed logs are configured through the Hadoop configuration files. After upload, the log files are stored in LZO format.
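In Hadoop, block size and replica count are usually set through the `dfs.blocksize` and `dfs.replication` properties in `hdfs-site.xml`. The sketch below generates such a fragment. The patent names no concrete values, so the 128 MB block size and replica count of 3 are illustrative assumptions, and the LZO codec setup is not covered.

```python
import xml.etree.ElementTree as ET


def build_hdfs_site(block_size_bytes, replication):
    """Return an hdfs-site.xml style fragment with block size and replica count."""
    configuration = ET.Element("configuration")
    settings = (("dfs.blocksize", block_size_bytes), ("dfs.replication", replication))
    for name, value in settings:
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(configuration, encoding="unicode")


# Illustrative values only: 128 MB blocks, 3 replicas.
print(build_hdfs_site(128 * 1024 * 1024, 3))
```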

The data statistical analysis module is implemented mainly through the Map/Reduce framework provided by Hadoop, as follows:

1) Read the configuration information of the data tables and import it into different files by log type.

First read the configuration information of the data tables from the database, including the column information, the configuration information of the metrics, and the dimensions and dimension values. This information is imported into different files by log type and uploaded to HDFS for use by the subsequent programs.

2) Perform a Map operation on the logs, processing them into key-value form, to obtain Map results.

Loop over every metric of every log type. (Metrics are set by users according to their needs, for example the number of visitors to the home page.) The logs are evaluated against the calculation rule of each metric. (Calculation rules are defined by the user per metric; for example, to count the page views of a user A, the rule is written as UID='A'.) If a log record satisfies the calculation rule of the current metric, the record is processed into key-value form.

The log types involved in this embodiment are traffic logs and click logs.

The metrics in this embodiment have four calculation types: counting, accumulation, grouped counting, and grouped accumulation. Each is processed differently, as follows:

Counting type: if the calculation type of the current metric is counting, each log record that satisfies the current calculation rule is formatted as key = date + metric ID, value = 1.

Accumulation type: if the calculation type of the current metric is accumulation, each log record that satisfies the current calculation rule is processed into key = date + metric ID, value = the value of the calculated column. The key is handled in the same way as for counting, but the value is not 1; it is the value of the calculated column selected when the metric was configured. The later merge then directly yields the accumulated value.

Grouped counting type: if the calculation type of the current metric is grouped counting, each log record that satisfies the current calculation rule is formatted as key = date + metric ID + group ID, value = 1.

Grouped accumulation type: if the calculation type of the current metric is grouped accumulation, each log record that satisfies the current calculation rule is processed into key = date + metric ID + group ID, value = the value of the calculated column. The key is handled in the same way as for grouped counting, but the value is not 1; it is the value of the calculated column selected when the metric was configured. The later merge then yields the accumulated value per group.

This step outputs logs in key-value form; that is, each log record has a key and a value.
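The four calculation types differ only in whether the key carries a group ID and whether the value is 1 or a column value. The sketch below shows one way to express that. It is plain Python rather than Hadoop Mapper code, and the dictionary layout of a metric (the "index" of the original translation), the type constants, and `map_record` are all assumptions.

```python
COUNT, SUM, GROUP_COUNT, GROUP_SUM = "count", "sum", "group_count", "group_sum"


def map_record(record, metric):
    """Map one log record to a (key, value) pair, or None if the rule rejects it."""
    if not metric["rule"](record):
        return None
    key_parts = [record["date"], metric["id"]]
    if metric["type"] in (GROUP_COUNT, GROUP_SUM):
        # grouped types append the group ID to the key
        key_parts.append(str(record[metric["group_column"]]))
    if metric["type"] in (COUNT, GROUP_COUNT):
        value = 1  # counting types emit 1 per matching record
    else:
        value = record[metric["value_column"]]  # accumulation types emit the column value
    return "+".join(key_parts), value


# Page views of user A, following the UID='A' rule mentioned in the text.
page_views_of_a = {"id": "m1", "type": COUNT, "rule": lambda r: r["uid"] == "A"}
record = {"date": "2014-08-19", "uid": "A", "browser": "Firefox", "duration": 42}
print(map_record(record, page_views_of_a))  # ('2014-08-19+m1', 1)
```

Joining the key parts with `+` mirrors the key = date + metric ID notation; a real job would need a separator that cannot occur inside the parts.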

3) Perform a merge operation on the Map results to obtain merged results.

This embodiment adds a Combiner between the Mapper and the Reducer to perform a Combine operation. The Combiner mainly serves to lighten the Reducer's load and speed up the program. It further merges the local Map results, and its output serves as the Reducer's input. For counting or accumulation, for example, the Combiner merges the results output by the local Map so that each key (that is, each key-value pair) has only one record, which reduces the Reducer's work.
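Because both counting and accumulation reduce to a sum per key, the Combine step can be sketched as a local sum over one Mapper's output. This is a plain-Python illustration, not the Hadoop Combiner API; the function name `combine` and the sample keys are hypothetical.

```python
from collections import defaultdict


def combine(map_output):
    """Merge one Mapper's local output so that each key has a single record.

    map_output: iterable of (key, value) pairs with numeric values.
    """
    totals = defaultdict(int)
    for key, value in map_output:
        totals[key] += value
    return dict(totals)


local_output = [("2014-08-19+m1", 1), ("2014-08-19+m1", 1), ("2014-08-19+m2", 40)]
print(combine(local_output))  # {'2014-08-19+m1': 2, '2014-08-19+m2': 40}
```

Summation is associative and commutative, so merging locally before the Reduce step does not change the final totals.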

4) Perform a Reduce operation on the merged results to obtain data analysis results, and store the data analysis results in the HDFS file system.

After Map and Combine processing, the data are stored in the local file system of the statistical analysis module. When the Reducer runs, it has to read data from different Maps, and logs with the same key in different Maps are processed in the same Reducer. Which Reducer handles the logs for a given key is determined by hashing the key; records with the same hash value are assigned to the same Reducer. The Reducer operation is the same for counting and accumulation: it sums the values for each identical key. After the Reducer finishes, the result data are stored in HDFS and finally imported from HDFS into the database for users to query.
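The hash-based assignment and the summing Reducer can be sketched as follows. `zlib.crc32` is used only as a deterministic stand-in for the hash; it is not Hadoop's own partitioner, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a Reducer for a key; equal keys always get the same Reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(combined_outputs, num_reducers):
    """Route the (key, value) pairs from every Map to their Reducer's bucket."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for output in combined_outputs:
        for key, value in output:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_bucket(bucket):
    """Sum the values for each key; the same step serves counting and accumulation."""
    return {key: sum(values) for key, values in bucket.items()}


map_a = [("2014-08-19+m1", 2), ("2014-08-19+m2", 40)]
map_b = [("2014-08-19+m1", 3)]
for bucket in shuffle([map_a, map_b], num_reducers=2):
    print(reduce_bucket(bucket))
```

Each key lands in exactly one bucket, so the per-Reducer results can be written out independently and merged later without conflicts.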

The user interface module mainly reads the statistical data from the database and displays them on the front-end page as data tables, pie charts, line charts, and bar charts.

This embodiment provides a log management method and system. First, in a test environment, a selected page is tagged by adding the required JS file. User access logs are then collected. One copy of the collected logs is stored in the Hadoop cluster and one copy on a standalone server. Data tables and metrics are then configured through the system's user interface. The computation is then carried out by two different methods, Hadoop and a shell script. Finally, the charts of the data are viewed on the system page. Traffic logs and click logs are collected, the collected logs are preprocessed, and the preprocessed logs are stored in a distributed manner. This realizes distributed storage of log files based on the HDFS architecture and solves the problem of storing and managing log data in a distributed manner.

A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by a computer program flow. The computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, equipment, apparatus, or device); when executed, it performs one of the steps of the method embodiment or a combination thereof.

Alternatively, all or some of the steps of the above embodiments may be implemented with integrated circuits. The steps may each be made into a separate integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.

Each device, functional module, or functional unit in the above embodiments may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network of multiple computing devices.

When a device, functional module, or functional unit in the above embodiments is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be that of the claims.

Claims (9)

1. A log management method, characterized by comprising:
collecting traffic logs and click logs;
preprocessing the collected logs;
storing the preprocessed logs in a distributed manner.
2. The log management method according to claim 1, characterized in that collecting traffic logs and click logs comprises:
when a user opens a web page, combining the required information into a character string and sending it to a front-end server, the information comprising any one or more of the following:
time, client IP, user information, accessed address, referrer address.
3. The log management method according to claim 1, characterized in that preprocessing the collected logs comprises:
calculating the dwell time on each visited page, the level of each visited page, and the number of pages browsed in one continuous visit;
joining the collected logs with a user information database according to the user's UID to obtain the user information of the visiting user, the user information database storing the details of all users;
converting the collected traffic logs into a traffic log standard format, and converting the collected click logs into a click log standard format.
4. The log management method according to claim 3, characterized in that storing the preprocessed logs in a distributed manner comprises:
configuring the block size and the number of replicas for the preprocessed logs;
uploading the preprocessed logs to an HDFS file system, where they are stored in LZO format.
5. The log management method according to claim 4, characterized in that the method further comprises:
reading the configuration information of the data tables and importing it into different files by log type;
performing a Map operation on the logs, processing them into key-value form, to obtain Map results;
performing a merge operation on the Map results to obtain merged results;
performing a Reduce operation on the merged results to obtain data analysis results, and storing the data analysis results in the HDFS file system;
importing the data analysis results from the HDFS file system into a database for users to query.
6. The log management method according to claim 5, characterized in that reading the configuration information of the data tables and importing it into different files by log type comprises:
reading the configuration information of the data tables from a database, the configuration information comprising column information, the configuration information of the metrics, and dimensions and dimension values;
importing the configuration information into different files by log type and uploading the files to HDFS.
7. The log management method according to claim 6, characterized in that performing a Map operation on the logs, processing them into key-value form, to obtain Map results comprises:
looping over every metric of every log type;
according to the preset calculation rule of each metric, selecting the applicable calculation type to process the logs into key-value form, and using the logs in key-value form as the Map results.
8. The log management method according to claim 7, characterized in that the calculation types comprise:
counting type: the log records that satisfy the current calculation rule are formatted as key = date + metric ID, value = 1;
accumulation type: the log records that satisfy the current calculation rule are processed into key = date + metric ID, value = the value of the calculated column;
grouped counting type: the log records that satisfy the current calculation rule are formatted as key = date + metric ID + group ID, value = 1;
grouped accumulation type: the log records that satisfy the current calculation rule are processed into key = date + metric ID + group ID, value = the value of the calculated column.
9. The log management method according to claim 5, characterized in that performing a merge operation on the Map results to obtain merged results comprises:
merging the Map results so that each key corresponds to only one log record.
CN201410409927.0A 2014-08-19 2014-08-19 Log management method CN104182506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410409927.0A CN104182506A (en) 2014-08-19 2014-08-19 Log management method

Publications (1)

Publication Number Publication Date
CN104182506A true CN104182506A (en) 2014-12-03

Family

ID=51963545

Country Status (1)

Country Link
CN (1) CN104182506A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105099764A (en) * 2015-06-29 2015-11-25 百度在线网络技术(北京)有限公司 Log processing method and log processing device
CN105468737A (en) * 2015-11-24 2016-04-06 湖北大学 Web service big data analysis method, cloud computing platform and mining system
CN105577431A (en) * 2015-12-11 2016-05-11 青岛云成互动网络有限公司 User information identification and classification method based on internet application and system thereof
CN105574539A (en) * 2015-12-11 2016-05-11 中国联合网络通信集团有限公司 DNS log analysis method and apparatus
CN105608203A (en) * 2015-12-24 2016-05-25 Tcl集团股份有限公司 Internet of things log processing method and device based on Hadoop platform
CN105808605A (en) * 2014-12-31 2016-07-27 北京奇虎科技有限公司 Search log combination method and system
CN105843941A (en) * 2016-04-01 2016-08-10 北京小米移动软件有限公司 Log checking method and device
CN106227877A (en) * 2016-08-02 2016-12-14 北京集奥聚合科技有限公司 A kind of distributed information log acquisition system based on hadoop and method
CN106503079A (en) * 2016-10-10 2017-03-15 语联网(武汉)信息技术有限公司 A kind of blog management method and system
CN106776622A (en) * 2015-11-20 2017-05-31 北京国双科技有限公司 The querying method and device of access log
CN107153702A (en) * 2017-05-10 2017-09-12 北京微影时代科技有限公司 A kind of data processing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693307A (en) * 2012-05-24 2012-09-26 上海克而瑞信息技术有限公司 Website user access behavior recording and analyzing system
US20130124466A1 (en) * 2011-11-14 2013-05-16 Siddartha Naidu Data Processing Service

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
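SIJIE GUO et al.: "Mastiff: A MapReduce-based System for Time-based Big Data Analytics", 2012 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING *
WANG GAOLEI: "Design and Implementation of a Crawler Log Data Information Extraction and Statistics System", China Master's Theses Full-text Database, Information Science and Technology *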

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808605B (en) * 2014-12-31 2019-08-09 北京奇虎科技有限公司 Search log merging method and system
CN105808605A (en) * 2014-12-31 2016-07-27 北京奇虎科技有限公司 Search log combination method and system
CN105099764B (en) * 2015-06-29 2019-01-18 北京音之邦文化科技有限公司 Log processing method and device
CN105099764A (en) * 2015-06-29 2015-11-25 百度在线网络技术(北京)有限公司 Log processing method and log processing device
CN106776622A (en) * 2015-11-20 2017-05-31 北京国双科技有限公司 Query method and device for access logs
CN105468737A (en) * 2015-11-24 2016-04-06 湖北大学 Web service big data analysis method, cloud computing platform and mining system
CN105574539B (en) * 2015-12-11 2018-09-21 中国联合网络通信集团有限公司 DNS log analysis method and device
CN105577431A (en) * 2015-12-11 2016-05-11 青岛云成互动网络有限公司 User information identification and classification method based on internet application and system thereof
CN105574539A (en) * 2015-12-11 2016-05-11 中国联合网络通信集团有限公司 DNS log analysis method and apparatus
CN105608203A (en) * 2015-12-24 2016-05-25 Tcl集团股份有限公司 Internet of things log processing method and device based on Hadoop platform
CN105608203B (en) * 2015-12-24 2019-09-17 Tcl集团股份有限公司 Internet of Things log processing method and device based on Hadoop platform
CN105843941B (en) * 2016-04-01 2019-07-09 北京小米移动软件有限公司 Log verification method and device
CN105843941A (en) * 2016-04-01 2016-08-10 北京小米移动软件有限公司 Log checking method and device
CN106227877A (en) * 2016-08-02 2016-12-14 北京集奥聚合科技有限公司 Hadoop-based distributed log acquisition system and method
CN106503079A (en) * 2016-10-10 2017-03-15 语联网(武汉)信息技术有限公司 Blog management method and system
CN107153702A (en) * 2017-05-10 2017-09-12 北京微影时代科技有限公司 Data processing method and device

Similar Documents

Publication Publication Date Title
Lim et al. Business intelligence and analytics: Research directions
US10027773B2 (en) Methods and apparatus to share online media impressions data
JP5778255B2 (en) Method, system, and apparatus for query based on vertical search
CN102682059B (en) Method and system for distributing users to clusters
EP2088711B1 (en) A log analyzing method and system based on distributed compute network
US20160364425A1 (en) Hierarchical diff files
JP5596152B2 (en) Information matching method and system on electronic commerce website
Das et al. Creating meaningful data from web logs for improving the impressiveness of a website by using path analysis method
US9280531B2 (en) Marketing to consumers using data obtained from abandoned electronic forms
US8972275B2 (en) Optimization of social media engagement
US9070137B2 (en) Methods and systems for compiling marketing information for a client
US20100223244A1 (en) Targeted multi-dimension data extraction for real-time analysis
US8473495B2 (en) Centralized web-based software solution for search engine optimization
US20140278575A1 (en) Systems And Methods Of Processing Insurance Data Using A Web-Scale Data Fabric
US9619525B2 (en) Method and system of optimizing a web page for search engines
US10366146B2 (en) Method for adjusting content of a webpage in real time based on users online behavior and profile
CN104036025A (en) Distribution-base mass log collection system
JP6388655B2 (en) Generation of multi-column index of relational database by data bit interleaving for selectivity
CN104820670A (en) Method for acquiring and storing big data of power information
CN102117321A (en) Automated discovery aggregation and organization of subject area discussions
WO2011146391A2 (en) Data collection, tracking, and analysis for multiple media including impact analysis and influence tracking
Nandimath et al. Big data analysis using Apache Hadoop
US8667385B1 (en) Method and system for generating and sharing analytics annotations
US9471436B2 (en) Use of incremental checkpoints to restore user data stream processes
US20130159251A1 (en) Dedicating Disks to Reading or Writing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20141203