CN105677842A - Log analysis system based on Hadoop big data processing technique - Google Patents

Info

Publication number
CN105677842A
Authority
CN
China
Prior art keywords
data
module
file system
distributed file
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610006805.6A
Other languages
Chinese (zh)
Inventor
许丹霞
刘寅
汪伟
郑宇�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huishang Rongtong Information Technology Co Ltd
Original Assignee
Beijing Huishang Rongtong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huishang Rongtong Information Technology Co Ltd
Priority to CN201610006805.6A
Publication of CN105677842A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs

Abstract

The invention discloses an enterprise website log analysis system developed on the Hadoop platform. The system mainly comprises a file upload module, a data cleaning module, a data statistical analysis module, a data export module and a data display module. With these modules, key website metrics such as page views (PV), the number of registered users, the number of unique IPs and the bounce rate can be calculated, and the data display module can achieve millisecond-level queries over massive data.

Description

Log analysis system based on Hadoop big data processing technology
Technical field
The present invention relates to log analysis technology, and in particular to a log analysis technology based on Hadoop big data processing technology.
Background art
Today we live in the data age, surrounded by all kinds of data. This is an era of information explosion: hundreds of millions of phones and Internet users worldwide generate massive amounts of data every day. People make phone calls, send text messages, chat online, upload videos, repost microblogs and so on, and the volume of information grows geometrically every day, which poses a severe challenge to the major Internet companies on the market. They need to analyze TB-level or even PB-level data to find, for example, the products with high sales, the most popular sections of a website, and the advertisements with the highest click counts. Traditional solutions and methods are simply inadequate for data at this scale.
The birth of Hadoop, the open-source big data processing platform under the Apache Foundation, broke through the bottleneck of traditional data processing and made the collection, storage and computation of massive data easier and more efficient. Hadoop is a platform for distributed data storage and processing that can be deployed on clusters of inexpensive computers. It provides a framework for the distributed storage and computation of massive data, namely the file system HDFS and the computing framework MapReduce. Users can thus make full use of the cluster's large storage capacity and its "merge-split-merge" high-speed computing capability (merge: collect and combine the data; split: distributed storage and computation; merge: combine the computation results) to develop distributed applications and achieve millisecond-level high-speed processing of massive data. Because the platform is written in the object-oriented programming language Java, it has good portability and extensibility. Over time, a number of excellent frameworks have grown up around it; those widely used in enterprises include Flume, ZooKeeper, HBase, Pig, Hive and Sqoop. They encapsulate some business logic and simplify the use of Hadoop.
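The split and merge phases described above can be sketched in a few lines of single-process Python. This is only an illustration of the map, shuffle and reduce idea, not Hadoop's Java API; the log line format ("ip url") and all function names are assumptions made for this example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit (url, 1) for one log line of the assumed form 'ip url'."""
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1

def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: merge the partial counts for one key."""
    return key, sum(values)

def count_page_views(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

On a real cluster the map and reduce calls would run in parallel on different data nodes; here they run sequentially so that the data flow is easy to follow.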
Traditional data processing is limited in storage space and computing power. For example, running a traditional application on a single computer with only about 3,000 data records takes roughly half an hour, and CPU utilization reaches about 85%; with a lower hardware configuration the run takes even longer. In addition, data must be collected, processed and cleaned manually, which consumes a great deal of manpower and resources and is extremely inefficient. The prior art therefore struggles to meet the demands of large data volumes, and more advanced technology is needed to improve efficiency and handle massive data.
Summary of the invention
In traditional data processing, the collected data are placed in a relational database, where various associations and even dependencies exist between the data, and the data are processed on a single computer, so processing efficiency is affected by factors such as the computer's configuration and the network.
The present invention is an enterprise website log analysis solution developed on the Hadoop platform. It is divided into five modules: a file upload module, a data cleaning module, a data statistical analysis module, a data export module and a data display module. File upload uses the Flume framework. Data cleaning uses the MapReduce core algorithm. Statistical analysis uses the Hive framework and computes the key metrics of the website, such as page views (PV), the number of registered users, the number of unique IPs and the bounce rate, to support operators' decisions. Data export uses the Sqoop framework to export the resulting metrics to the relational database MySQL outside the cluster. Data display uses the ZooKeeper and HBase frameworks and can achieve millisecond-level queries over massive data.
To achieve the purpose of the present invention, the following technical solutions are adopted:
A log analysis system, comprising: a file upload module, a data cleaning module, a data statistical analysis module, a data export module and a data display module, wherein:
The file upload module is used to upload log files: it first collects the log files and then uploads them to a distributed file system;
The data cleaning module is used to clean the log file data in the distributed file system; the cleaned and transformed data is stored in the distributed file system;
The data statistical analysis module is used to perform statistical analysis on the log files in the distributed file system, obtain the required statistics, and store the statistics in the distributed file system;
The data export module is used to export the data stored in the distributed file system to an external database;
The data display module is used to query the data stored in the external database and display the query results.
In the log analysis system, preferably:
The distributed file system is HDFS;
The log files are the log files of an application cluster.
In the log analysis system, preferably:
Data cleaning includes checking data consistency and handling invalid and missing values.
In the log analysis system, preferably:
The statistics include PV, the number of registered users, the number of unique IPs and the bounce rate.
In the log analysis system, preferably:
The external database is a MySQL database.
A log analysis method, comprising the following steps:
Step 1, file upload: first collect the log files, then upload them to a distributed file system;
Step 2, data cleaning: clean the log file data in the distributed file system; the cleaned and transformed data is stored in the distributed file system;
Step 3, data statistical analysis: perform statistical analysis on the log file data in the distributed file system, obtain the required statistics, and store the statistics in the distributed file system;
Step 4, data export: export the data stored in the distributed file system to an external database;
Step 5, data display: query the data stored in the external database and display the query results.
In the log analysis method, preferably:
The distributed file system is HDFS;
The log files are the log files of an application cluster.
In the log analysis method, preferably:
Data cleaning includes checking data consistency and handling invalid and missing values.
In the log analysis method, preferably:
The statistics include PV, the number of registered users, the number of unique IPs and the bounce rate.
In the log analysis method, preferably:
The external database is a MySQL database.
A method for building a log analysis system, comprising the following steps:
Step 1: build a distributed cluster platform comprising the following four nodes:
a metadata node, a secondary metadata node, data node 1 and data node 2;
Step 2: set up the required data frameworks on the cluster;
Step 3: create a log folder under the root directory of the Linux system on each of the four nodes to hold the log files, then execute the commands to start the cluster;
Step 4: create a web log folder under the root directory of the distributed file system; the log collection module communicates with the cluster via a remote procedure call protocol; the log collection task runs as a background process and monitors the log folder; as soon as log files arrive in the folder, they are synchronously uploaded to the web log folder in the distributed file system;
Step 5: after the data has been uploaded successfully, start the cleaning module to clean the data; after cleaning, the file system can be browsed as web pages in a browser to view the required data;
Step 6: after cleaning, use the data statistical analysis module to analyze the data, creating an external table that references the data under the web log folder, including:
calculating page views, i.e. the PV statistic;
calculating the number of registered users;
calculating the number of unique IPs;
calculating the number of bounces;
Step 7: store each statistic in its own table, then aggregate the data from these tables into a single table;
Step 8: use the data export module to export the aggregated data to an external relational database, enabling fast queries of the data.
Brief description of the drawings
Fig. 1 is a schematic diagram of the log analysis system of the present invention;
Fig. 2 is a schematic diagram of the log analysis method of the present invention.
Detailed description
As shown in Fig. 1, the log analysis system of the present invention includes: a file upload module, a data cleaning module, a data statistical analysis module, a data export module and a data display module.
The file upload module is used to upload log files: it first collects the log files and then uploads them to a distributed file system, such as HDFS. The log files are the log files of an application cluster.
The data cleaning module is used to clean the log file data in HDFS; the cleaned and transformed data is stored in HDFS. Data cleaning includes checking data consistency and handling invalid and missing values, among other things. It filters out data that does not meet requirements, which falls into three main categories: incomplete data, erroneous data and duplicate data.
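The three categories of rejected data named above (incomplete, erroneous, duplicate) can be illustrated with a minimal Python sketch. The patent does not specify a log format or cleaning rules, so the tab-separated layout (ip, time, url, status), the validity checks and all names below are assumptions chosen only for illustration; the patent's own cleaning step runs as a MapReduce job.

```python
import ipaddress

# Assumed field layout of one tab-separated log line (not given in the source).
FIELDS = ("ip", "time", "url", "status")

def parse(line):
    """Return a record dict, or None if the line is incomplete or erroneous."""
    parts = line.rstrip("\n").split("\t")
    # Incomplete data: wrong number of fields or an empty (missing) value.
    if len(parts) != len(FIELDS) or any(p == "" for p in parts):
        return None
    record = dict(zip(FIELDS, parts))
    try:
        # Erroneous data: an invalid IP address or a non-numeric status code.
        ipaddress.ip_address(record["ip"])
        record["status"] = int(record["status"])
    except ValueError:
        return None
    return record

def clean(lines):
    """Keep only valid records and drop exact duplicates, preserving order."""
    seen = set()
    cleaned = []
    for line in lines:
        record = parse(line)
        if record is None:
            continue
        key = tuple(record[f] for f in FIELDS)
        if key in seen:  # duplicate data
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned
```

In a MapReduce setting the `parse` check would typically sit in the mapper, while duplicate removal would rely on grouping identical records under the same key.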
The data statistical analysis module is used to perform statistical analysis on the log file data in HDFS and obtain the required statistics, such as PV (page views), the number of registered users, the number of unique IPs and the bounce rate; the statistics are stored in HDFS.
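As a rough illustration of these four metrics, the sketch below computes them from already-cleaned records in plain Python. The patent does not define the metrics precisely, so the definitions here are assumptions: each record is one page view, a visitor is identified by IP, a registration is a hit on a hypothetical `/register` URL, and a bounce is a visitor with exactly one page view.

```python
from collections import Counter

def site_metrics(records, register_url="/register"):
    """Compute PV, registrations, unique IPs and bounce rate.

    `records` is a list of dicts with "ip" and "url" keys. The metric
    definitions are illustrative assumptions, not taken from the patent.
    """
    pv = len(records)  # every record counts as one page view
    views_per_ip = Counter(r["ip"] for r in records)
    unique_ips = len(views_per_ip)
    registered = sum(1 for r in records if r["url"] == register_url)
    bounces = sum(1 for n in views_per_ip.values() if n == 1)
    bounce_rate = bounces / unique_ips if unique_ips else 0.0
    return {
        "pv": pv,
        "registered_users": registered,
        "unique_ips": unique_ips,
        "bounce_rate": bounce_rate,
    }
```

A production system would usually define bounces per session rather than per IP; the per-IP simplification keeps the example short.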
The data export module is used to export the resulting data stored in HDFS to an external MySQL database.
The data display module is used to perform millisecond-level queries on the massive data stored in the MySQL database and display the query results.
As shown in Fig. 2, the log analysis method of the present invention includes: file upload, data cleaning, data statistical analysis, data export and data display.
Step 1, file upload: upload the log files. First collect the log files, then upload them to a distributed file system, such as HDFS. The log files are the log files of an application cluster.
Step 2, data cleaning: clean the log file data in HDFS; the cleaned and transformed data is stored in HDFS. Data cleaning includes checking data consistency and handling invalid and missing values, among other things. It filters out data that does not meet requirements, which falls into three main categories: incomplete data, erroneous data and duplicate data.
Step 3, data statistical analysis: perform statistical analysis on the log file data in HDFS and obtain the required statistics, such as PV (page views), the number of registered users, the number of unique IPs and the bounce rate; the statistics are stored in HDFS.
Step 4, data export: export the resulting data stored in HDFS to an external MySQL database.
Step 5, data display: perform millisecond-level queries on the massive data stored in the MySQL database and display the query results.
The method of the present invention for building a log analysis system (in particular, a log analysis system based on Hadoop big data processing technology) comprises the following steps:
Step 1: build a distributed cluster platform (for example, a Hadoop cluster). It may include the following four nodes:
Server1 (Master): NameNode, JobTracker (metadata node)
Server2 (secondnamenode): SecondaryNameNode (secondary metadata node)
Server3 (slave01): DataNode, TaskTracker (data node)
Server4 (slave02): DataNode, TaskTracker (data node)
Step 2: set up the required data frameworks on the cluster, such as HBase and ZooKeeper. First start the Hadoop distributed cluster, then start the ZooKeeper cluster, and finally start the HBase cluster on Master (the metadata node).
Step 3: create a log folder (for example, apache_logs) under the root directory of the Linux system on each of the four nodes to hold the log files, then execute the commands to start the cluster.
Step 4: create a web_logs (web log) folder under the root directory of the HDFS file system. The log collection module (for example, Flume) communicates with the cluster via RPC (remote procedure call). The log collection task runs as a background process and monitors the apache_logs folder; as soon as log files arrive in the folder, they are synchronized to the web_logs folder in HDFS.
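The monitor-and-synchronize behaviour of Step 4 can be sketched with a simple polling function. This is a local-filesystem stand-in written for illustration, not Flume and not an HDFS client: both directories are ordinary folders, and the function and variable names are hypothetical. Only the folder names apache_logs and web_logs come from the text.

```python
import shutil
import tempfile
from pathlib import Path

def sync_new_logs(source_dir, dest_dir, already_synced):
    """Copy files not seen before from source_dir to dest_dir.

    `already_synced` is a set of file names that is updated in place,
    so repeated calls act like successive polling passes.
    """
    source, dest = Path(source_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file() and path.name not in already_synced:
            shutil.copy2(path, dest / path.name)
            already_synced.add(path.name)
            copied.append(path.name)
    return copied

def demo_sync():
    """Run two polling passes over temporary directories."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "apache_logs"
        dest = Path(tmp) / "web_logs"
        source.mkdir()
        (source / "access.log").write_text("1.2.3.4 /index\n")
        synced = set()
        first = sync_new_logs(source, dest, synced)
        second = sync_new_logs(source, dest, synced)
        return first, second, (dest / "access.log").exists()
```

A background collector would call `sync_new_logs` in a loop with a sleep between passes; a real deployment would replace the copy with an upload to HDFS.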
Step 5: after the data has been uploaded successfully, data cleaning can be performed by starting the cleaning module. After cleaning, the file system can be browsed as web pages in a browser to view the required data.
Step 6: after cleaning, use the data statistical analysis module (for example, Hive) to analyze the data, creating an external table that references the data under web_logs, including:
calculating page views, i.e. the PV statistic;
calculating the number of registered users;
calculating the number of unique IPs;
calculating the number of bounces;
Step 7: store each statistic in its own table, then aggregate the data from these tables into a single table.
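Steps 6 and 7 can be illustrated with SQL run through Python's built-in sqlite3 module. SQLite is used here purely as a runnable stand-in for Hive, so the syntax is SQLite's rather than HiveQL's, and the table names, column names and per-IP bounce definition are assumptions. The registered-user count is omitted because the source does not say how registrations appear in the logs.

```python
import sqlite3

def build_summary(rows):
    """Build per-metric tables and merge them into one summary table.

    `rows` is an iterable of (day, ip, url) tuples standing in for the
    cleaned data under web_logs. Returns the summary rows ordered by day.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE web_logs (day TEXT, ip TEXT, url TEXT)")
    conn.executemany("INSERT INTO web_logs VALUES (?, ?, ?)", rows)
    # One table per statistic, as in Step 7.
    conn.execute(
        "CREATE TABLE pv AS "
        "SELECT day, COUNT(*) AS pv FROM web_logs GROUP BY day")
    conn.execute(
        "CREATE TABLE uip AS "
        "SELECT day, COUNT(DISTINCT ip) AS ip_count FROM web_logs GROUP BY day")
    # Assumed bounce definition: an IP with exactly one page view that day.
    conn.execute(
        "CREATE TABLE bounce AS "
        "SELECT day, COUNT(*) AS bounce_count FROM "
        "(SELECT day, ip FROM web_logs GROUP BY day, ip HAVING COUNT(*) = 1) "
        "GROUP BY day")
    # Merge the per-metric tables into a single summary table.
    conn.execute(
        "CREATE TABLE summary AS "
        "SELECT pv.day AS day, pv.pv AS pv, uip.ip_count AS ip_count, "
        "COALESCE(bounce.bounce_count, 0) AS bounce_count "
        "FROM pv JOIN uip ON pv.day = uip.day "
        "LEFT JOIN bounce ON pv.day = bounce.day")
    result = conn.execute(
        "SELECT day, pv, ip_count, bounce_count FROM summary ORDER BY day"
    ).fetchall()
    conn.close()
    return result
```

The single summary table is what a later export step would copy to the external relational database.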
Step 8: use the data export module (for example, Sqoop) to export the aggregated data to the external relational database MySQL, and use HBase to enable fast queries of the data.
The present invention breaks through the bottleneck of traditional data processing and makes the collection, storage and computation of massive data easier and more efficient. It exploits the open-source nature of Hadoop and the efficiency of parallel processing. The cluster needs no expensive minicomputers: ordinary computers are enough to build a high-performance cluster that makes full use of the resources of every node, at low cost and with mature, stable technology. Building a log analysis system on a Hadoop cluster is therefore highly worthwhile. It greatly reduces costs, and the demands on developers are low: a cluster may need only a single developer to handle development and keep the cluster running. Large volumes of data can be processed in a timely manner, which improves work efficiency.

Claims (10)

1. A log analysis system, characterized in that it comprises a file upload module, a data cleaning module, a data statistical analysis module, a data export module and a data display module;
Wherein:
The file upload module is used to upload log files: it first collects the log files and then uploads them to a distributed file system;
The data cleaning module is used to clean the log file data in the distributed file system; the cleaned and transformed data is stored in the distributed file system;
The data statistical analysis module is used to perform statistical analysis on the log files in the distributed file system, obtain the required statistics, and store the statistics in the distributed file system;
The data export module is used to export the resulting data stored in the distributed file system to an external database;
The data display module is used to query the data stored in the external database and display the query results.
2. The log analysis system according to claim 1, characterized in that:
The distributed file system is HDFS;
The log files are the log files of an application cluster.
3. The log analysis system according to claim 1, characterized in that:
Data cleaning includes checking data consistency and handling invalid and missing values.
4. The log analysis system according to claim 1, characterized in that:
The statistics include PV, the number of registered users, the number of unique IPs and the bounce rate.
5. The log analysis system according to claim 1, characterized in that:
The external database is a MySQL database.
6. A log analysis method, characterized in that it comprises the following steps:
Step 1, file upload: first collect the log files, then upload them to a distributed file system;
Step 2, data cleaning: clean the log file data in the distributed file system; the cleaned and transformed data is stored in the distributed file system;
Step 3, data statistical analysis: perform statistical analysis on the log file data in the distributed file system, obtain the required statistics, and store the statistics in the distributed file system;
Step 4, data export: export the resulting data stored in the distributed file system to an external database;
Step 5, data display: query the data stored in the external database and display the query results.
7. The log analysis method according to claim 6, characterized in that:
The distributed file system is HDFS;
The log files are the log files of an application cluster.
8. The log analysis method according to claim 6, characterized in that:
Data cleaning includes checking data consistency and handling invalid and missing values.
9. The log analysis method according to claim 6, characterized in that:
The statistics include PV, the number of registered users, the number of unique IPs and the bounce rate.
10. The log analysis method according to claim 6, characterized in that:
The external database is a MySQL database.
CN201610006805.6A 2016-01-05 2016-01-05 Log analysis system based on Hadoop big data processing technique Pending CN105677842A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610006805.6A CN105677842A (en) 2016-01-05 2016-01-05 Log analysis system based on Hadoop big data processing technique

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610006805.6A CN105677842A (en) 2016-01-05 2016-01-05 Log analysis system based on Hadoop big data processing technique

Publications (1)

Publication Number Publication Date
CN105677842A true CN105677842A (en) 2016-06-15

Family

ID=56299098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610006805.6A Pending CN105677842A (en) 2016-01-05 2016-01-05 Log analysis system based on Hadoop big data processing technique

Country Status (1)

Country Link
CN (1) CN105677842A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399887A (en) * 2013-07-19 2013-11-20 蓝盾信息安全技术股份有限公司 Query and statistical analysis system for mass logs
CN104111996A (en) * 2014-07-07 2014-10-22 山大地纬软件股份有限公司 Health insurance outpatient clinic big data extraction system and method based on hadoop platform
CN104298771A (en) * 2014-10-30 2015-01-21 南京信息工程大学 Massive web log data query and analysis method
CN104714946A (en) * 2013-12-11 2015-06-17 田鹏 Large-scale Web log analysis system based on NoSQL

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106301886A (en) * 2016-07-22 2017-01-04 天脉聚源(北京)传媒科技有限公司 A kind of user operation auditing method and device
CN106354772A (en) * 2016-08-23 2017-01-25 成都卡莱博尔信息技术股份有限公司 Mass data system with data cleaning function
CN107818120A (en) * 2016-09-14 2018-03-20 博雅网络游戏开发(深圳)有限公司 Data processing method and device based on big data
CN107818120B (en) * 2016-09-14 2020-05-29 博雅网络游戏开发(深圳)有限公司 Data processing method and device based on big data
CN106570152B (en) * 2016-10-28 2020-12-22 金华市智甄通信设备有限公司 Mass extraction method and system for mobile phone numbers
CN106570152A (en) * 2016-10-28 2017-04-19 上海斐讯数据通信技术有限公司 Mobile phone number volume extracting method and system
CN106570153A (en) * 2016-10-28 2017-04-19 上海斐讯数据通信技术有限公司 Data extraction method and system for mass URLs
CN106777046A (en) * 2016-12-09 2017-05-31 武汉卓尔云市集团有限公司 A kind of data analysing method based on nginx daily records
CN106682125A (en) * 2016-12-13 2017-05-17 四川长虹电器股份有限公司 Method for analyzing user retention rate of smart television
CN106709003A (en) * 2016-12-23 2017-05-24 长沙理工大学 Hadoop-based mass log data processing method
CN106709029A (en) * 2016-12-28 2017-05-24 上海斐讯数据通信技术有限公司 File hierarchical processing method and processing system based on Hadoop and MySQL
CN106850763A (en) * 2017-01-04 2017-06-13 千寻位置网络有限公司 Data distribution formula is received and analysis method and system
CN106933720A (en) * 2017-01-16 2017-07-07 国家电网公司 Network log information security scene-type analysis system and its analysis method
CN106951497A (en) * 2017-03-15 2017-07-14 深圳市德信软件有限公司 A kind of method and system based on Hadoop framework data analysis diagrammatic representation
CN107193903A (en) * 2017-05-11 2017-09-22 上海斐讯数据通信技术有限公司 The method and system of efficient process IP address zone location
CN107169084A (en) * 2017-05-11 2017-09-15 深圳市茁壮网络股份有限公司 A kind of data processing method, distributed file system and data server
CN107562796A (en) * 2017-08-02 2018-01-09 上海斐讯数据通信技术有限公司 A kind of magnanimity mobile terminal measures statistical method and device online
CN107786641A (en) * 2017-09-30 2018-03-09 南威软件股份有限公司 A kind of acquisition method of distributed multi-system user user behaviors log
CN109496302A (en) * 2018-05-31 2019-03-19 优视科技新加坡有限公司 A kind of user's characteristic information collection method, device and equipment/terminal/server
CN109064232A (en) * 2018-08-16 2018-12-21 安徽大尺度网络传媒有限公司 A kind of big data processing method and processing device promoted for internet
CN110032560A (en) * 2018-11-06 2019-07-19 阿里巴巴集团控股有限公司 A kind of generation method and device monitoring chart
CN110098957A (en) * 2019-04-04 2019-08-06 北京市天元网络技术股份有限公司 Big data analysis system based on network log
CN110321329A (en) * 2019-06-18 2019-10-11 中盈优创资讯科技有限公司 Data processing method and device based on big data
CN112131209A (en) * 2020-09-04 2020-12-25 苏州浪潮智能科技有限公司 Hive-based Flume data verification statistical method and device

Similar Documents

Publication Publication Date Title
CN105677842A (en) Log analysis system based on Hadoop big data processing technique
CN106293892B (en) Distributed stream computing system, method and apparatus
CN111241078A (en) Data analysis system, data analysis method and device
CN104077402B (en) Data processing method and data handling system
CN106982150B (en) Hadoop-based mobile internet user behavior analysis method
CN106778253A (en) Threat context aware information security Initiative Defense model based on big data
CN107070890A (en) Flow data processing device and communication network major clique system in a kind of communication network major clique system
CN104426713A (en) Method and device for monitoring network site access effect data
CN111435344A (en) Big data-based drilling acceleration influence factor analysis model
CN102946320B (en) Distributed supervision method and system for user behavior log forecasting network
CN104394211A (en) Design and implementation method for user behavior analysis system based on Hadoop
CN108959445A (en) Distributed information log processing method and processing device
Jeong et al. Anomaly teletraffic intrusion detection systems on hadoop-based platforms: A survey of some problems and solutions
CN105468737A (en) Web service big data analysis method, cloud computing platform and mining system
CN106992886A (en) A kind of log analysis method and device based on distributed storage
CN110968571A (en) Big data analysis and processing platform for financial information service
CN104298669A (en) Person geographic information mining model based on social network
CN108268569A (en) The acquisition of water resource monitoring data and analysis system and method based on big data technology
CN111126852A (en) BI application system based on big data modeling
Kim et al. Implementation of hybrid P2P networking distributed web crawler using AWS for smart work news big data
CN114637903A (en) Public opinion data acquisition system for directional target data expansion
CN113721856A (en) Digital community management data storage system
You et al. SNES: Social-Network-Oriented Public Opinion Monitoring Platform Based on ElasticSearch.
CN106570151A (en) Data collection processing method and system for mass files
Maske et al. A real time processing and streaming of wireless network data using storm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160615