CN106951552A - A user behavior data processing method based on Hadoop - Google Patents

A user behavior data processing method based on Hadoop

Info

Publication number
CN106951552A
Authority
CN
China
Prior art keywords
data
user
real
real time
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710191813.7A
Other languages
Chinese (zh)
Inventor
陈粤龙
陈敏俊
温亮生
张治中
赵瑞莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
China Mobile Hangzhou Information Technology Co Ltd
Original Assignee
Chongqing University of Post and Telecommunications
China Mobile Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications and China Mobile Hangzhou Information Technology Co Ltd
Priority to CN201710191813.7A
Publication of CN106951552A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The present invention relates to a user behavior data processing method based on Hadoop. The method includes: importing user history data sources into the distributed file system HDFS; generating a historical behavior data table for the user from the user history data sources; collecting the user's real-time behavioral data stream with Flume; recording in real time, with Kafka, the data collected by Flume; processing in real time, with the real-time computing framework Spark, the real-time data generated by user behavior according to the service type of the real-time behavioral data stream, to generate the user's real-time data table; associating the user's real-time data table and historical behavior data table by the IMSI number in the IMSI database, to obtain the user's behavior data wide table; exporting the wide table according to a preset configuration file and saving it in an HBase database; and integrating the query system Impala with the HBase database to provide an external query entry point for user behavior data. The technical solution provided by the invention makes it possible to build an efficient, fine-grained user behavior data operation system.

Description

A user behavior data processing method based on Hadoop
Technical field
The invention belongs to the field of communication technology and relates to a user behavior data processing method based on Hadoop.
Background
With the commercialization and widespread deployment of 4G networks, mobile communication services have fully entered the mobile Internet era. Rapidly growing mobile network bandwidth has brought a wide variety of applications and user behaviors, and the complexity and volume of data in communication networks have grown rapidly with it. This places higher demands on the complexity and computational load of data processing, and the data processing capacity of traditional database systems faces a severe challenge. Faced with massive data processing demands and ever lower latency limits, traditional data systems are under great strain in CPU computing power, memory response and throughput, and network bandwidth, and they meet many bottlenecks under the trend toward high security and multi-center deployment. In the big data era, the single-node computing model can no longer meet data processing demands; distributed data processing and storage systems have gradually become the preferred architecture for big data platforms, and big data technology has become a focus of research. The Hadoop big data platform, however, is based mainly on parallel processing of static data files. Although it is efficient at throughput, computation, and storage of massive data, its real-time performance is poor: it is a high-throughput, high-concurrency, high-latency architecture, and its performance on small files has always been an unavoidable problem. It is therefore ill-suited to data processing and usage scenarios with higher real-time requirements.
At present there is no method for integrating the processing of Internet users' real-time data with their historical (offline) data, and in particular no fine-grained operation method suited to the development of operators' big data.
Summary of the invention
In view of this, the object of the invention is to provide a user behavior data processing method based on Hadoop that makes it possible to build an efficient, fine-grained user behavior data operation system.
To achieve the above object, the invention provides the following technical solution:
A user behavior data processing method based on Hadoop, the method including:
Importing user history data sources into the distributed file system HDFS, so that HDFS provides the data access interface; wherein the user history data sources include at least one of an International Mobile Subscriber Identity (IMSI) database, an International Mobile Equipment Identity (IMEI) database, and a crawler database;
Generating a historical behavior data table for the user based on the user history data sources;
Collecting the user's real-time behavioral data stream with the data collection tool Flume, the real-time behavioral data stream including the user's real-time Internet access logs and real-time parsed data of the user's Internet behavior;
Recording in real time, with the distributed subscription system Kafka, the data collected by Flume, Kafka serving as a message middleware component that supplies data to the real-time computing framework;
Processing in real time, with the real-time computing framework Spark, the real-time data generated by user behavior according to the service type of the real-time behavioral data stream, to generate the user's real-time data table;
Associating the user's real-time data table and historical behavior data table by the IMSI number in the IMSI database, to obtain the user's behavior data wide table;
Exporting the user's behavior data wide table according to a preset configuration file and saving it in an HBase database;
Integrating the query system Impala with the HBase database to provide an external query entry point for user behavior data.
Further, generating the historical behavior data table of the user based on the user history data sources includes:
Associating all historical behavior data of the user by the IMSI number in the IMSI database, and mapping all of the user's historical behavior data into the data warehouse tool Hive to form the user's historical behavior data table.
Further, after the distributed subscription system Kafka records in real time the data collected by Flume, the method also includes:
Determining whether the data to be processed has been buffered according to the Kafka configuration file; if so, sending the data to be processed to the real-time computing framework Spark; if not, returning the data to be processed to the distributed subscription system Kafka.
Further, the IMSI database, the IMEI database, and the crawler database are imported from a relational database into HDFS by Sqoop.
Further, the user's real-time behavioral data stream includes real-time data corresponding to the user's access characteristics, search information, and traffic consumption on the mobile terminal.
Further, obtaining the user's behavior data wide table includes:
Based on different service logic, using the Map/Reduce framework to obtain the output values of the real-time data tables and historical behavior data tables of all input users, to form the behavior data wide table; wherein one IMSI number identifies one user.
Further, the table structure in the HBase database includes a combination of the IMSI number and a business number, and a column for storing the user's specific business information.
The beneficial effects of the invention are:
(1) The invention stores the user's massive raw historical data on HDFS, giving the raw data storage that is highly fault-tolerant, high-throughput, and low-cost, and that supports streaming access to the data in the file system. The data collection tool Flume collects the real-time data of user behavior, which includes the user's real-time Internet access logs and real-time parsed data of the user's Internet behavior. Kafka records in real time the data collected by Flume and, as a message middleware component, provides reliable data support for the upper-layer real-time computing framework; the Spark in-memory computing framework then processes the real-time data generated by user behavior in real time. The user's real-time data and historical data are associated by IMSI number to obtain a unified user behavior data wide table, which is stored in the distributed database HBase. This provides a feasible solution for storing massive user behavior data and relieves the pressure of storing customer data on a single machine in conventional methods.
(2) Based on the Hadoop platform, the invention distributes the task of building a fine-grained user behavior system across a cluster of low-specification computers. Integrating Impala with HBase provides an efficient query engine for user behavior data and reduces query latency, with execution much faster than native MapReduce and Hive.
(3) Compared with traditional single-dimension user data, the user behavior data generation method of the invention establishes an efficient, fine-grained user behavior data operation system that is also highly scalable, effectively improving operators' capacity for fine-grained operation.
Brief description of the drawings
To make the object, technical solution, and beneficial effects of the invention clearer, the invention provides the following drawings for illustration:
Fig. 1 is a flow chart of the Hadoop-based user behavior data generation method provided by the invention;
Fig. 2 is a design diagram of the user historical behavior data table in the invention;
Fig. 3 is a model design diagram of the user's real-time behavioral data in the invention;
Fig. 4 is a design diagram of the user behavior data wide table in the invention;
Fig. 5 is a diagram of the HBase storage structure in the invention.
Detailed description of the embodiments
The preferred embodiments of the invention are described in detail below with reference to the drawings.
Referring to Fig. 1, an embodiment of the application provides a user behavior data processing method based on Hadoop, the method including:
Importing user history data sources into the distributed file system HDFS, so that HDFS provides the data access interface; wherein the user history data sources include at least one of an International Mobile Subscriber Identity (IMSI) database, an International Mobile Equipment Identity (IMEI) database, and a crawler database;
Generating a historical behavior data table for the user based on the user history data sources;
Collecting the user's real-time behavioral data stream with the data collection tool Flume, the real-time behavioral data stream including the user's real-time Internet access logs and real-time parsed data of the user's Internet behavior;
Recording in real time, with the distributed subscription system Kafka, the data collected by Flume, Kafka serving as a message middleware component that supplies data to the real-time computing framework;
Processing in real time, with the real-time computing framework Spark, the real-time data generated by user behavior according to the service type of the real-time behavioral data stream, to generate the user's real-time data table;
Associating the user's real-time data table and historical behavior data table by the IMSI number in the IMSI database, to obtain the user's behavior data wide table;
Exporting the user's behavior data wide table according to a preset configuration file and saving it in an HBase database;
Integrating the query system Impala with the HBase database to provide an external query entry point for user behavior data.
In this embodiment, generating the historical behavior data table of the user based on the user history data sources includes:
Associating all historical behavior data of the user by the IMSI number in the IMSI database, and mapping all of the user's historical behavior data into the data warehouse tool Hive to form the user's historical behavior data table.
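The association by IMSI number described above can be illustrated with a minimal sketch in plain Python. It stands in for the Hive mapping and is not the patented implementation; the function name `build_history_table`, the field names, and the sample records are all hypothetical and invented for illustration.

```python
def build_history_table(imsi_db, *sources):
    """Merge records from several sources into one row per IMSI."""
    # Start with one empty row per subscriber known to the IMSI database.
    table = {imsi: {"imsi": imsi} for imsi in imsi_db}
    for source in sources:
        for record in source:
            row = table.get(record["imsi"])
            if row is None:
                continue  # IMSI not in the IMSI database: skip the record
            # Copy every attribute except the join key into the user's row.
            row.update({k: v for k, v in record.items() if k != "imsi"})
    return table


# Hypothetical sample data, invented for illustration only.
imsi_db = ["460000000000001", "460000000000002"]
region_records = [{"imsi": "460000000000001", "region": "Chongqing"}]
plan_records = [
    {"imsi": "460000000000001", "plan": "4G-58"},
    {"imsi": "460000000000009", "plan": "4G-88"},  # unknown IMSI, dropped
]

history = build_history_table(imsi_db, region_records, plan_records)
```

Each user ends up with exactly one row keyed by IMSI, which is the property the later wide-table association relies on.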
In this embodiment, after the distributed subscription system Kafka records in real time the data collected by Flume, the method also includes:
Determining whether the data to be processed has been buffered according to the Kafka configuration file; if so, sending the data to be processed to the real-time computing framework Spark; if not, returning the data to be processed to the distributed subscription system Kafka.
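The text does not say how the buffering check is carried out. One possible reading, offered only as an assumption, is that the configuration names the topics whose data is buffered, and that records are routed accordingly. The sketch below models that reading in plain Python; `dispatch`, the `topic` field, and the topic names are hypothetical.

```python
def dispatch(pending, buffered_topics):
    """Split pending records into those sent to Spark and those returned to Kafka.

    Assumption: a record counts as "buffered" when its topic is listed
    in the configuration; the source does not define the check.
    """
    to_spark, back_to_kafka = [], []
    for record in pending:
        if record["topic"] in buffered_topics:
            to_spark.append(record)       # buffered: hand over for processing
        else:
            back_to_kafka.append(record)  # not buffered: return to the queue
    return to_spark, back_to_kafka


# Hypothetical configuration and records, for illustration only.
buffered_topics = {"app_usage"}
pending = [
    {"topic": "app_usage", "imsi": "460000000000001"},
    {"topic": "search", "imsi": "460000000000002"},
]
to_spark, back_to_kafka = dispatch(pending, buffered_topics)
```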
In this embodiment, the IMSI database, the IMEI database, and the crawler database are imported from a relational database into HDFS by Sqoop.
In this embodiment, the user's real-time behavioral data stream includes real-time data corresponding to the user's access characteristics, search information, and traffic consumption on the mobile terminal.
In this embodiment, obtaining the user's behavior data wide table includes:
Based on different service logic, using the Map/Reduce framework to obtain the output values of the real-time data tables and historical behavior data tables of all input users, to form the behavior data wide table; wherein one IMSI number identifies one user.
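The Map/Reduce association can be sketched as follows, as a single-process illustration of the programming model rather than a Hadoop job. The map phase emits each row keyed by IMSI, and the reduce phase merges all values that share a key into one wide row. The function names, the `hist`/`rt` column prefixes, and the sample rows are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(rows, source):
    """Emit (imsi, (source, fields)) pairs, one per input row."""
    for row in rows:
        fields = {k: v for k, v in row.items() if k != "imsi"}
        yield row["imsi"], (source, fields)


def reduce_phase(pairs):
    """Merge all values sharing an IMSI into a single wide row."""
    wide = defaultdict(dict)
    for imsi, (source, fields) in pairs:
        for name, value in fields.items():
            # Prefix the column with its source table to avoid name clashes.
            wide[imsi][f"{source}.{name}"] = value
    return dict(wide)


# Hypothetical rows from the historical and real-time tables.
history_rows = [{"imsi": "460000000000001", "region": "Chongqing", "plan": "4G-58"}]
realtime_rows = [{"imsi": "460000000000001", "app": "video", "seconds": 1200}]

wide_table = reduce_phase(
    chain(map_phase(history_rows, "hist"), map_phase(realtime_rows, "rt"))
)
```

Because the key is the IMSI number, each user contributes one row to the wide table regardless of how many source tables mention that user.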
In this embodiment, the table structure in the HBase database includes a combination of the IMSI number and a business number, and a column for storing the user's specific business information.
In this embodiment, Hadoop is an open-source project managed by the Apache organization and is already widely used. Hadoop has grown to include ten sub-projects: Hadoop Common, HDFS, MapReduce, ZooKeeper, Avro, Chukwa, HBase, Hive, Mahout, and Pig. The Hadoop core consists of three subsystems: Hadoop Common, HDFS (Hadoop Distributed File System), and MapReduce. Hadoop Common provides basic supporting functions for the overall Hadoop architecture, mainly the file system, the remote procedure call protocol, and the data serialization library. HDFS is a distributed file system characterized by high fault tolerance and relatively low cost of use. MapReduce is a programming model and software framework for writing parallel programs that quickly process massive data on large computer clusters.
Spark is a distributed in-memory computing framework characterized by its ability to process large-scale data at high speed. When integrated with Hadoop, Spark works on top of the distributed file system and continues Hadoop's MapReduce computing model. By contrast with MapReduce, Spark keeps its computation in memory, which reduces disk reads and writes, and it can merge multiple operations before computing, which raises computing speed. In this scheme Spark runs on the Hadoop cluster with HDFS as its data source; in essence it is a computing framework on YARN, like MapReduce. The Spark core is built on RDDs, and core components such as Spark SQL, Spark Streaming, MLlib, GraphX, and SparkR address many big data problems, making the framework increasingly popular. Its surrounding ecosystem, for example Zeppelin for visualization, is also growing. Spark's read and write path is based mainly on memory rather than on spilling intermediate results to disk as Hadoop MapReduce does, so it is fast. In addition, the handling of wide dependencies in DAG job scheduling also improves Spark's speed.
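As a rough illustration of the in-memory, batch-by-batch aggregation that such a framework performs, the sketch below accumulates per-user, per-App usage durations across successive batches of events. It is plain Python, not Spark code; the event layout `(imsi, app, start, end)` with times in seconds, and the name `update_durations`, are assumptions made for this example.

```python
def update_durations(totals, batch):
    """Add the usage time of each event in a batch to the running totals.

    totals: dict mapping (imsi, app) to accumulated seconds, kept in memory.
    batch:  iterable of (imsi, app, start, end) tuples, times in seconds.
    """
    for imsi, app, start, end in batch:
        if end <= start:
            continue  # ignore malformed events with non-positive duration
        key = (imsi, app)
        totals[key] = totals.get(key, 0) + (end - start)
    return totals


# Two hypothetical micro-batches arriving one after the other.
batch1 = [
    ("460000000000001", "video", 0, 600),
    ("460000000000001", "chat", 100, 160),
]
batch2 = [
    ("460000000000001", "video", 900, 1200),
    ("460000000000002", "video", 0, 30),
    ("460000000000002", "video", 50, 40),  # malformed, skipped
]

totals = {}
update_durations(totals, batch1)
update_durations(totals, batch2)
```

Keeping `totals` in memory between batches is what lets the current day's figures be output at any moment without rereading earlier events from disk.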
Sqoop is a tool for efficiently transferring data between relational databases and HDFS. It can import data from a relational database into Hadoop's HDFS and can also export HDFS data into a relational database.
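A minimal sketch of how such an import might be invoked is given below. It only assembles the argument list for the `sqoop import` command using standard Sqoop 1 options; the JDBC URL, user name, password file, table name, and target directory are placeholders invented for illustration, and the source does not specify any of them.

```python
def sqoop_import_args(jdbc_url, username, password_file, table, target_dir, mappers=4):
    """Build the argument list for importing one relational table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC connection string of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table to import
        "--target-dir", target_dir,        # destination directory in HDFS
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


# Placeholder values, assumed for illustration only.
args = sqoop_import_args(
    "jdbc:mysql://db.example.com/userdata",
    "etl_user",
    "/user/etl/.db_password",
    "imsi_db",
    "/data/history/imsi_db",
)
```

On a machine where Sqoop is installed, the list could be passed to `subprocess.run(args, check=True)`; one call per source table would cover the IMSI, IMEI, and crawler databases.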
Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive volumes of logs, provided by Cloudera. Flume supports customizing various data senders in the logging system to collect data; it also provides the ability to do simple processing on the data and write it to various (customizable) data receivers.
Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of a consumer-scale website. Kafka records in real time the data collected by the data collection tool Flume and, as a message buffering component, provides reliable data support for the upper-layer real-time computing framework.
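The buffering role described here can be pictured with a toy model: an append-only log from which each consumer group reads at its own offset. The sketch below is an in-memory illustration of that idea only, not the Kafka client API; the class `TopicLog` and the group names are hypothetical.

```python
class TopicLog:
    """Append-only record log with an independent read offset per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group -> next offset to read

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return up to max_records unread records for the group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for event in ["login", "search", "play"]:
    log.publish(event)

first = log.poll("spark", max_records=2)   # the processing job reads in small batches
second = log.poll("spark", max_records=2)  # the next poll resumes where it left off
audit = log.poll("audit")                  # another group reads the same log independently
```

Because records stay in the log after being read, a slow or restarted consumer can pick up from its last offset, which is the sense in which the buffer gives the computing framework reliable data support.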
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system; with HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. Unlike a general relational database, HBase is a database suited to storing unstructured data.
Impala is a real-time big data query and analysis tool whose development is led by Cloudera. Its queries are 3 to 90 times faster than the original MapReduce-based Hive SQL, and it is more flexible and easier to use. It provides SQL-like query statements and can query petabyte-scale data stored in Hadoop's HDFS and HBase. Fast querying is its greatest advantage. As a real-time big data query and analysis tool, Impala offers fast queries, high flexibility, easy integration, and strong scalability.
Taking the user's historical behavior data (region attribute, plans subscribed by the user) and the duration for which the user has used each App so far on the current day as an example, the technical solution provided by the invention includes the following steps:
Step 1: Import the user history data sources into the distributed file system HDFS, which provides high-throughput data access. The data sources include the IMSI database, the IMEI database, and the crawler database, which Sqoop imports from a relational database into HDFS. The table format of the user history data sources is shown in Fig. 2.
Step 2: Collect the real-time data of user behavior with the data collection tool Flume; here the real-time data is the duration for which the user has used each App so far on the current day. Kafka records in real time the data collected by Flume and, as a message middleware component, provides reliable data support for the upper-layer real-time computing framework.
Step 3: Following the example in step 2, process the user's App usage durations with the real-time data processing tool Spark, so that the usage duration of each App the user has used on the current day is calculated and output in real time. Similarly, once a full day, week, or month has been accumulated, the data for that period can be output. The real-time data structure is shown in Fig. 3.
Step 4: Apply the service logic, which in this example covers the plans subscribed by the user and the App usage durations, and use the Map/Reduce framework to obtain the output values for all input users (one IMSI represents one user), forming the user behavior table. The table format is shown in Fig. 4.
Step 5: Save the user behavior data to HBase according to the configuration file, and integrate Impala with HBase to provide a query entry point for user behavior data. Compared with the execution speed of native MapReduce and Hive, this greatly increases the speed of statistical analysis on the user behavior data wide table. The storage structure in HBase is shown in Fig. 5: the RowKey is the IMSI plus the business number, and the column family contains one column, Data:Label, which stores the user's specific business information.
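The storage layout in step 5 can be sketched as a small helper that turns one wide-table entry into an HBase row key and cell mapping. The text specifies only that the RowKey combines the IMSI and the business number and that the column is Data:Label; the underscore separator, the UTF-8 encoding, the function names, and the sample values below are assumptions made for illustration.

```python
def row_key(imsi, business_no):
    """Compose the row key from the IMSI and the business number.

    The "_" separator is an assumption; the source only says the two are combined.
    """
    return f"{imsi}_{business_no}".encode("utf-8")


def to_hbase_cells(imsi, business_no, label):
    """Return (row key, {column: value}) for one wide-table entry."""
    # Column family "Data", qualifier "Label", as named in the storage structure.
    return row_key(imsi, business_no), {b"Data:Label": str(label).encode("utf-8")}


# Hypothetical entry: business number 0001 and a label string for one user.
key, cells = to_hbase_cells("460000000000001", "0001", "plan=4G-58;video=900s")
```

Placing the IMSI first means all rows for one user share a key prefix, so a prefix scan would return every business record for that user in one pass.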
Finally, it should be noted that the preferred embodiments above serve only to illustrate the technical solution of the invention and not to limit it. Although the invention has been described in detail through the preferred embodiments above, those skilled in the art should understand that various changes in form and detail may be made to it without departing from the scope defined by the claims of the invention.

Claims (7)

1. A user behavior data processing method based on Hadoop, characterized in that the method includes:
Importing user history data sources into the distributed file system HDFS, so that HDFS provides the data access interface; wherein the user history data sources include at least one of an International Mobile Subscriber Identity (IMSI) database, an International Mobile Equipment Identity (IMEI) database, and a crawler database;
Generating a historical behavior data table for the user based on the user history data sources;
Collecting the user's real-time behavioral data stream with the data collection tool Flume, the real-time behavioral data stream including the user's real-time Internet access logs and real-time parsed data of the user's Internet behavior;
Recording in real time, with the distributed subscription system Kafka, the data collected by Flume, Kafka serving as a message middleware component that supplies data to the real-time computing framework;
Processing in real time, with the real-time computing framework Spark, the real-time data generated by user behavior according to the service type of the real-time behavioral data stream, to generate the user's real-time data table;
Associating the user's real-time data table and historical behavior data table by the IMSI number in the IMSI database, to obtain the user's behavior data wide table;
Exporting the user's behavior data wide table according to a preset configuration file and saving it in an HBase database;
Integrating the query system Impala with the HBase database to provide an external query entry point for user behavior data.
2. The method according to claim 1, characterized in that generating the historical behavior data table of the user based on the user history data sources includes:
Associating all historical behavior data of the user by the IMSI number in the IMSI database, and mapping all of the user's historical behavior data into the data warehouse tool Hive to form the user's historical behavior data table.
3. The method according to claim 1, characterized in that, after the distributed subscription system Kafka records in real time the data collected by Flume, the method also includes:
Determining whether the data to be processed has been buffered according to the Kafka configuration file; if so, sending the data to be processed to the real-time computing framework Spark; if not, returning the data to be processed to the distributed subscription system Kafka.
4. The method according to claim 1, characterized in that the IMSI database, the IMEI database, and the crawler database are imported from a relational database into HDFS by Sqoop.
5. The method according to claim 1, characterized in that the user's real-time behavioral data stream includes real-time data corresponding to the user's access characteristics, search information, and traffic consumption on the mobile terminal.
6. The method according to claim 1, characterized in that obtaining the user's behavior data wide table includes:
Based on different service logic, using the Map/Reduce framework to obtain the output values of the real-time data tables and historical behavior data tables of all input users, to form the behavior data wide table; wherein one IMSI number identifies one user.
7. The method according to claim 1, characterized in that the table structure in the HBase database includes a combination of the IMSI number and a business number, and a column for storing the user's specific business information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710191813.7A CN106951552A (en) 2017-03-27 2017-03-27 A kind of user behavior data processing method based on Hadoop

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710191813.7A CN106951552A (en) 2017-03-27 2017-03-27 A kind of user behavior data processing method based on Hadoop

Publications (1)

Publication Number Publication Date
CN106951552A true CN106951552A (en) 2017-07-14

Family

ID=59474151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710191813.7A Pending CN106951552A (en) 2017-03-27 2017-03-27 A kind of user behavior data processing method based on Hadoop

Country Status (1)

Country Link
CN (1) CN106951552A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748800A (en) * 2017-11-15 2018-03-02 北京易讯通信息技术股份有限公司 A kind of fusion of distributed real-time data processing government affairs service data and sharing method
CN108133041A (en) * 2018-01-11 2018-06-08 四川九洲电器集团有限责任公司 Data collecting system and method based on web crawlers and data transfer technology
CN108153836A (en) * 2017-12-14 2018-06-12 浙江航天恒嘉数据科技有限公司 A kind of time series data accesses system and method
CN109388637A (en) * 2018-09-21 2019-02-26 北京京东金融科技控股有限公司 Data warehouse information processing method, device, system, medium
CN110162563A (en) * 2019-05-28 2019-08-23 深圳市网心科技有限公司 A kind of data storage method, system and electronic equipment and storage medium
CN110209422A (en) * 2018-05-09 2019-09-06 腾讯科技(深圳)有限公司 A kind of method for processing business, computer equipment and client
CN111694891A (en) * 2019-03-12 2020-09-22 马上消费金融股份有限公司 Data table processing method and device
CN112416982A (en) * 2021-01-25 2021-02-26 北京轻松筹信息技术有限公司 Method and device for calculating real-time user characteristics
CN113177049A (en) * 2021-05-13 2021-07-27 中移智行网络科技有限公司 Data processing method, device and system
CN113434376A (en) * 2021-06-24 2021-09-24 山东浪潮科学研究院有限公司 Web log analysis method and device based on NoSQL
CN114490525A (en) * 2022-02-22 2022-05-13 北京科杰科技有限公司 System and method for analyzing and putting out and putting in storage of super-large unstructured text files remotely based on hadoop
CN115801353A (en) * 2022-11-03 2023-03-14 智网安云(武汉)信息技术有限公司 Linkage script processing method after real-time aggregation of safety event logs based on big data level

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761309A (en) * 2014-01-23 2014-04-30 中国移动(深圳)有限公司 Operation data processing method and system
CN105893628A (en) * 2016-05-17 2016-08-24 中国农业银行股份有限公司 Real-time data collection system and method
CN105930446A (en) * 2016-04-20 2016-09-07 重庆重邮汇测通信技术有限公司 Telecommunication customer tag generation method based on Hadoop distributed technology
US20170032384A1 (en) * 2015-07-29 2017-02-02 Geofeedia, Inc. System and Method for Analyzing Social Media Users Based on User Content Posted from Monitored Locations

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN103761309A (en) * 2014-01-23 2014-04-30 中国移动(深圳)有限公司 Operation data processing method and system
US20170032384A1 (en) * 2015-07-29 2017-02-02 Geofeedia, Inc. System and Method for Analyzing Social Media Users Based on User Content Posted from Monitored Locations
CN105930446A (en) * 2016-04-20 2016-09-07 重庆重邮汇测通信技术有限公司 Telecommunication customer tag generation method based on Hadoop distributed technology
CN105893628A (en) * 2016-05-17 2016-08-24 中国农业银行股份有限公司 Real-time data collection system and method

Cited By (19)

Publication number Priority date Publication date Assignee Title
CN107748800A (en) * 2017-11-15 2018-03-02 北京易讯通信息技术股份有限公司 A kind of method for fusing and sharing government affairs service data with distributed real-time data processing
CN108153836A (en) * 2017-12-14 2018-06-12 浙江航天恒嘉数据科技有限公司 A kind of time series data access system and method
CN108133041A (en) * 2018-01-11 2018-06-08 四川九洲电器集团有限责任公司 Data collecting system and method based on web crawlers and data transfer technology
CN110209422B (en) * 2018-05-09 2021-08-27 腾讯科技(深圳)有限公司 Service processing method, computer equipment and client
CN110209422A (en) * 2018-05-09 2019-09-06 腾讯科技(深圳)有限公司 A kind of method for processing business, computer equipment and client
CN109388637A (en) * 2018-09-21 2019-02-26 北京京东金融科技控股有限公司 Data warehouse information processing method, device, system, medium
CN109388637B (en) * 2018-09-21 2020-09-01 京东数字科技控股有限公司 Data warehouse information processing method, device, system and medium
CN111694891A (en) * 2019-03-12 2020-09-22 马上消费金融股份有限公司 Data table processing method and device
CN111694891B (en) * 2019-03-12 2021-01-12 马上消费金融股份有限公司 Data table processing method and device
CN110162563A (en) * 2019-05-28 2019-08-23 深圳市网心科技有限公司 A kind of data storage method, system and electronic equipment and storage medium
CN110162563B (en) * 2019-05-28 2023-11-17 深圳市网心科技有限公司 Data warehousing method and system, electronic equipment and storage medium
CN112416982A (en) * 2021-01-25 2021-02-26 北京轻松筹信息技术有限公司 Method and device for calculating real-time user characteristics
CN112416982B (en) * 2021-01-25 2021-09-21 北京轻松筹信息技术有限公司 Method and device for calculating real-time user characteristics
CN113177049A (en) * 2021-05-13 2021-07-27 中移智行网络科技有限公司 Data processing method, device and system
CN113434376A (en) * 2021-06-24 2021-09-24 山东浪潮科学研究院有限公司 Web log analysis method and device based on NoSQL
CN113434376B (en) * 2021-06-24 2023-04-11 山东浪潮科学研究院有限公司 Web log analysis method and device based on NoSQL
CN114490525A (en) * 2022-02-22 2022-05-13 北京科杰科技有限公司 System and method for analyzing and putting out and putting in storage of super-large unstructured text files remotely based on hadoop
CN114490525B (en) * 2022-02-22 2022-08-02 北京科杰科技有限公司 System and method for analyzing and warehousing of ultra-large unstructured text files based on hadoop remote
CN115801353A (en) * 2022-11-03 2023-03-14 智网安云(武汉)信息技术有限公司 Linkage script processing method after real-time aggregation of safety event logs based on big data level

Similar Documents

Publication Publication Date Title
CN106951552A (en) A kind of user behavior data processing method based on Hadoop
CN103491187B (en) A kind of big data united analysis processing method based on cloud computing
US8260826B2 (en) Data processing system and method
CN105930446B (en) A kind of telecom client label generating method based on Hadoop distributed computing technology
CN105989129B (en) Real time data statistical method and device
CN103246749B (en) Matrix database system based on distributed computing and its query method
CN109272155A (en) A kind of corporate behavior analysis system based on big data
CN107577805A (en) A kind of business service system towards the analysis of daily record big data
CN106815338A (en) A kind of system for real-time storage, processing and querying of big data
CN107038162A (en) Real time data querying method and system based on database journal
CN107704545A (en) Stream processing method for massive railway distribution network information based on Storm and Kafka message communication
CN105512167A (en) Multi-business user data managing system based on mixed database and method for same
CN107895046B (en) Heterogeneous data integration platform
CN106126641A (en) A kind of real-time recommendation system and method based on Spark
CN104820670A (en) Method for acquiring and storing big data of power information
CN108021809A (en) A kind of data processing method and system
CN103390038A (en) HBase-based incremental index creation and retrieval method
CN107247799A (en) Data processing method and system compatible with a variety of big data stores, and modeling method thereof
CN110688399A (en) Stream type calculation real-time report system and method
CN106850258A (en) A kind of Log Administration System, method and device
CN107103064A (en) Data statistical approach and device
CN111221791A (en) Method for importing multi-source heterogeneous data into data lake
CN107067322A (en) A kind of system and method applied to P2P network loan business data access models
CN103646051A (en) Big-data parallel processing system and method based on column storage
CN107025298A (en) A kind of real-time big data computing and processing system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170714