CN106873945A - Data processing architecture and data processing method based on batch processing and Stream Processing - Google Patents

Data processing architecture and data processing method based on batch processing and stream processing

Publication number
CN106873945A
Authority
CN
China
Prior art keywords
data
processing
batch
module
view
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611245710.6A
Other languages
Chinese (zh)
Inventor
吴贺俊
冯辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN201611245710.6A
Publication of CN106873945A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3005Arrangements for executing specific machine instructions to perform operations for flow control

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to a data processing architecture based on batch processing and stream processing, comprising: a data acquisition module, which obtains collected real-time data from multiple data collection terminals and transmits the collected data to a batch processing module and a stream processing module; the batch processing module, which persists the received real-time data, batch-processes the persisted data using a recomputation mechanism, and generates batch views of different granularities from the results; the stream processing module, which stream-processes the received real-time data using an incremental computation mechanism and generates stream processing views of different granularities from the results; a data merging module, which merges the batch views and the stream processing views using a corresponding merging strategy; a data visualization module, which displays the batch views, the stream processing views, or the merged views; and a resource monitoring module, which monitors the resources of the above modules.

Description

Data processing architecture and data processing method based on batch processing and stream processing
Technical field
The present invention relates to the technical field of data processing, and more particularly to a data processing architecture and a data processing method based on batch processing and stream processing.
Background
With the spread of the internet, the rapid development of the Internet of Things, and the widespread use of devices such as smartphones, people can generate data anytime and anywhere, which has led to explosive growth in data volume. For large-scale data, distributed batch processing models and stream processing models have been proposed.
The batch processing model provides high-throughput analysis and mining of large-scale historical data. It stores data first and computes afterwards, so it suits scenarios where real-time requirements are modest and the accuracy and completeness of the data matter more. It is widely used in fields such as offline analysis and offline machine learning. The stream processing model focuses on real-time analysis of streaming data: the data arrives as a stream carrying a large amount of information, and only a small fraction of it is kept in limited memory. It is widely used in low-latency scenarios such as online recommendation, online analysis, and online machine learning.
However, each model processes data in a single way and applies to a limited range of scenarios. Each was proposed as a solution to a single class of problem, and neither is general-purpose. The batch processing model can process more complete data and so obtain more accurate results, but its latency is relatively high. The stream processing model computes with low latency, but because it caches only a limited amount of data in memory, its results are less accurate. As technology develops, modern enterprises increasingly need a low-latency method that processes historical data and real-time data together, one that guarantees comprehensive processing of the whole data set while also keeping processing efficient.
Summary of the invention
To solve the above technical problem, the present invention provides a data processing architecture based on batch processing and stream processing. Because the architecture combines batch processing with stream processing, it can process the data set comprehensively while also keeping processing efficient.
To achieve the above object, the following technical solution is adopted:
A data processing architecture based on batch processing and stream processing comprises a data acquisition module, a batch processing module, a stream processing module, a data merging module, a data visualization module, and a resource monitoring module.
The data acquisition module obtains collected real-time data from multiple data collection terminals and transmits the collected data to the batch processing module and the stream processing module.
The batch processing module persists the received real-time data; then, when the condition for running a batch job is met, it batch-processes the persisted real-time data using a recomputation mechanism and generates batch views of different granularities from the results.
The stream processing module stream-processes the received real-time data using an incremental computation mechanism and generates stream processing views of different granularities from the results.
The data merging module merges the batch views and the stream processing views using a merging strategy chosen according to the specific query requirement.
The data visualization module displays the batch views, the stream processing views, or the merged batch and stream processing views.
The resource monitoring module monitors the resources of the data acquisition module, batch processing module, stream processing module, data merging module, and data visualization module.
Preferably, the data acquisition module comprises a data collection submodule and a data cleaning submodule. The data collection submodule receives the collected real-time data from multiple data collection terminals, and the data cleaning submodule cleans the received real-time data using corresponding filtering rules.
Preferably, the batch processing module comprises a data preprocessing submodule, a data processing submodule, and a batch view storage submodule.
The data preprocessing submodule persists the received real-time data using data integration, data transformation, and data reduction techniques.
The data processing submodule, when the condition for running a batch job is met, batch-processes the persisted real-time data using a recomputation mechanism.
The batch view storage submodule stores the results obtained by the data processing submodule in HBase to generate batch views of different granularities.
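The notion of views "of different granularities" can be illustrated with a small Python sketch. It is not taken from the patent: the event layout `(item, timestamp)`, the function name `build_views`, and the choice of hourly and daily buckets are assumptions made only for illustration, and plain in-memory counters stand in for HBase tables.

```python
from collections import Counter
from datetime import datetime

def build_views(events):
    """Roll up (item, ISO timestamp) click events into two views.

    Returns (hourly, daily) counters keyed by (item, time bucket).
    The two counters stand in for batch views of different granularities.
    """
    hourly, daily = Counter(), Counter()
    for item, ts in events:
        t = datetime.fromisoformat(ts)
        hourly[(item, t.strftime("%Y-%m-%d %H"))] += 1  # fine granularity
        daily[(item, t.strftime("%Y-%m-%d"))] += 1      # coarse granularity
    return hourly, daily

# Hypothetical sample data.
events = [
    ("shoes", "2016-12-29T09:15:00"),
    ("shoes", "2016-12-29T09:40:00"),
    ("shoes", "2016-12-29T17:05:00"),
    ("hat", "2016-12-29T09:01:00"),
]
hourly, daily = build_views(events)
```

A query for a whole day can then be answered from the coarse view alone, while a query for a single hour reads the fine view.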
Preferably, the stream processing module comprises a data processing submodule and a stream processing view storage submodule. The data processing submodule stream-processes the real-time data using an incremental computation mechanism, and the stream processing view storage submodule stores the results produced by the data processing submodule in HBase to generate stream processing views of different granularities.
Preferably, the data acquisition module is implemented with the Flume log collection system.
Preferably, the batch processing module is implemented with a Spark cluster.
Preferably, the stream processing module is implemented with a Storm cluster.
The present invention also provides a data processing method based on the above architecture, comprising the following steps:
S1. The data acquisition module obtains collected real-time data from multiple data collection terminals and transmits the collected data to the batch processing module and the stream processing module.
S2. The batch processing module persists the received real-time data; then, when the condition for running a batch job is met, it batch-processes the persisted real-time data using a recomputation mechanism and generates batch views of different granularities from the results.
S3. The stream processing module stream-processes the received real-time data using an incremental computation mechanism and generates stream processing views of different granularities from the results.
S4. The data merging module merges the batch views and the stream processing views using a merging strategy chosen according to the specific query requirement.
S5. The data visualization module displays the batch views, the stream processing views, or the merged batch and stream processing views.
S6. Throughout the above flow, the resource monitoring module monitors the resources of the data acquisition module, batch processing module, stream processing module, data merging module, and data visualization module.
Compared with the prior art, the beneficial effects of the present invention are as follows:
By using the batch processing module and the stream processing module together, the architecture provided by the present invention ensures the accuracy of the overall computation results while also keeping data processing efficient.
Brief description of the drawings
Fig. 1 is a structural diagram of the architecture provided by the present invention.
Fig. 2 is a schematic diagram of the data acquisition module.
Fig. 3 is a diagram of computing task execution in the Spark cluster.
Fig. 4 is a flow chart of incremental computation in the stream processing module.
Figs. 5, 6, and 7 are schematic diagrams of data synchronization between the batch processing module and the stream processing module.
Fig. 8 is a flow chart of the data processing performed by the data merging module.
Detailed description
The accompanying drawings are for illustration only and shall not be construed as limiting this patent.
The present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, the data processing architecture based on batch processing and stream processing comprises a data acquisition module 10, a batch processing module 20, a stream processing module 30, a data merging module 40, a data visualization module 50, and a resource monitoring module 60.
The data acquisition module obtains collected real-time data from multiple data collection terminals and transmits the collected data to the batch processing module and the stream processing module.
The batch processing module persists the received real-time data; then, when the condition for running a batch job is met, it batch-processes the persisted real-time data using a recomputation mechanism and generates batch views of different granularities from the results.
The stream processing module stream-processes the received real-time data using an incremental computation mechanism and generates stream processing views of different granularities from the results.
The data merging module merges the batch views and the stream processing views using a merging strategy chosen according to the specific query requirement.
The data visualization module displays the batch views, the stream processing views, or the merged batch and stream processing views.
The resource monitoring module monitors the resources of the data acquisition module, batch processing module, stream processing module, data merging module, and data visualization module.
In a specific implementation, the data acquisition module 10 may use a distributed, highly reliable, and highly available system for collecting and transmitting massive logs, such as the Flume log collection system, to receive multi-source data in real time. As shown in Fig. 2, the architecture is provided with three agents: Agent1, Agent2, and Master Agent. The Flume log collection system receives external data through two sources: an Avro Source in Agent1, which listens on an IP address and port number, and a Spooldir source in Agent2, which watches a directory. After preliminary filtering of the collected real-time data, the data received from the two sources is sent to the Avro Source in the Master Agent. Using a replication strategy, the architecture sends the data received by that Avro Source to a File Channel and a Memory Channel at the same time; the data is finally delivered to an HDFS Sink and a Kafka Sink for batch processing and stream processing.
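The filter-then-replicate flow can be sketched in plain Python. This is a toy model, not Flume configuration or the Flume API: the class `ReplicatingAgent`, the helper `make_filter`, the event fields `item_id` and `ts`, and the pairing of the file channel with the batch path and the memory channel with the stream path are all assumptions made for illustration.

```python
from collections import deque

def make_filter(required_fields):
    """Return a predicate that keeps events whose required fields are non-empty."""
    def keep(event):
        return all(event.get(f) not in (None, "") for f in required_fields)
    return keep

class ReplicatingAgent:
    """Toy stand-in for the Master Agent: one source fanning out to several channels."""

    def __init__(self, channels):
        self.channels = channels  # mapping: channel name -> deque

    def receive(self, event):
        # Replication strategy: every channel gets its own copy of the event.
        for queue in self.channels.values():
            queue.append(dict(event))

file_channel = deque()    # assumed to feed the HDFS sink (batch path)
memory_channel = deque()  # assumed to feed the Kafka sink (stream path)
master = ReplicatingAgent({"file": file_channel, "memory": memory_channel})

keep = make_filter(["item_id", "ts"])
raw_events = [
    {"item_id": "a1", "ts": 1},
    {"item_id": "", "ts": 2},   # dropped by the preliminary filter
    {"item_id": "b7", "ts": 3},
]
for ev in raw_events:
    if keep(ev):
        master.receive(ev)
```

Because both channels receive the same filtered events, the batch and stream paths start from identical input, which is what later makes their views mergeable.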
As shown in Fig. 3, the batch processing module 20 is implemented with a Spark cluster. The runtime environment of the Spark application is built first, and the application is then submitted to the resource scheduler; the resources the application needs are prepared all at once, which is coarse-grained environment construction. The application is then converted into a DAG, and Spark turns the RDD dependencies into different stages. Dependencies are divided into narrow and wide: in a narrow dependency, each partition of the parent RDD is processed by only one child RDD partition, whereas in a wide dependency a parent RDD partition may feed multiple child RDD partitions. Spark uses a greedy algorithm to place narrow dependencies in a single stage wherever possible and runs multiple tasks in parallel within each stage. When the DAG is executed, stages that do not depend on other stages run first, followed by stages whose dependencies have completed. As with the optimization mechanisms in MapReduce, Spark takes data locality into account and supports speculative execution. The results of the batch processing module are stored in HBase to generate batch views of different granularities; the results and batch views are stored in HBase mainly because it supports random reads and writes.
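How a linear chain of operations is cut into stages can be sketched as follows. This is a deliberately simplified model, not Spark's scheduler: it assumes a single linear lineage, and it classifies operations by name only (`map`, `filter`, and `flatMap` as narrow; `reduceByKey`, `groupByKey`, and `sortByKey` as wide, since they require a shuffle). The function name `split_into_stages` is hypothetical.

```python
NARROW = {"map", "filter", "flatMap"}
WIDE = {"reduceByKey", "groupByKey", "sortByKey"}

def split_into_stages(ops):
    """Greedily pack consecutive narrow operations into one stage.

    A wide (shuffle) operation closes the current stage and opens the next
    one, because its input must be repartitioned before it can run.
    """
    stages, current = [], []
    for op in ops:
        if op not in NARROW and op not in WIDE:
            raise ValueError(f"unknown operation: {op}")
        if op in WIDE and current:
            stages.append(current)
            current = []
        current.append(op)
    if current:
        stages.append(current)
    return stages

pipeline = ["map", "filter", "reduceByKey", "map", "sortByKey", "map"]
stages = split_into_stages(pipeline)
```

For the sample pipeline this yields three stages: the leading `map` and `filter` are pipelined together, and each shuffle operation begins a new stage that also absorbs the narrow operations following it.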
In a specific implementation, the stream processing module 30 is implemented with a Storm cluster. Its function is outlined as follows:
In a Storm cluster, a real-time application is designed as a Topology, and the Topology is submitted to the cluster; the master node in the cluster distributes the code and assigns tasks to worker nodes for execution. A Topology contains two kinds of roles, spouts and bolts. A spout emits messages and is responsible for sending out the data stream in the form of tuples. A bolt is responsible for transforming the data stream; operations such as computation and filtering can be completed in a bolt, and a bolt can itself also send data to other bolts at random. The results of the stream processing module 30 and the views it generates are all stored in HBase, so that they can be updated with low latency when new data arrives.
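The division of labour between a spout and bolts can be mimicked with Python generators. This is only an analogy and does not use the Storm API: the names `click_spout`, `filter_bolt`, and `count_bolt`, the `user,item` line format, and the blocked-user rule are hypothetical.

```python
def click_spout(lines):
    """Spout stand-in: turn raw 'user,item' lines into a stream of tuples."""
    for line in lines:
        user, item = line.split(",")
        yield (user, item)

def filter_bolt(tuples, blocked_users):
    """Bolt stand-in: drop tuples from blocked users and pass the rest on."""
    for user, item in tuples:
        if user not in blocked_users:
            yield (user, item)

def count_bolt(tuples):
    """Bolt stand-in: count clicks per item from the incoming tuples."""
    counts = {}
    for _, item in tuples:
        counts[item] = counts.get(item, 0) + 1
    return counts

# Wiring the pieces together plays the role of the Topology.
lines = ["u1,shoes", "u2,hat", "bot,hat", "u3,shoes"]
counts = count_bolt(filter_bolt(click_spout(lines), {"bot"}))
```

In a real cluster each role would run as many parallel tasks on worker nodes; the generator chain only shows the direction of data flow.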
To improve data processing efficiency, the stream processing module 30 can use an incremental computation mechanism, summarized as follows. As shown in Fig. 4, when new data arrives at the stream processing module, the module first determines whether the new data affects existing data. If it does, the existing data is fetched from HBase and merged with the new data; if it does not, no merging is needed. The result of these steps is treated as the new data, the corresponding algorithm is applied to it, and the real-time view corresponding to the new data is generated; the generated real-time view is then written into the existing stream processing view.
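The incremental flow of Fig. 4 can be sketched for a simple counting view. The sketch makes several assumptions not found in the patent: the view is a Python dict standing in for an HBase table, the "corresponding algorithm" is a per-key sum, and the function name `incremental_update` is hypothetical.

```python
def incremental_update(view, new_records):
    """Fold a micro-batch of (key, count) records into an existing view.

    Only keys touched by the new records are read and rewritten; the rest
    of the view is left alone, which is what makes the update incremental.
    """
    # Step 1: compute the view for the new data only.
    delta = {}
    for key, n in new_records:
        delta[key] = delta.get(key, 0) + n

    # Step 2: merge with existing data where the new data affects it.
    for key, n in delta.items():
        existing = view.get(key)  # None means the new data touches no existing row
        view[key] = n if existing is None else existing + n
    return view

view = {"shoes": 5}
incremental_update(view, [("shoes", 1), ("hat", 2), ("shoes", 1)])
```

After the call, the `shoes` row has been read, merged, and rewritten, while `hat` has been inserted without any read of prior state.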
In a specific implementation, to ensure that data flowing into the batch processing module and the stream processing module is processed only once, data synchronization between the two modules must be considered. The process is as follows:
The batch processing module and the stream processing module receive the same data at the same time. The batch processing module stores the data on HDFS, and the stream processing module stores the data in a table whose name identifies the current date and the content of the received data. The data synchronization problem is solved by dynamically maintaining two tables. As shown in Fig. 5, after the system has been running for a while, the batch processing module and the stream processing module hold the same data, but the batch processing module has not yet reached the time point that triggers recomputation, so its data has not yet been computed. At this point, assume the table in the stream processing module is i_click.
As shown in Fig. 6, when the recomputation time point of the batch processing module is reached, recomputation is triggered. Before recomputing, the batch processing module creates a new table named according to the current system time, i+1_click, to hold incoming real-time data. Assuming the data received during recomputation is block1 and block2, the stream processing module now holds two tables, i_click and i+1_click. Table i_click holds the real-time data received on day i, and table i+1_click holds the new real-time data received on day i+1, that is, block1 and block2.
Fig. 7 shows the result after the system has synchronized the data. After recomputation, the batch processing module deletes table i_click, so the stream processing module holds only the data in table i+1_click. Because the data in i_click has already been computed in the batch processing module, the stream processing module no longer computes it; otherwise the data would be computed twice.
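The two-table handover of Figs. 5 to 7 can be sketched as a small state machine. The class `StreamTables` and its method names are hypothetical, an integer day index stands in for the date, and in-memory lists stand in for the stream-side tables.

```python
class StreamTables:
    """Toy model of the stream-side tables named '<day>_click'."""

    def __init__(self, day):
        self.active_day = day
        self.tables = {self.name(day): []}

    @staticmethod
    def name(day):
        return f"{day}_click"

    def write(self, record):
        # New real-time data always lands in the currently active table.
        self.tables[self.name(self.active_day)].append(record)

    def start_recompute(self):
        """Open the next day's table before the batch recomputation starts.

        Returns the name of the old table, whose data the batch job covers.
        """
        old = self.name(self.active_day)
        self.active_day += 1
        self.tables[self.name(self.active_day)] = []
        return old

    def finish_recompute(self, old):
        # The batch view now covers the old table, so drop it to avoid
        # counting the same data twice.
        del self.tables[old]

tables = StreamTables(day=1)
tables.write("a")
tables.write("b")
old = tables.start_recompute()   # Fig. 6: 1_click and 2_click coexist
tables.write("block1")
tables.write("block2")
tables.finish_recompute(old)     # Fig. 7: only 2_click remains
```

Data that arrives while the batch job runs is never lost, because it is written to the new table before the old one is dropped.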
In a specific implementation, the data merging module 40 may work as follows: for the user's specific business requirement, it merges the computation results of the batch processing module and the stream processing module, so that queries run over the whole data set. The key point is therefore how to merge the batch views computed by the batch processing module with the real-time views computed by the stream processing module, selecting a merging strategy according to the specific business logic. If the query function has the monoid property, that is, it is associative, the batch view results and the stream processing view results can be merged directly. As shown in Fig. 8, to query the click volume of items over different time periods, the span of the input time period is examined first. If it falls entirely within the batch processing module, the result is obtained by querying the batch view only. If it falls entirely within the stream processing module, the result is obtained by querying the stream processing view only. If it spans both modules, the batch view and the stream processing view are queried separately and the query results are then merged, that is, the counts for the same item are simply added. If the query function does not have the monoid property, it can be converted into several query functions that do; each of these is queried separately against the batch view and the stream processing view, and the results are then combined by a further computation to obtain the final result.
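The three cases of Fig. 8 can be sketched as follows. The sketch assumes views keyed by `(item, hour)`, a half-open query range `[start, end)`, and a single boundary `batch_end` before which the batch view is authoritative; these conventions and the function names `query_clicks` and `merged_mean` are not from the patent.

```python
def query_clicks(batch_view, stream_view, start, end, batch_end):
    """Return clicks per item for hours in [start, end).

    The batch view covers hours < batch_end; the stream view covers the rest.
    """
    def scan(view, lo, hi):
        totals = {}
        for (item, hour), n in view.items():
            if lo <= hour < hi:
                totals[item] = totals.get(item, 0) + n
        return totals

    if end <= batch_end:       # range lies entirely in the batch view
        return scan(batch_view, start, end)
    if start >= batch_end:     # range lies entirely in the stream view
        return scan(stream_view, start, end)
    # Range spans both: query each side, then add counts per item.
    merged = scan(batch_view, start, batch_end)
    for item, n in scan(stream_view, batch_end, end).items():
        merged[item] = merged.get(item, 0) + n
    return merged

def merged_mean(batch_sum, batch_count, stream_sum, stream_count):
    """A mean is not directly mergeable, but its sum and count are."""
    return (batch_sum + stream_sum) / (batch_count + stream_count)

batch = {("shoes", 1): 3, ("hat", 2): 1}
stream = {("shoes", 3): 2, ("hat", 4): 5}
spanning = query_clicks(batch, stream, start=2, end=4, batch_end=3)
```

Counting merges by addition because addition is associative and has an identity (zero). The `merged_mean` helper shows the decomposition case: an average is rewritten as two mergeable aggregates, sum and count, which are combined before the final division.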
The above embodiment of the present invention is merely an example given to illustrate the invention clearly and does not limit the embodiments of the invention. Those of ordinary skill in the art can make other changes in different forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (8)

1. A data processing architecture based on batch processing and stream processing, characterized in that it comprises a data acquisition module, a batch processing module, a stream processing module, a data merging module, a data visualization module, and a resource monitoring module;
wherein the data acquisition module is configured to obtain collected real-time data from multiple data collection terminals and to transmit the collected data to the batch processing module and the stream processing module;
the batch processing module is configured to persist the received real-time data and then, when the condition for running a batch job is met, to batch-process the persisted real-time data using a recomputation mechanism and to generate batch views of different granularities from the results;
the stream processing module is configured to stream-process the received real-time data using an incremental computation mechanism and to generate stream processing views of different granularities from the results;
the data merging module is configured to merge the batch views and the stream processing views using a merging strategy chosen according to the specific query requirement;
the data visualization module is configured to display the batch views, the stream processing views, or the merged batch and stream processing views;
the resource monitoring module is configured to monitor the resources of the data acquisition module, batch processing module, stream processing module, data merging module, and data visualization module.
2. The data processing architecture based on batch processing and stream processing according to claim 1, characterized in that the data acquisition module comprises a data collection submodule and a data cleaning submodule; the data collection submodule is configured to receive the collected real-time data from multiple data collection terminals, and the data cleaning submodule is configured to clean the received real-time data using corresponding filtering rules.
3. The data processing architecture based on batch processing and stream processing according to claim 1, characterized in that the batch processing module comprises a data preprocessing submodule, a data processing submodule, and a batch view storage submodule;
the data preprocessing submodule is configured to persist the received real-time data using data integration, data transformation, and data reduction techniques;
the data processing submodule is configured, when the condition for running a batch job is met, to batch-process the persisted real-time data using a recomputation mechanism;
the batch view storage submodule is configured to store the results obtained by the data processing submodule in HBase to generate batch views of different granularities.
4. The data processing architecture based on batch processing and stream processing according to claim 1, characterized in that the stream processing module comprises a data processing submodule and a stream processing view storage submodule, wherein the data processing submodule is configured to stream-process the real-time data using an incremental computation mechanism, and the stream processing view storage submodule is configured to store the results produced by the data processing submodule in HBase to generate stream processing views of different granularities.
5. The data processing architecture based on batch processing and stream processing according to claim 2, characterized in that the data acquisition module is implemented with the Flume log collection system.
6. The data processing architecture based on batch processing and stream processing according to claim 3, characterized in that the batch processing module is implemented with a Spark cluster.
7. The data processing architecture based on batch processing and stream processing according to claim 4, characterized in that the stream processing module is implemented with a Storm cluster.
8. A data processing method for the architecture according to any one of claims 1 to 7, characterized in that it comprises the following steps:
S1. the data acquisition module obtains collected real-time data from multiple data collection terminals and transmits the collected data to the batch processing module and the stream processing module;
S2. the batch processing module persists the received real-time data and then, when the condition for running a batch job is met, batch-processes the persisted real-time data using a recomputation mechanism and generates batch views of different granularities from the results;
S3. the stream processing module stream-processes the received real-time data using an incremental computation mechanism and generates stream processing views of different granularities from the results;
S4. the data merging module merges the batch views and the stream processing views using a merging strategy chosen according to the specific query requirement;
S5. the data visualization module displays the batch views, the stream processing views, or the merged batch and stream processing views;
S6. throughout the above flow, the resource monitoring module monitors the resources of the data acquisition module, batch processing module, stream processing module, data merging module, and data visualization module.
CN201611245710.6A 2016-12-29 2016-12-29 Data processing architecture and data processing method based on batch processing and Stream Processing Pending CN106873945A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611245710.6A CN106873945A (en) 2016-12-29 2016-12-29 Data processing architecture and data processing method based on batch processing and Stream Processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611245710.6A CN106873945A (en) 2016-12-29 2016-12-29 Data processing architecture and data processing method based on batch processing and Stream Processing

Publications (1)

Publication Number Publication Date
CN106873945A true CN106873945A (en) 2017-06-20

Family

ID=59164023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611245710.6A Pending CN106873945A (en) 2016-12-29 2016-12-29 Data processing architecture and data processing method based on batch processing and Stream Processing

Country Status (1)

Country Link
CN (1) CN106873945A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013097234A1 (en) * 2011-12-31 2013-07-04 华为技术有限公司 Service processing method and system
CN105677752A (en) * 2015-12-30 2016-06-15 深圳先进技术研究院 Streaming computing and batch computing combined processing system and method
CN105701161A (en) * 2015-12-31 2016-06-22 深圳先进技术研究院 Real-time big data user label system

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391719A (en) * 2017-07-31 2017-11-24 南京邮电大学 Distributed stream data processing method and system in a cloud environment
CN109598348A (en) * 2017-09-28 2019-04-09 北京猎户星空科技有限公司 A kind of image pattern obtains, model training method and system
CN108304454B (en) * 2017-11-27 2022-05-17 大象慧云信息技术有限公司 Invoice data real-time aggregation device based on big data
CN108304454A (en) * 2017-11-27 2018-07-20 大象慧云信息技术有限公司 Invoice data real-time aggregation device based on big data
CN107908797A (en) * 2017-12-18 2018-04-13 上海中畅数据技术有限公司 Real-time ETL data stream processing method and system
CN108718345A (en) * 2018-09-05 2018-10-30 电子科技大学 Industrial data network transmission system for a digital workshop
CN111079924A (en) * 2018-10-19 2020-04-28 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111211993A (en) * 2018-11-21 2020-05-29 百度在线网络技术(北京)有限公司 Incremental persistence method and device for streaming computation
CN111211993B (en) * 2018-11-21 2023-08-11 百度在线网络技术(北京)有限公司 Incremental persistence method, device and storage medium for stream computation
CN109684377A (en) * 2018-12-13 2019-04-26 深圳市思迪信息技术股份有限公司 General real-time big data processing development platform and data processing method thereof
CN112527839A (en) * 2020-12-10 2021-03-19 上海浦东发展银行股份有限公司 Multi-source data processing method, system, equipment and storage medium
CN112597200A (en) * 2020-12-22 2021-04-02 南京三眼精灵信息技术有限公司 Batch and streaming combined data processing method and device
CN112597200B (en) * 2020-12-22 2024-01-12 南京三眼精灵信息技术有限公司 Batch and stream combined data processing method and device
WO2023109806A1 (en) * 2021-12-14 2023-06-22 天翼物联科技有限公司 Method and apparatus for processing active data for internet of things device, and storage medium
CN114816704A (en) * 2022-04-25 2022-07-29 湖南大学 Spark task scheduling method and system based on heterogeneous resources
CN114816704B (en) * 2022-04-25 2024-10-15 湖南大学 Spark task scheduling method and system based on heterogeneous resources
CN116841753A (en) * 2023-08-31 2023-10-03 杭州迅杭科技有限公司 Stream processing and batch processing switching method and switching device
CN116841753B (en) * 2023-08-31 2023-11-17 杭州迅杭科技有限公司 Stream processing and batch processing switching method and switching device
CN117787902A (en) * 2023-12-26 2024-03-29 航天神舟智慧系统技术有限公司 Flow batch integration-based distribution control early warning system and method

Similar Documents

Publication Publication Date Title
CN106873945A (en) Data processing architecture and data processing method based on batch processing and Stream Processing
CN110460656B (en) Industry environmental protection thing networking remote monitoring cloud platform
CN110022226B (en) Object-oriented data acquisition system and acquisition method
CN106778033B (en) Spark Streaming abnormal temperature data alarm method based on the Spark platform
CN107679192A (en) Multi-cluster collaborative data processing method, system, storage medium and equipment
Qiu et al. A packet buffer evaluation method exploiting queueing theory for wireless sensor networks
CN101902497B (en) Cloud computing based internet information monitoring system and method
Liu et al. Real-time complex event processing and analytics for smart grid
CN109739919A (en) Front-end processor and acquisition system for an electric power system
CN106599190A (en) Dynamic Skyline query method based on cloud computing
CN107086929A (en) Performance guarantee method for batch streaming computing systems based on queueing modeling
CN104394149A (en) Complex event processing method based on parallel distributed architecture
CN107454009B (en) Data center-oriented offline scene low-bandwidth overhead traffic scheduling scheme
CN106599189A (en) Dynamic Skyline inquiry device based on cloud computing
CN115017159A (en) Data processing method and device, storage medium and electronic equipment
CN105610992A (en) Task allocation load balancing method for distributed stream computing system
CN105471893A (en) Distributed equivalent data stream connection method
CN201726426U (en) Internet information monitoring system based on cloud computing
CN111858530B (en) Real-time correlation analysis method and system based on mass logs
CN101267449A (en) A tree P2P system resource transmission method based on mobile agent mechanism
CN110764833B (en) Task offloading method, device and system based on edge computing
Aslam et al. Pre‐filtering based summarization for data partitioning in distributed stream processing
CN115422259A (en) Data processing method, system, equipment and storage medium of time sequence database
CN113505326B (en) Dynamic coding data transmission control method based on the HTTP protocol family
CN102521360B (en) Raster data transmission method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2017-06-20