CN106873945A - Data processing architecture and data processing method based on batch processing and stream processing
- Publication number
- CN106873945A (application number CN201611245710.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- processing
- batch
- module
- view
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3005—Arrangements for executing specific machine instructions to perform operations for flow control
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention relates to a data processing architecture based on batch processing and stream processing, comprising: a data acquisition module, which obtains collected real-time data from multiple data collection terminals and transmits the collected data to a batch processing module and a stream processing module; the batch processing module, which persists the received real-time data, batch-processes the persisted data using a re-computation mechanism, and generates batch views of different granularities from the processing results; the stream processing module, which stream-processes the received real-time data using an incremental computation mechanism and generates stream processing views of different granularities from the processing results; a data merging module, which merges the batch views and the stream processing views using a corresponding merge strategy; a data visualization module, which displays the batch views, the stream processing views, or the merged batch and stream processing views; and a resource monitoring module, which monitors the resources of the above modules.
Description
Technical field
The present invention relates to the technical field of data processing, and more particularly to a data processing architecture and a data processing method based on batch processing and stream processing.
Background
With the spread of the internet, the rapid development of the Internet of Things, and the widespread use of devices such as smartphones, people can produce data anytime and anywhere, which has led to explosive growth in data. For large-scale data, distributed batch processing models and stream processing models have been proposed.
The batch processing model provides high-throughput processing, massive analysis, and mining of large-scale historical data. It stores data first and computes afterwards, so it generally suits scenarios where real-time requirements are low and the accuracy and completeness of the data matter more. The batch processing model is widely used in fields such as offline analysis and offline machine learning. The stream processing model, by contrast, focuses on real-time analysis of streaming data: data arrives as a stream and carries a large amount of information, and only a small portion of the stream is held in limited memory. The stream processing model is widely used in low-latency scenarios such as online recommendation, online analysis, and online machine learning.
However, the batch processing model and the stream processing model each handle data in a single way and suit only limited scenarios; each is a solution proposed for a single problem and scenario, and neither is general-purpose. The batch processing model can process more comprehensive data and therefore obtain more accurate results, but its latency is relatively high. The stream processing model can compute with low latency, but because only a limited amount of data is cached in memory, its computational accuracy is relatively low. As technology develops, modern enterprises increasingly need a low-latency way to process historical data and real-time data together, one that guarantees comprehensive processing of the whole data set while also guaranteeing processing efficiency.
Summary of the invention
To solve the above technical problem, the present invention provides a data processing architecture based on batch processing and stream processing. The architecture has both batch processing and stream processing capabilities, and can therefore guarantee comprehensive processing of the data set while also maintaining processing efficiency.
To achieve the above object, the following technical solution is adopted:
A data processing architecture based on batch processing and stream processing comprises a data acquisition module, a batch processing module, a stream processing module, a data merging module, a data visualization module, and a resource monitoring module.
The data acquisition module obtains collected real-time data from multiple data collection terminals and transmits the collected data to the batch processing module and the stream processing module.
The batch processing module persists the received real-time data and then, when the condition for executing batch processing is met, batch-processes the persisted real-time data using a re-computation mechanism and generates batch views of different granularities from the processing results.
The stream processing module stream-processes the received real-time data using an incremental computation mechanism and generates stream processing views of different granularities from the processing results.
The data merging module merges the batch views and the stream processing views using a corresponding merge strategy, according to the specific query requirement.
The data visualization module displays the batch views, the stream processing views, or the merged batch and stream processing views.
The resource monitoring module monitors the resources of the data acquisition module, the batch processing module, the stream processing module, the data merging module, and the data visualization module.
Preferably, the data acquisition module comprises a data collection submodule and a data cleaning submodule. The data collection submodule receives the real-time data collected by the multiple data collection terminals, and the data cleaning submodule cleans the received real-time data using corresponding filtering rules.
Preferably, the batch processing module comprises a data preprocessing submodule, a data processing submodule, and a batch view storage submodule.
The data preprocessing submodule persists the received real-time data using data integration, data transformation, and data reduction techniques.
The data processing submodule, when the condition for executing batch processing is met, batch-processes the persisted real-time data using a re-computation mechanism.
The batch view storage submodule stores the results obtained by the data processing submodule in HBase, so as to generate batch views of different granularities.
Preferably, the stream processing module comprises a data processing submodule and a stream processing view storage submodule. The data processing submodule stream-processes the real-time data using an incremental computation mechanism, and the stream processing view storage submodule stores the processing results produced by the data processing submodule in HBase, so as to generate stream processing views of different granularities.
Preferably, the data acquisition module is implemented using the Flume log collection system.
Preferably, the batch processing module is implemented using a Spark cluster.
Preferably, the stream processing module is implemented using a Storm cluster.
The present invention also provides a data processing method based on the above architecture, which specifically comprises the following steps:
S1. The data acquisition module obtains collected real-time data from multiple data collection terminals and transmits the collected data to the batch processing module and the stream processing module.
S2. The batch processing module persists the received real-time data and then, when the condition for executing batch processing is met, batch-processes the persisted real-time data using a re-computation mechanism and generates batch views of different granularities from the processing results.
S3. The stream processing module stream-processes the received real-time data using an incremental computation mechanism and generates stream processing views of different granularities from the processing results.
S4. The data merging module merges the batch views and the stream processing views using a corresponding merge strategy, according to the specific query requirement.
S5. The data visualization module displays the batch views, the stream processing views, or the merged batch and stream processing views.
S6. Throughout the above flow, the resource monitoring module monitors the resources of the data acquisition module, the batch processing module, the stream processing module, the data merging module, and the data visualization module.
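Steps S1 to S4 can be illustrated with a minimal single-process sketch. This is only an illustration under stated assumptions, not the patented implementation: records are hypothetical `(timestamp, item)` pairs, the views are simple per-item counts, and `cutoff` is an assumed parameter marking which records have already been covered by the last batch re-computation. The function name `run_pipeline` is likewise hypothetical.

```python
from collections import Counter


def run_pipeline(terminals, cutoff):
    """Toy version of steps S1-S4 for per-item counts.

    terminals: list of lists of (timestamp, item) records (assumed format).
    cutoff:    records with timestamp < cutoff are treated as already covered
               by batch re-computation; later records only by the stream path.
    """
    # S1: gather the records collected by every terminal.
    records = [record for terminal in terminals for record in terminal]

    # S2: batch path, recomputing the view from scratch over persisted data.
    batch_view = Counter(item for ts, item in records if ts < cutoff)

    # S3: stream path, updating the view incrementally one record at a time.
    stream_view = Counter()
    for ts, item in records:
        if ts >= cutoff:
            stream_view[item] += 1

    # S4: merge strategy for counts is plain addition.
    merged_view = batch_view + stream_view
    return batch_view, stream_view, merged_view
```

In this sketch the display step (S5) would simply render one of the three returned views, and monitoring (S6) is left out.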
Compared with the prior art, the beneficial effects of the present invention are as follows:
By using the batch processing module and the stream processing module together, the architecture provided by the present invention ensures the accuracy of the overall computation results while also maintaining data processing efficiency.
Brief description of the drawings
Fig. 1 is a structural diagram of the architecture provided by the present invention.
Fig. 2 is a schematic diagram of the data acquisition module.
Fig. 3 is an execution diagram of computation tasks on the Spark cluster.
Fig. 4 is a flow chart of incremental computation in the stream processing module.
Figs. 5, 6 and 7 are schematic diagrams of data synchronization between the batch processing module and the stream processing module.
Fig. 8 is a flow diagram of the data merging module performing data processing.
Detailed description of the embodiments
The accompanying drawings are for illustrative purposes only and shall not be construed as limiting this patent.
The present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, the data processing architecture based on batch processing and stream processing comprises a data acquisition module 10, a batch processing module 20, a stream processing module 30, a data merging module 40, a data visualization module 50, and a resource monitoring module 60.
The data acquisition module obtains collected real-time data from multiple data collection terminals and transmits the collected data to the batch processing module and the stream processing module.
The batch processing module persists the received real-time data and then, when the condition for executing batch processing is met, batch-processes the persisted real-time data using a re-computation mechanism and generates batch views of different granularities from the processing results.
The stream processing module stream-processes the received real-time data using an incremental computation mechanism and generates stream processing views of different granularities from the processing results.
The data merging module merges the batch views and the stream processing views using a corresponding merge strategy, according to the specific query requirement.
The data visualization module displays the batch views, the stream processing views, or the merged batch and stream processing views.
The resource monitoring module monitors the resources of the data acquisition module, the batch processing module, the stream processing module, the data merging module, and the data visualization module.
In a specific implementation, the data acquisition module 10 may be realized as follows: a distributed, highly reliable, and highly available system for collecting and transmitting massive logs, such as the Flume log collection system, receives multi-source data in real time. As shown in Fig. 2, three agents are provided in the architecture: Agent1, Agent2, and Master Agent. The Flume log collection system receives external data through two Sources. One is the Avro Source in Agent1, which listens on an IP address and port number; the other is the Spooldir source in Agent2, which monitors a directory. After preliminary filtering of the collected real-time data, the data received from the two Sources is sent to the Avro Source in the Master Agent. Using a replication strategy, the architecture sends the data received by this Avro Source to both a File Channel and a Memory Channel; the data is then finally delivered to an HDFS Sink and a Kafka Sink, for batch processing and stream processing.
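The collection flow described above (two sources, a preliminary filter, and replication into two channels) can be mimicked in a few lines of plain Python. The sketch below does not use Flume; the class `ReplicatingAgent`, the functions `clean` and `collect`, and the filtering rule (dropping blank records) are all hypothetical stand-ins chosen for illustration.

```python
def clean(record):
    """Hypothetical filtering rule: keep only non-blank records."""
    return bool(record and record.strip())


class ReplicatingAgent:
    """Stand-in for the master agent: every record is copied to all channels."""

    def __init__(self, channels):
        # channels maps a channel name to a list acting as its queue.
        self.channels = channels

    def receive(self, record):
        # Replication strategy: the same record goes to every channel.
        for queue in self.channels.values():
            queue.append(record)


def collect(sources, agent):
    """Read every source, filter each record, and forward it to the agent."""
    for source in sources:
        for record in source:
            if clean(record):
                agent.receive(record)
```

With channels named, say, `"file"` and `"memory"`, both queues end up holding the same filtered records, which is what lets the batch path and the stream path consume identical input.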
As shown in Fig. 3, the batch processing module 20 is implemented using a Spark cluster. During implementation, the running environment of the Spark application is built first, and the application is then submitted to the resource scheduler; the resources the application needs are prepared all at once, which is a coarse-grained way of building the environment. The application is then converted into a DAG, and Spark turns the RDD dependencies into different stages. Dependencies are divided into narrow dependencies and wide dependencies: in a narrow dependency, each partition of the parent RDD is used by at most one partition of a child RDD, whereas in a wide dependency a partition of the parent RDD may be used by multiple child RDD partitions. Spark uses a greedy algorithm to place as many narrow dependencies as possible in a single stage, and processes multiple tasks in parallel within each stage. When the DAG is executed, stages that do not depend on other stages run first, followed by stages whose dependencies have completed. As with the optimization mechanisms in MapReduce, Spark takes data locality and speculative execution into account. The results of the batch processing module are stored in HBase to generate batch views of different granularities; the results and batch views are stored in HBase mainly to support random reads and writes.
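The stage division described above can be approximated with a short sketch: RDDs joined by narrow dependencies are grouped into one stage, wide dependencies become edges between stages, and stages run in dependency order. This is a simplification for illustration and not Spark's actual scheduler; the function `split_stages` and the `(parent, child, kind)` edge format are assumptions, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def split_stages(rdds, deps):
    """Group RDDs into stages and return the stages in execution order.

    rdds: list of RDD names.
    deps: list of (parent, child, kind) with kind "narrow" or "wide".
    """
    root = {rdd: rdd for rdd in rdds}

    def find(rdd):
        # Union-find lookup with path halving.
        while root[rdd] != rdd:
            root[rdd] = root[root[rdd]]
            rdd = root[rdd]
        return rdd

    # Narrow dependencies keep parent and child in the same stage.
    for parent, child, kind in deps:
        if kind == "narrow":
            root[find(child)] = find(parent)

    stages = {}
    for rdd in rdds:
        stages.setdefault(find(rdd), []).append(rdd)

    # Wide dependencies are the boundaries: child stage waits for parent stage.
    waits_for = {stage: set() for stage in stages}
    for parent, child, kind in deps:
        if kind == "wide" and find(parent) != find(child):
            waits_for[find(child)].add(find(parent))

    order = TopologicalSorter(waits_for).static_order()
    return [stages[stage] for stage in order]
```

For a lineage `a -> b` (narrow), `b -> c` (wide), `c -> d` (narrow), the sketch yields two stages, `[a, b]` followed by `[c, d]`.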
In a specific implementation, the stream processing module 30 is implemented using a Storm cluster. Its function is outlined as follows:
In a Storm cluster, a real-time application is designed as a Topology, and the Topology is submitted to the cluster. The master node of the cluster distributes the code and assigns tasks to the worker nodes for execution. A Topology contains two kinds of roles, spouts and bolts. A spout is the message source and is responsible for emitting the data stream in the form of tuples. A bolt is responsible for transforming the data stream: operations such as computation and filtering can be carried out in a bolt, and a bolt can itself also send data, distributed at random, to other bolts. The processing results of the stream processing module 30 and the views it generates are all stored in HBase, so that update operations can be performed with low latency when new data arrives.
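The spout and bolt roles can be pictured with ordinary Python generators. The sketch below does not use the Storm API; `click_spout`, `filter_bolt`, `count_bolt`, and `run_topology` are hypothetical names, and a plain dict stands in for the view that the patent stores in HBase.

```python
def click_spout(lines):
    """Spout: emit each raw input line as an (item, 1) tuple."""
    for line in lines:
        yield (line.strip(), 1)


def filter_bolt(tuples):
    """Bolt: drop tuples whose item is empty."""
    for item, count in tuples:
        if item:
            yield (item, count)


def count_bolt(tuples, view):
    """Bolt: fold the tuples into a per-item count view."""
    for item, count in tuples:
        view[item] = view.get(item, 0) + count
    return view


def run_topology(lines):
    """Wire spout -> filter bolt -> count bolt, as a topology would."""
    return count_bolt(filter_bolt(click_spout(lines)), {})
```

Chaining generators this way mirrors how tuples flow from a spout through successive bolts, although a real cluster would run each role as parallel tasks on worker nodes.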
To improve data processing efficiency, the stream processing module 30 may use an incremental computation mechanism, which is summarized as follows. As shown in Fig. 4, when new data arrives at the stream processing module, the module first determines whether the new data affects existing data. If it does, the existing data is fetched from HBase and merged with the new data; if it does not, no merge is performed. The result of these steps is treated as the new data, and the corresponding algorithm is applied to it to generate the real-time view for the new data. The generated real-time view is then written into the existing stream processing view.
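A minimal sketch of this incremental step follows, assuming records are `(key, value)` pairs, a dict plays the role of the HBase-backed view, and the "corresponding algorithm" is a simple sum. The name `incremental_step` and the `compute` parameter are hypothetical.

```python
def incremental_step(view, record, compute=sum):
    """Apply one incoming record to the stream processing view.

    view:    dict standing in for the view stored in HBase.
    record:  (key, value) pair; the key decides which existing entry is affected.
    compute: algorithm applied to the merged inputs (sum by default).
    """
    key, value = record
    existing = view.get(key)
    if existing is None:
        # New data does not affect existing data: nothing to merge.
        inputs = [value]
    else:
        # New data affects existing data: fetch it and merge with the new value.
        inputs = [existing, value]
    # Compute on the merged inputs and write the result back into the view.
    view[key] = compute(inputs)
    return view
```

Only the entry touched by the new record is recomputed, which is the point of the incremental mechanism: the cost of an update does not grow with the size of the whole view.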
In a specific implementation, to ensure that data flowing into the batch processing module and the stream processing module is processed only once, data synchronization between the two modules must be considered. The process is as follows:
The batch processing module and the stream processing module collect data at the same time. The batch processing module saves the data on HDFS, and the stream processing module saves the data in tables whose names are formed from the current date and the content of the received data. The synchronization problem is solved by dynamically maintaining two tables. As shown in Fig. 5, after the system has run for some time, the batch processing module and the stream processing module hold the same data, but the batch processing module has not yet reached the time point that triggers re-computation, so its data has not been computed. At this point, assume the table of the stream processing module is i_click.
As shown in Fig. 6, when the time point for batch re-computation is reached, re-computation in the batch processing module is triggered. Before re-computation starts, the batch processing module creates a new table according to the current system time for saving real-time data; this table is named i+1_click. Assume the data received during re-computation is block1 and block2. The stream processing module then holds two tables, i_click and i+1_click: i_click holds the real-time data received on day i, and i+1_click holds the new real-time data received on day i+1, that is, block1 and block2.
Fig. 7 shows the result after the system has synchronized the data. After re-computation finishes, the batch processing module deletes table i_click, so the stream processing module holds only the data in table i+1_click. Because the data in i_click has by now been computed in the batch processing module, the stream processing module no longer computes that data; otherwise the data would be computed twice.
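The two-table rotation shown in Figs. 5 to 7 can be sketched as a small class. This is a hedged illustration only: the class `StreamTables`, its method names, and the use of an integer day number in the table name (for example `1_click`) are assumptions, and in-memory lists stand in for the real tables.

```python
class StreamTables:
    """Keeps the stream-side tables in step with batch re-computation."""

    def __init__(self, day):
        self.day = day
        self.tables = {self.name(day): []}

    @staticmethod
    def name(day):
        # Hypothetical naming scheme: "<day>_click".
        return f"{day}_click"

    def append(self, record):
        """Save incoming real-time data in the table for the current day."""
        self.tables[self.name(self.day)].append(record)

    def begin_recompute(self):
        """Before re-computation: open the next table, return the old name."""
        old = self.name(self.day)
        self.day += 1
        self.tables[self.name(self.day)] = []
        return old

    def end_recompute(self, old):
        """After re-computation: the old table is now covered by batch."""
        del self.tables[old]
```

Between `begin_recompute` and `end_recompute` both tables exist, so data arriving during re-computation is not lost; once the old table is dropped, each record is covered by exactly one of the two views.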
In a specific implementation, the data merging module 40 may be realized as follows: for a user's specific business requirement, the computation results of the batch processing module and the stream processing module are merged so that the query is answered over the whole data set. The key point is therefore how to merge the batch view computed by the batch processing module with the real-time view computed by the stream processing module; a corresponding merge strategy is selected according to the specific business logic. If the query function has the Monoid property, that is, it is associative, the results from the batch view and the stream processing view can be merged directly. As shown in Fig. 8, to query the click count of items over different time periods, the module first determines the span of the input time period. If the period falls entirely within the batch processing module, the result is obtained by querying the batch view alone. If it falls entirely within the stream processing module, the result is obtained by querying the stream processing view alone. If it spans both modules, the batch view and the stream processing view are queried separately and the query results are then merged, that is, the counts for the same item are simply added. If the query function does not have the Monoid property, it can be converted into several query functions that do. Each of these query functions is evaluated separately against the batch view and the stream processing view, and the partial results are then combined to obtain the final required result.
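The time-span check and the Monoid-based merge can be sketched as follows. The layout of the views (a dict from time bucket to per-item counts), the `boundary` parameter marking the first bucket held only by the stream processing view, and the function names are all assumptions made for illustration. The average in `merged_average` is one common example of a non-Monoid query rewritten as two Monoid queries (a sum and a count); the patent itself does not name a specific example.

```python
from collections import Counter


def range_counts(view, start, end):
    """Sum per-item counts over the buckets in [start, end)."""
    total = Counter()
    for bucket, counts in view.items():
        if start <= bucket < end:
            total.update(counts)
    return total


def query_clicks(batch_view, stream_view, start, end, boundary):
    """Answer a click-count query over [start, end).

    Buckets before `boundary` live in the batch view, later ones in the
    stream processing view.
    """
    if end <= boundary:
        # Entirely inside the batch view.
        return range_counts(batch_view, start, end)
    if start >= boundary:
        # Entirely inside the stream processing view.
        return range_counts(stream_view, start, end)
    # Spans both: query each view, then add the counts item by item.
    return (range_counts(batch_view, start, boundary)
            + range_counts(stream_view, boundary, end))


def merged_average(batch_sum, batch_count, stream_sum, stream_count):
    """Average is not a Monoid, but sum and count are: merge them, then divide."""
    return (batch_sum + stream_sum) / (batch_count + stream_count)
```

Addition of counts is associative with an empty `Counter` as identity, which is why the two partial results can be combined without revisiting the underlying records.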
Obviously, the above embodiments are merely examples given to illustrate the present invention clearly and do not limit its implementations. Those of ordinary skill in the art may make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.
Claims (8)
1. A data processing architecture based on batch processing and stream processing, characterized in that it comprises a data acquisition module, a batch processing module, a stream processing module, a data merging module, a data visualization module, and a resource monitoring module;
wherein the data acquisition module is configured to obtain collected real-time data from multiple data collection terminals and to transmit the collected data to the batch processing module and the stream processing module;
the batch processing module is configured to persist the received real-time data and then, when the condition for executing batch processing is met, to batch-process the persisted real-time data using a re-computation mechanism and to generate batch views of different granularities from the processing results;
the stream processing module is configured to stream-process the received real-time data using an incremental computation mechanism and to generate stream processing views of different granularities from the processing results;
the data merging module is configured to merge the batch views and the stream processing views using a corresponding merge strategy, according to the specific query requirement;
the data visualization module is configured to display the batch views, the stream processing views, or the merged batch and stream processing views;
the resource monitoring module is configured to monitor the resources of the data acquisition module, the batch processing module, the stream processing module, the data merging module, and the data visualization module.
2. The data processing architecture based on batch processing and stream processing according to claim 1, characterized in that the data acquisition module comprises a data collection submodule and a data cleaning submodule, the data collection submodule being configured to receive the real-time data collected by the multiple data collection terminals, and the data cleaning submodule being configured to clean the received real-time data using corresponding filtering rules.
3. The data processing architecture based on batch processing and stream processing according to claim 1, characterized in that the batch processing module comprises a data preprocessing submodule, a data processing submodule, and a batch view storage submodule;
the data preprocessing submodule is configured to persist the received real-time data using data integration, data transformation, and data reduction techniques;
the data processing submodule is configured, when the condition for executing batch processing is met, to batch-process the persisted real-time data using a re-computation mechanism;
the batch view storage submodule is configured to store the results obtained by the data processing submodule in HBase, so as to generate batch views of different granularities.
4. The data processing architecture based on batch processing and stream processing according to claim 1, characterized in that the stream processing module comprises a data processing submodule and a stream processing view storage submodule, wherein the data processing submodule is configured to stream-process the real-time data using an incremental computation mechanism, and the stream processing view storage submodule is configured to store the processing results produced by the data processing submodule in HBase, so as to generate stream processing views of different granularities.
5. The data processing architecture based on batch processing and stream processing according to claim 2, characterized in that the data acquisition module is implemented using the Flume log collection system.
6. The data processing architecture based on batch processing and stream processing according to claim 3, characterized in that the batch processing module is implemented using a Spark cluster.
7. The data processing architecture based on batch processing and stream processing according to claim 4, characterized in that the stream processing module is implemented using a Storm cluster.
8. A data processing method for the architecture according to any one of claims 1 to 7, characterized by comprising the following steps:
S1. the data acquisition module obtains collected real-time data from multiple data collection terminals and transmits the collected data to the batch processing module and the stream processing module;
S2. the batch processing module persists the received real-time data and then, when the condition for executing batch processing is met, batch-processes the persisted real-time data using a re-computation mechanism and generates batch views of different granularities from the processing results;
S3. the stream processing module stream-processes the received real-time data using an incremental computation mechanism and generates stream processing views of different granularities from the processing results;
S4. the data merging module merges the batch views and the stream processing views using a corresponding merge strategy, according to the specific query requirement;
S5. the data visualization module displays the batch views, the stream processing views, or the merged batch and stream processing views;
S6. throughout the above flow, the resource monitoring module monitors the resources of the data acquisition module, the batch processing module, the stream processing module, the data merging module, and the data visualization module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611245710.6A CN106873945A (en) | 2016-12-29 | 2016-12-29 | Data processing architecture and data processing method based on batch processing and Stream Processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611245710.6A CN106873945A (en) | 2016-12-29 | 2016-12-29 | Data processing architecture and data processing method based on batch processing and Stream Processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106873945A true CN106873945A (en) | 2017-06-20 |
Family
ID=59164023
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611245710.6A (Pending, published as CN106873945A) | 2016-12-29 | 2016-12-29 | Data processing architecture and data processing method based on batch processing and Stream Processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106873945A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107391719A (en) * | 2017-07-31 | 2017-11-24 | 南京邮电大学 | Distributed stream data processing method and system in a kind of cloud environment |
CN107908797A (en) * | 2017-12-18 | 2018-04-13 | 上海中畅数据技术有限公司 | A kind of ETL data stream treatment technology method and systems in real time |
CN108304454A (en) * | 2017-11-27 | 2018-07-20 | 大象慧云信息技术有限公司 | Invoice data real time aggregation device based on big data |
CN108718345A (en) * | 2018-09-05 | 2018-10-30 | 电子科技大学 | A kind of digitlization workshop industrial data Network Transmitting system |
CN109598348A (en) * | 2017-09-28 | 2019-04-09 | 北京猎户星空科技有限公司 | A kind of image pattern obtains, model training method and system |
CN109684377A (en) * | 2018-12-13 | 2019-04-26 | 深圳市思迪信息技术股份有限公司 | General big data handles development platform and its data processing method in real time |
CN111079924A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111211993A (en) * | 2018-11-21 | 2020-05-29 | 百度在线网络技术(北京)有限公司 | Incremental persistence method and device for streaming computation |
CN112527839A (en) * | 2020-12-10 | 2021-03-19 | 上海浦东发展银行股份有限公司 | Multi-source data processing method, system, equipment and storage medium |
CN112597200A (en) * | 2020-12-22 | 2021-04-02 | 南京三眼精灵信息技术有限公司 | Batch and streaming combined data processing method and device |
CN114816704A (en) * | 2022-04-25 | 2022-07-29 | 湖南大学 | Spark task scheduling method and system based on heterogeneous resources |
WO2023109806A1 (en) * | 2021-12-14 | 2023-06-22 | 天翼物联科技有限公司 | Method and apparatus for processing active data for internet of things device, and storage medium |
CN116841753A (en) * | 2023-08-31 | 2023-10-03 | 杭州迅杭科技有限公司 | Stream processing and batch processing switching method and switching device |
CN117787902A (en) * | 2023-12-26 | 2024-03-29 | 航天神舟智慧系统技术有限公司 | Flow batch integration-based distribution control early warning system and method |
CN114816704B (en) * | 2022-04-25 | 2024-10-15 | 湖南大学 | Spark task scheduling method and system based on heterogeneous resources |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013097234A1 (en) * | 2011-12-31 | 2013-07-04 | 华为技术有限公司 | Service processing method and system |
CN105677752A (en) * | 2015-12-30 | 2016-06-15 | 深圳先进技术研究院 | Streaming computing and batch computing combined processing system and method |
CN105701161A (en) * | 2015-12-31 | 2016-06-22 | 深圳先进技术研究院 | Real-time big data user label system |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107391719A (en) * | 2017-07-31 | 2017-11-24 | 南京邮电大学 | Distributed stream data processing method and system in a kind of cloud environment |
CN109598348A (en) * | 2017-09-28 | 2019-04-09 | 北京猎户星空科技有限公司 | A kind of image pattern obtains, model training method and system |
CN108304454B (en) * | 2017-11-27 | 2022-05-17 | 大象慧云信息技术有限公司 | Invoice data real-time aggregation device based on big data |
CN108304454A (en) * | 2017-11-27 | 2018-07-20 | 大象慧云信息技术有限公司 | Invoice data real time aggregation device based on big data |
CN107908797A (en) * | 2017-12-18 | 2018-04-13 | 上海中畅数据技术有限公司 | A kind of ETL data stream treatment technology method and systems in real time |
CN108718345A (en) * | 2018-09-05 | 2018-10-30 | 电子科技大学 | A kind of digitlization workshop industrial data Network Transmitting system |
CN111079924A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111211993A (en) * | 2018-11-21 | 2020-05-29 | 百度在线网络技术(北京)有限公司 | Incremental persistence method and device for streaming computation |
CN111211993B (en) * | 2018-11-21 | 2023-08-11 | 百度在线网络技术(北京)有限公司 | Incremental persistence method, device and storage medium for stream computation |
CN109684377A (en) * | 2018-12-13 | 2019-04-26 | 深圳市思迪信息技术股份有限公司 | General big data handles development platform and its data processing method in real time |
CN112527839A (en) * | 2020-12-10 | 2021-03-19 | 上海浦东发展银行股份有限公司 | Multi-source data processing method, system, equipment and storage medium |
CN112597200A (en) * | 2020-12-22 | 2021-04-02 | 南京三眼精灵信息技术有限公司 | Batch and streaming combined data processing method and device |
CN112597200B (en) * | 2020-12-22 | 2024-01-12 | 南京三眼精灵信息技术有限公司 | Batch and stream combined data processing method and device |
WO2023109806A1 (en) * | 2021-12-14 | 2023-06-22 | 天翼物联科技有限公司 | Method and apparatus for processing active data for internet of things device, and storage medium |
CN114816704A (en) * | 2022-04-25 | 2022-07-29 | 湖南大学 | Spark task scheduling method and system based on heterogeneous resources |
CN114816704B (en) * | 2022-04-25 | 2024-10-15 | 湖南大学 | Spark task scheduling method and system based on heterogeneous resources |
CN116841753A (en) * | 2023-08-31 | 2023-10-03 | 杭州迅杭科技有限公司 | Stream processing and batch processing switching method and switching device |
CN116841753B (en) * | 2023-08-31 | 2023-11-17 | 杭州迅杭科技有限公司 | Stream processing and batch processing switching method and switching device |
CN117787902A (en) * | 2023-12-26 | 2024-03-29 | 航天神舟智慧系统技术有限公司 | Flow batch integration-based distribution control early warning system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106873945A (en) | Data processing architecture and data processing method based on batch processing and Stream Processing | |
CN110460656B (en) | Industry environmental protection thing networking remote monitoring cloud platform | |
CN110022226B (en) | Object-oriented data acquisition system and acquisition method | |
CN106778033B (en) | A kind of Spark Streaming abnormal temperature data alarm method based on Spark platform | |
CN107679192A (en) | More cluster synergistic data processing method, system, storage medium and equipment | |
Qiu et al. | A packet buffer evaluation method exploiting queueing theory for wireless sensor networks | |
CN101902497B (en) | Cloud computing based internet information monitoring system and method | |
Liu et al. | Real-time complex event processing and analytics for smart grid | |
CN109739919A (en) | A kind of front end processor and acquisition system for electric system | |
CN106599190A (en) | Dynamic Skyline query method based on cloud computing | |
CN107086929A (en) | A kind of batch streaming computing system performance guarantee method based on modeling of queuing up | |
CN104394149A (en) | Complex event processing method based on parallel distributed architecture | |
CN107454009B (en) | Data center-oriented offline scene low-bandwidth overhead traffic scheduling scheme | |
CN106599189A (en) | Dynamic Skyline inquiry device based on cloud computing | |
CN115017159A (en) | Data processing method and device, storage medium and electronic equipment | |
CN105610992A (en) | Task allocation load balancing method for distributed stream computing system | |
CN105471893A (en) | Distributed equivalent data stream connection method | |
CN201726426U (en) | Internet information monitoring system based on cloud computing | |
CN111858530B (en) | Real-time correlation analysis method and system based on mass logs | |
CN101267449A (en) | A tree P2P system resource transmission method based on mobile agent mechanism | |
CN110764833B (en) | Task unloading method, device and system based on edge calculation | |
Aslam et al. | Pre‐filtering based summarization for data partitioning in distributed stream processing | |
CN115422259A (en) | Data processing method, system, equipment and storage medium of time sequence database | |
CN113505326B (en) | Dynamic coding data transmission control method based on http protocol family | |
CN102521360B (en) | Raster data transmission method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170620 |