CN107506482A - Large-scale data processing device and method based on a stream processing framework - Google Patents
- Publication number: CN107506482A
- Application number: CN201710835168.8A
- Authority: CN (China)
- Prior art keywords: data, logic, data processing, processing, module
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a large-scale data processing device and method based on a stream processing framework. The device includes: a topology construction module, which builds a data processing topology from an XML configuration file; a data reading module, which reads tagged raw data from a data source and loads a logic configuration file according to the tag, yielding data with the logic configuration attached; a data processing module, which receives the data with the logic configuration attached, dynamically calls the processing method, generates processing results, and splits them into streams; an aggregation module, which receives the split processing results and aggregates them; and a storage module, which receives the aggregated results and stores them in the specified storage medium. Because the invention is built on a stream processing framework, it processes data quickly and can handle newly added data in a timely manner; newly added processing rules configured in Redis can be called dynamically, and data can be inserted in a variety of ways; simple configuration changes suffice to use it in different scenarios, which gives it certain application prospects.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a large-scale data processing device and method based on a stream processing framework.
Background
At present, large-scale data is typically processed by a single multithreaded instance. This approach usually runs on one server and is highly tailored to the business, but offers little configurability. With the explosive growth of data, traditional data processing can no longer meet the speed and performance requirements of large-scale data processing. Its main drawbacks are as follows:
1. A single server suffers from performance bottlenecks such as poor network stability and excessive CPU usage, so data processing is not fast enough and newly added data cannot be processed in a timely manner.
2. During data processing, there is no configuration file to intervene in the process, so processing rules cannot be configured dynamically; once a processing rule changes, the program must be restarted.
3. The insert statement (such as SQL) is fixed within a run and cannot be modified dynamically; there is only a single data insertion mode, and multiple insertion modes are not supported.
4. A data processing program can be used in only one business scenario; it is tightly coupled to the specific business, has poor independence, and is inconvenient to migrate.
Therefore, a large-scale data processing device and method based on a stream processing framework are urgently needed.
Summary of the invention
Purpose of the invention: to solve the technical problems described in the background, a large-scale data processing device and method based on a stream processing framework are provided.
To achieve the above purpose, the technical solution adopted by the invention is a large-scale data processing device based on a stream processing framework, including:
A topology construction module, for building a data processing topology according to an XML configuration file and establishing the connections between the data processing topology and the data sources and storage media;
A data reading module, for reading tagged raw data from a data source and loading the corresponding logic configuration file according to the tag, to obtain data with the logic configuration attached; the logic configuration file contains the processing logic, the processing method, and the storage logic;
A data processing module, for receiving the data with the logic configuration attached, dynamically calling the corresponding processing method according to the processing logic in the logic configuration, generating processing results, and splitting them into streams according to the storage logic;
An aggregation module, for receiving the split processing results and aggregating them to obtain aggregated results;
A storage module, for receiving the aggregated results and storing them in the specified storage medium according to the storage logic.
Further, the data source is message middleware or a persistent storage medium.
Further, the message middleware includes Kafka, for caching raw data, and Redis, for caching logic configuration files; the persistent storage medium includes the relational database MySQL and the index Solr.
Further, the storage medium also includes MongoDB.
Further, the storage logic includes: database instance, table name, insert mode, and insert fields.
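As an illustration of how the four storage-logic items could drive an insert, the following is a minimal Python sketch. The `StorageLogic` class, the `build_insert` helper, the `"insert"`/`"replace"` mode names, and the default of using all row keys when no insert fields are given are assumptions made for this example; the patent does not specify them. SQLite stands in for MySQL so the sketch is runnable without a server.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class StorageLogic:
    """Hypothetical record holding the four storage-logic items."""
    database_instance: str                 # e.g. a DSN; here an SQLite path
    table_name: str
    insert_mode: str                       # assumed values: "insert" or "replace"
    insert_fields: Optional[Sequence[str]] = None  # None: use every key of the row


def build_insert(logic: StorageLogic, row: dict) -> Tuple[str, List]:
    """Build a parameterized statement from the storage logic.

    Table and column names are interpolated, so they must come from
    trusted configuration; values are always passed as parameters.
    """
    fields = list(logic.insert_fields or row.keys())
    verb = {"insert": "INSERT", "replace": "REPLACE"}[logic.insert_mode]
    columns = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    sql = f"{verb} INTO {logic.table_name} ({columns}) VALUES ({placeholders})"
    return sql, [row[name] for name in fields]


if __name__ == "__main__":
    logic = StorageLogic(":memory:", "student", "replace", ["id", "name"])
    conn = sqlite3.connect(logic.database_instance)
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(*build_insert(logic, {"id": 1, "name": "Ann"}))
    print(conn.execute("SELECT id, name FROM student").fetchall())
```

Because the statement is derived from configuration rather than hard-coded, changing the table, the fields, or the insert mode would not require a code change, which is the property the storage logic is meant to provide.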
The invention also provides a large-scale data processing method based on a stream processing framework, including the following steps:
Step 1: The topology construction module builds a data processing topology according to an XML configuration file and establishes the connections between the data processing topology and the data sources and storage media;
Step 2: The data reading module reads tagged raw data from a data source, loads the corresponding logic configuration file according to the tag, and sends the resulting data with the logic configuration attached to the data processing module; the logic configuration file contains the processing logic, the processing method, and the storage logic;
Step 3: The data processing module receives the data with the logic configuration attached, dynamically calls the corresponding processing method according to the processing logic in the logic configuration, generates processing results, and splits them into streams according to the storage logic;
Step 4: The aggregation module receives the split processing results, aggregates them, and sends the aggregated results to the storage module;
Step 5: The storage module receives the aggregated results and stores them in the specified storage medium according to the storage logic.
Further, the data source is message middleware or a persistent storage medium.
Further, the message middleware includes Kafka, for caching raw data, and Redis, for caching logic configuration files; the persistent storage medium includes the relational database MySQL and the index Solr.
Further, the storage medium also includes MongoDB.
Further, the storage logic includes: database instance, table name, insert mode, and insert fields.
The beneficial effects of the invention are as follows: the invention is based on a stream processing framework such as Storm or Spark and can be deployed as a cluster on multiple servers, so data processing is fast and newly added data can be processed in a timely manner; newly added processing rules can be called dynamically once configured in Redis, without restarting the program; the data sources can be extended horizontally, and data can be inserted in a variety of ways; in different scenarios, only simple configuration changes are needed to use the invention, which gives it certain application prospects.
Brief description of the drawings
Fig. 1 is a structural block diagram of the large-scale data processing device based on a stream processing framework according to Embodiment 1 of the invention.
Fig. 2 is the main flowchart of the large-scale data processing method based on a stream processing framework according to Embodiment 2 of the invention.
Detailed description
To make the technical problem solved by the invention, the technical solution adopted, and the technical effect achieved clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.
Embodiment 1
Environment preparation: the general-purpose large-scale data processing device of this embodiment depends on a stream processing framework, on the Kafka and Redis message middleware, and on the media used for data storage; this embodiment uses Storm as the underlying processing framework. Before the device is deployed, these environments must be ready.
Configuration preparation: when the topology is built, the Storm-related configuration in the XML configuration file and in Redis is loaded to determine the modules to load and Storm's runtime parameters (such as the number of Storm workers and tasks); when the connections between the topology and the data sources and storage media are established, Properties configuration files are loaded; during data processing, the configuration files in Redis are loaded dynamically so that runtime thresholds (such as the batch commit size, the wait timeout, and the data processing rules) can be modified in real time.
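The patent does not give the layout of the XML configuration file. As a minimal sketch of the idea, the Python snippet below parses a hypothetical file that names the modules to load together with worker and task counts; the element names, attribute names, and class names (`topology`, `storm`, `module`, `workers`, `tasks`, `ReaderSpout`, and so on) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration layout; not taken from the patent.
EXAMPLE_XML = """
<topology name="demo">
  <storm workers="2" />
  <module name="reader" class="ReaderSpout" tasks="1" />
  <module name="processor" class="ProcessBolt" tasks="4" />
  <module name="aggregator" class="AggregateBolt" tasks="2" />
  <module name="storage" class="StorageBolt" tasks="2" />
</topology>
"""


def load_topology_config(xml_text):
    """Return the modules to load and the runtime parameters as a dict."""
    root = ET.fromstring(xml_text.strip())
    storm = root.find("storm")
    return {
        "name": root.get("name"),
        "workers": int(storm.get("workers")),
        "modules": [
            {
                "name": module.get("name"),
                "class": module.get("class"),
                "tasks": int(module.get("tasks")),
            }
            for module in root.findall("module")
        ],
    }


if __name__ == "__main__":
    config = load_topology_config(EXAMPLE_XML)
    for module in config["modules"]:
        print(module["name"], module["class"], module["tasks"])
```

A topology builder could then iterate over `config["modules"]` and wire only the listed modules, which is one way to obtain the selective loading described in the text.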
Referring to Fig. 1, the large-scale data processing device based on a stream processing framework of this embodiment includes:
A topology construction module, for building a data processing topology on Storm according to an XML file and establishing the connections between the data processing topology and the data sources and all storage media; this module is the foundation on which the subsequent modules run;
A data reading module, for reading tagged raw data from a data source and loading the corresponding logic configuration file according to the tag, to obtain data with the logic configuration attached; the data source is message middleware or a persistent storage medium; the message middleware includes Kafka, for caching raw data, and Redis, for caching logic configuration files; the persistent storage medium includes the relational database MySQL and the index Solr; the tag is a field used to distinguish processing methods and can be, for example, the table name of the source database or the address of the source website; the logic configuration file contains the processing logic, the processing method, and the storage logic;
A data processing module, for receiving the data with the logic configuration attached, dynamically calling the corresponding processing method according to the processing logic in the logic configuration (for example, if the configuration file specifies parseStudent() as the processing method for data from the Student table in MongoDB, the module dynamically calls that method to process the data), generating processing results, and splitting them into streams according to the storage logic (for example, data that must be inserted into the same MySQL table with the same insert fields is sent to the same stream); because the raw data types and parsing requirements are not known in advance, the specific processing logic must be written by the user, and the processing logic corresponding to each data type is specified in the logic configuration file; the storage logic includes database instance, table name, insert mode, and insert fields, the first three of which must be specified;
An aggregation module, for receiving the split processing results and aggregating results with the same storage logic into the same thread; data carrying a priority flag is sent directly to the storage module, while other data is buffered and sent once the configured timeout or quantity is reached;
A storage module, for receiving aggregated results in batches and storing (inserting) them into the specified storage medium according to the storage logic; the storage media are the relational database MySQL, the distributed file storage database MongoDB, and the index Solr.
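The dynamic call and the stream splitting performed by the data processing module can be sketched as follows. This is an illustrative Python sketch rather than the patent's Storm implementation: a plain dictionary stands in for the logic configuration cached in Redis, and the names `Handlers`, `LOGIC_CONFIG`, `stream_key`, and `process`, as well as the comma-separated input format, are assumptions. Only `parseStudent` and the Student example come from the text.

```python
class Handlers:
    """Hypothetical container for user-written processing methods."""

    @staticmethod
    def parseStudent(raw):
        # Assumed input format for this example: "name, age".
        name, age = raw.split(",")
        return {"name": name.strip(), "age": int(age)}


# Stand-in for the logic configuration in Redis, keyed by the record's tag.
LOGIC_CONFIG = {
    "Student": {
        "method": "parseStudent",
        "storage": {
            "instance": "mysql-main",
            "table": "student",
            "mode": "insert",
            "fields": ["name", "age"],
        },
    },
}


def stream_key(storage):
    """Results with the same instance, table, and fields share one stream."""
    return (storage["instance"], storage["table"], tuple(storage.get("fields") or ()))


def process(tag, raw):
    """Look up the logic by tag, call the named method, and pick the stream."""
    logic = LOGIC_CONFIG[tag]
    handler = getattr(Handlers, logic["method"])  # dynamic call by configured name
    return stream_key(logic["storage"]), handler(raw)


if __name__ == "__main__":
    print(process("Student", "Ann, 20"))
```

Adding a new data type then amounts to writing one more method and one more configuration entry; the dispatch code itself stays unchanged.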
The large-scale data processing device of this embodiment is divided into five main modules, which Storm's stream mechanism chains into a complete processing flow. A global configuration is loaded at startup; it specifies the modules to load, the number of threads for each module, and the source of the raw data. The device has the following advantages:
(1) Fast data processing. The device is based on a stream processing framework such as Storm, Spark, Samza, or JStorm and can be deployed as a cluster on multiple servers, which speeds up processing and makes full use of server performance; if higher processing speed is needed, it suffices to add server resources horizontally.
(2) No data loss. Data is read from Kafka, which ensures that each record is processed at least once.
(3) Extensibility. The data source can be message middleware or a persistent storage database and can be extended horizontally. The same holds for storage media: MySQL and Solr are currently supported, and insertion into other databases such as MongoDB can be added.
(4) Dynamically loadable processing logic. Newly added processing logic only needs to be configured in Redis to be called dynamically.
(5) Dynamically loadable topology structure. The bolts that process data are loaded by way of XML and can be loaded selectively for different data processing jobs, which reduces server load.
(6) A data aggregation mechanism is used so that data is inserted in batches, which reduces network overhead.
(7) The entire data processing flow is determined by configuration files; for a different business, only the configuration files need to be modified to meet the processing requirements.
Embodiment 2
Referring to Fig. 2, the large-scale data processing method based on a stream processing framework of this embodiment includes the following steps:
Step 1: The topology construction module builds a data processing topology on the stream processing framework according to an XML configuration file and establishes the connections between the data processing topology and the data sources and storage media;
Step 2: The data reading module reads tagged raw data from a data source, loads the corresponding logic configuration file according to the tag, and sends the resulting data with the logic configuration attached to the data processing module; the logic configuration file contains the processing logic, the processing method, and the storage logic;
Step 3: The data processing module receives the data with the logic configuration attached, dynamically calls the corresponding processing method according to the processing logic in the logic configuration, generates processing results, and splits them into streams according to the storage logic;
Step 4: The aggregation module receives the split processing results, aggregates them, and sends the aggregated results to the storage module;
Step 5: The storage module receives the aggregated results and stores them in the specified storage medium according to the storage logic.
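To show how the five steps fit together, the following Python sketch chains them as plain functions over an in-memory source. It is a single-process illustration under stated assumptions, not a Storm topology: the `LOGIC` dictionary stands in for the logic configuration files, a dictionary stands in for the storage media, and all function names and sample tags are hypothetical.

```python
from collections import defaultdict

# Step 1 stand-in: the "topology" is simply the wiring of the functions in run().
# LOGIC stands in for the logic configuration files, keyed by tag.
LOGIC = {
    "Student": {"method": lambda raw: {"name": raw.upper()},
                "storage": ("db1", "student")},
    "Course": {"method": lambda raw: {"title": raw.title()},
               "storage": ("db1", "course")},
}


def read(source):
    """Step 2: attach the logic configuration that matches each record's tag."""
    for tag, raw in source:
        yield LOGIC[tag], raw


def process(records):
    """Step 3: call the configured method and route by storage logic."""
    for logic, raw in records:
        yield logic["storage"], logic["method"](raw)


def aggregate(routed):
    """Step 4: group results that share the same storage logic."""
    groups = defaultdict(list)
    for target, result in routed:
        groups[target].append(result)
    return groups


def store(groups, storage):
    """Step 5: write each group to its target as one batch."""
    for target, rows in groups.items():
        storage.setdefault(target, []).extend(rows)


def run(source, storage):
    store(aggregate(process(read(source))), storage)


if __name__ == "__main__":
    storage = {}
    run([("Student", "ann"), ("Course", "intro to storm"), ("Student", "bob")], storage)
    print(storage)
```

In a real deployment each function would correspond to a spout or bolt running in parallel on the cluster; the sketch only shows the order of the steps and the data each one passes on.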
Preferably, the data source is message middleware or a persistent storage medium.
Preferably, the message middleware includes Kafka, for caching raw data, and Redis, for caching logic configuration files; the persistent storage medium includes the relational database MySQL, the distributed file storage database MongoDB, and the index Solr.
Preferably, the storage logic includes: database instance, table name, insert mode, and insert fields.
Note that the above are only preferred embodiments of the invention. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the invention is determined by the appended claims.
Claims (10)
- 1. A large-scale data processing device based on a stream processing framework, characterized by comprising: a topology construction module, for building a data processing topology according to an XML configuration file and establishing the connections between the data processing topology and the data sources and storage media; a data reading module, for reading tagged raw data from a data source and loading the corresponding logic configuration file according to the tag, to obtain data with the logic configuration attached, the logic configuration file containing the processing logic, the processing method, and the storage logic; a data processing module, for receiving the data with the logic configuration attached, dynamically calling the corresponding processing method according to the processing logic in the logic configuration, generating processing results, and splitting them into streams according to the storage logic; an aggregation module, for receiving the split processing results and aggregating them to obtain aggregated results; and a storage module, for receiving the aggregated results and storing them in the specified storage medium according to the storage logic.
- 2. The large-scale data processing device based on a stream processing framework according to claim 1, characterized in that the data source is message middleware or a persistent storage medium.
- 3. The large-scale data processing device based on a stream processing framework according to claim 2, characterized in that the message middleware includes Kafka, for caching raw data, and Redis, for caching logic configuration files, and the persistent storage medium includes the relational database MySQL and the index Solr.
- 4. The large-scale data processing device based on a stream processing framework according to claim 1, characterized in that the storage medium also includes MongoDB.
- 5. The large-scale data processing device based on a stream processing framework according to claim 1, characterized in that the storage logic includes: database instance, table name, insert mode, and insert fields.
- 6. A large-scale data processing method based on a stream processing framework, characterized by comprising the following steps: Step 1: the topology construction module builds a data processing topology according to an XML configuration file and establishes the connections between the data processing topology and the data sources and storage media; Step 2: the data reading module reads tagged raw data from a data source, loads the corresponding logic configuration file according to the tag, and sends the resulting data with the logic configuration attached to the data processing module, the logic configuration file containing the processing logic, the processing method, and the storage logic; Step 3: the data processing module receives the data with the logic configuration attached, dynamically calls the corresponding processing method according to the processing logic in the logic configuration, generates processing results, and splits them into streams according to the storage logic; Step 4: the aggregation module receives the split processing results, aggregates them, and sends the aggregated results to the storage module; Step 5: the storage module receives the aggregated results and stores them in the specified storage medium according to the storage logic.
- 7. The large-scale data processing method based on a stream processing framework according to claim 6, characterized in that the data source is message middleware or a persistent storage medium.
- 8. The large-scale data processing method based on a stream processing framework according to claim 6, characterized in that the message middleware includes Kafka, for caching raw data, and Redis, for caching logic configuration files, and the persistent storage medium includes the relational database MySQL and the index Solr.
- 9. The large-scale data processing method based on a stream processing framework according to claim 8, characterized in that the storage medium also includes MongoDB.
- 10. The large-scale data processing device based on a stream processing framework according to claim 6, characterized in that the storage logic includes: database instance, table name, insert mode, and insert fields.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2017104911873 | 2017-06-26 | ||
CN201710491187.3A CN107229747A (en) | 2017-06-26 | 2017-06-26 | Large-scale data processing device and method based on a stream processing framework |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107506482A (en) | 2017-12-22 |
Family
ID=59935281
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710491187.3A Pending CN107229747A (en) | 2017-06-26 | 2017-06-26 | Large-scale data processing device and method based on a stream processing framework |
CN201710835168.8A Pending CN107506482A (en) | 2017-06-26 | 2017-09-15 | Large-scale data processing device and method based on a stream processing framework |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710491187.3A Pending CN107229747A (en) | 2017-06-26 | 2017-06-26 | Large-scale data processing device and method based on a stream processing framework |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN107229747A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109857524A (en) * | 2019-01-25 | 2019-06-07 | 深圳前海微众银行股份有限公司 | Streaming computing method, apparatus, equipment and computer readable storage medium |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107678852B (en) * | 2017-10-26 | 2021-06-22 | 携程旅游网络技术(上海)有限公司 | Method, system, equipment and storage medium based on stream data real-time calculation |
CN110007967B (en) * | 2017-12-29 | 2022-05-06 | 杭州海康威视数字技术股份有限公司 | Data processing method, device and equipment based on streaming framework |
CN108334380A (en) * | 2018-01-19 | 2018-07-27 | 新智云数据服务有限公司 | A kind of configuration item management method, device, terminal and computer readable storage medium |
CN108958789B (en) * | 2018-05-20 | 2021-07-09 | 湖北九州云仓科技发展有限公司 | Parallel stream type computing method, electronic equipment, storage medium and system |
CN109145023B (en) * | 2018-08-30 | 2020-11-27 | 北京百度网讯科技有限公司 | Method and apparatus for processing data |
CN112181522A (en) * | 2020-09-28 | 2021-01-05 | 亚信科技(中国)有限公司 | Data processing method and device and electronic equipment |
CN113438124B (en) * | 2021-06-07 | 2022-05-06 | 清华大学 | Network measurement method and device based on intention driving |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104050261A (en) * | 2014-06-16 | 2014-09-17 | 深圳先进技术研究院 | Stormed-based variable logic general data processing system and method |
CN105512162A (en) * | 2015-09-28 | 2016-04-20 | 杭州圆橙科技有限公司 | Real-time intelligent processing framework based on storm streaming data |
CN105574082A (en) * | 2015-12-08 | 2016-05-11 | 曙光信息产业(北京)有限公司 | Storm based stream processing method and system |
CN105959151A (en) * | 2016-06-22 | 2016-09-21 | 中国工商银行股份有限公司 | High availability stream processing system and method |
CN106599120A (en) * | 2016-12-01 | 2017-04-26 | 中国联合网络通信集团有限公司 | Stream processing framework-based data processing method and apparatus |
CN106789377A (en) * | 2017-03-24 | 2017-05-31 | 聚好看科技股份有限公司 | The service parameter update method of network element cluster |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109857524A (en) * | 2019-01-25 | 2019-06-07 | 深圳前海微众银行股份有限公司 | Streaming computing method, apparatus, equipment and computer readable storage medium |
CN109857524B (en) * | 2019-01-25 | 2024-02-27 | 深圳前海微众银行股份有限公司 | Stream computing method, device, equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107229747A (en) | 2017-10-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107506482A (en) | Large-scale data processing device and method based on a stream processing framework | |
CN112860695B (en) | Monitoring data query method, device, equipment, storage medium and program product | |
CN102930062B (en) | The method of the quick horizontal extension of a kind of database | |
US20140156586A1 (en) | Big-fast data connector between in-memory database system and data warehouse system | |
CN106250226B (en) | Method for scheduling task and system based on consistency hash algorithm | |
JP7313382B2 (en) | Frequent Pattern Analysis of Distributed Systems | |
CN107436813A (en) | A kind of method and system of meta data server dynamic load leveling | |
CN110674152B (en) | Data synchronization method and device, storage medium and electronic equipment | |
WO2017028394A1 (en) | Example-based distributed data recovery method and apparatus | |
TWI686703B (en) | Method and device for data storage and business processing | |
CN105550246A (en) | System and method for loading network picture under Android platform | |
CN107665246A (en) | Dynamic date migration method and chart database cluster based on chart database | |
CN111723161A (en) | Data processing method, device and equipment | |
CN107562804A (en) | Data buffer service system and method, terminal | |
US20080208920A1 (en) | Efficient detection of deleted objects against a stateless content directory service | |
CN105872082B (en) | Fine granularity resource response system based on container cluster load-balancing algorithm | |
CN107203437A (en) | The methods, devices and systems for preventing internal storage data from losing | |
US20230342062A1 (en) | Live data migration in document stores | |
CN106909460B (en) | Data buffering method, device and storage medium | |
CN105930104B (en) | Date storage method and device | |
CN110019085A (en) | A kind of distributed time series database based on HBase | |
CN109981726A (en) | A kind of distribution method of memory node, server and system | |
CN113010373B (en) | Data monitoring method and device, electronic equipment and storage medium | |
JP7375173B2 (en) | Data processing method and related equipment, and computer program | |
CN105389368A (en) | Method for managing metadata of database cluster of MPP architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171222 |