CN107506482A - Large-scale data processing device and method based on a stream processing framework - Google Patents

Large-scale data processing device and method based on a stream processing framework

Info

Publication number
CN107506482A
Authority
CN
China
Prior art keywords
data
logic
data processing
processing
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710835168.8A
Other languages
Chinese (zh)
Inventor
王军
黄丽仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Xinghan Shuzhi Technology Co Ltd
Original Assignee
Hunan Xinghan Shuzhi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Xinghan Shuzhi Technology Co Ltd
Publication of CN107506482A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a large-scale data processing device and method based on a stream processing framework. The device includes: a topology construction module for building a data processing topology according to an XML configuration file; a data reading module for reading marked raw data from a data source and loading a logic configuration file according to the mark, to obtain data with the logic configuration attached; a data processing module for receiving the data with the logic configuration attached, dynamically calling a processing method, generating processing results and splitting them into streams; an aggregation module for receiving the split processing results and aggregating them; and a storage module for receiving the aggregation results and storing them in a specified storage medium. The invention is based on a stream processing framework, so data processing is fast and newly added data can be processed in time; newly added processing rules configured in Redis can be called dynamically, and data can be inserted in a variety of modes; only simple configuration changes are needed for use in different scenarios, so the invention has certain application prospects.

Description

Large-scale data processing device and method based on a stream processing framework
Technical field
The present invention relates to the field of computer technology, and in particular to a large-scale data processing device and method based on a stream processing framework.
Background
At present, large-scale data is typically processed in a multi-threaded, single-instance mode. This mode usually runs on one server and is highly targeted at a specific business, but offers few configuration options. With the explosive growth of data, this traditional approach can no longer meet the speed and performance requirements of large-scale data processing. Its main defects are as follows:
1. A single server has performance bottlenecks such as poor network stability and excessive CPU usage, so data processing is not fast enough and newly added data cannot be processed in time.
2. During data processing, no configuration file intervenes in the process, so processing rules cannot be configured dynamically; once a processing rule changes, the program must be restarted.
3. The statements used for insertion (such as SQL) are fixed within a single run and cannot be changed dynamically; only one insert mode is available, and multiple insert modes are not supported.
4. A data processing program can be used in only one business scenario; it is tightly coupled to the specific business, has poor independence, and is inconvenient to migrate.
Therefore, a large-scale data processing device and method based on a stream processing framework is urgently needed.
Summary of the invention
Purpose of the invention: to solve the technical problems described in the background, a large-scale data processing device and method based on a stream processing framework is provided.
To achieve the above purpose, the technical solution adopted by the present invention is a large-scale data processing device based on a stream processing framework, including:
A topology construction module, for building a data processing topology according to an XML configuration file, and at the same time establishing the connections between the data processing topology and the data source and storage medium;
A data reading module, for reading marked raw data from the data source and loading the corresponding logic configuration file according to the mark, to obtain data with the logic configuration attached; the logic configuration file includes processing logic, a processing method and storage logic;
A data processing module, for receiving the data with the logic configuration attached, dynamically calling the corresponding processing method according to the processing logic in the logic configuration, generating processing results and splitting them into streams according to the storage logic;
An aggregation module, for receiving the split processing results and aggregating them to obtain aggregation results;
A storage module, for receiving the aggregation results and storing them in the specified storage medium according to the storage logic.
Further, the data source is message middleware or a persistent storage medium.
Further, the message middleware includes Kafka for caching the raw data and Redis for caching the logic configuration file; the persistent storage medium includes the relational database MySQL and the index Solr.
Further, the storage medium also includes MongoDB.
Further, the storage logic includes: database instance, table name, insert mode and insert fields.
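The structure of a logic configuration described above can be sketched as a pair of small data types. This is only an illustrative sketch: the class names, field names and sample values (`LogicConfig`, `StorageLogic`, `mysql_main`, and so on) are assumptions and do not come from the patent, which lists only the kinds of content a logic configuration file carries.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StorageLogic:
    """Where and how a processing result is stored (hypothetical field names)."""
    database_instance: str
    table_name: str
    insert_mode: str
    insert_fields: Tuple[str, ...] = ()


@dataclass(frozen=True)
class LogicConfig:
    """One logic configuration entry: processing logic, processing method, storage logic."""
    processing_logic: str
    processing_method: str
    storage_logic: StorageLogic


# Sample entry; all values are made up for illustration.
config = LogicConfig(
    processing_logic="student_records",
    processing_method="parseStudent",
    storage_logic=StorageLogic("mysql_main", "student", "insert", ("id", "name")),
)
```

Making both types frozen keeps them hashable, so a `StorageLogic` value could itself serve as a key when results with identical storage logic are grouped.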
The present invention also provides a large-scale data processing method based on a stream processing framework, including the following steps:
Step 1: the topology construction module builds a data processing topology according to an XML configuration file, and at the same time establishes the connections between the data processing topology and the data source and storage medium;
Step 2: the data reading module reads marked raw data from the data source, loads the corresponding logic configuration file according to the mark, obtains data with the logic configuration attached and sends it to the data processing module; the logic configuration file includes processing logic, a processing method and storage logic;
Step 3: the data processing module receives the data with the logic configuration attached, dynamically calls the corresponding processing method according to the processing logic in the logic configuration, generates processing results and splits them into streams according to the storage logic;
Step 4: the aggregation module receives the split processing results and aggregates them, obtains aggregation results and sends them to the storage module;
Step 5: the storage module receives the aggregation results and stores them in the specified storage medium according to the storage logic.
Further, the data source is message middleware or a persistent storage medium.
Further, the message middleware includes Kafka for caching the raw data and Redis for caching the logic configuration file; the persistent storage medium includes the relational database MySQL and the index Solr.
Further, the storage medium also includes MongoDB.
Further, the storage logic includes: database instance, table name, insert mode and insert fields.
The beneficial effects of the invention are as follows: the invention is based on a stream processing framework such as Storm or Spark and can be deployed as a cluster on multiple servers, so data processing is fast and newly added data can be processed in time; newly added processing rules are configured in Redis and can be called dynamically without restarting the program; the data sources used can be extended horizontally, and data can be inserted in a variety of modes; in a different scenario, only simple configuration changes are needed to use the invention, so it has certain application prospects.
Brief description of the drawings
Fig. 1 is a structural block diagram of the large-scale data processing device based on a stream processing framework according to Embodiment 1 of the present invention.
Fig. 2 is the main flow chart of the large-scale data processing method based on a stream processing framework according to Embodiment 2 of the present invention.
Detailed description
To make the technical problem solved, the technical solution adopted and the technical effect achieved by the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.
Embodiment 1
Environment preparation: the general-purpose large-scale data processing device of this embodiment depends on a stream processing framework, on the Kafka and Redis message middleware, and on the media used to store data; this embodiment uses Storm as the underlying processing framework. Before the device is deployed, these environments must already be in place.
Configuration preparation: when the topology is built, the Storm-related configuration in the XML configuration file and in Redis is loaded to determine the modules to load and Storm's runtime parameters (for example, the number of Storm workers and tasks); when the connections between the topology and the data source and storage medium are established, a Properties configuration file is loaded; during data processing, the configuration in Redis is loaded dynamically so that runtime thresholds can be modified in real time (for example, batch submission size, wait timeout and data processing rules).
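The configuration preparation above can be illustrated with a minimal sketch. The XML layout (`topology`, `module`, `workers`, `tasks`), the class names in it and the threshold keys are all assumptions made for this example, since the patent does not give a schema; a plain dictionary stands in for Redis.

```python
import xml.etree.ElementTree as ET

# Hypothetical topology description; element and attribute names are assumed.
TOPOLOGY_XML = """
<topology name="demo" workers="2">
  <module name="reader" class="ReaderSpout" tasks="1"/>
  <module name="processor" class="ProcessBolt" tasks="4"/>
  <module name="aggregator" class="AggregateBolt" tasks="2"/>
  <module name="storage" class="StoreBolt" tasks="2"/>
</topology>
"""

# Defaults used when the key-value store holds no override (assumed keys).
DEFAULT_THRESHOLDS = {"batch_size": 500, "wait_timeout_s": 5.0}


def load_topology(xml_text):
    """Parse the modules to load and the worker/task counts from XML."""
    root = ET.fromstring(xml_text.strip())
    modules = [
        {
            "name": m.get("name"),
            "class": m.get("class"),
            "tasks": int(m.get("tasks", "1")),
        }
        for m in root.findall("module")
    ]
    return {
        "name": root.get("name"),
        "workers": int(root.get("workers", "1")),
        "modules": modules,
    }


def current_thresholds(store):
    """Overlay values from a key-value store (stand-in for Redis) on the defaults."""
    merged = dict(DEFAULT_THRESHOLDS)
    for key, default in DEFAULT_THRESHOLDS.items():
        raw = store.get(key)
        if raw is not None:
            # Stored values arrive as strings; coerce to the default's type.
            merged[key] = type(default)(raw)
    return merged
```

Calling `current_thresholds` before each batch decision, rather than once at startup, is what would let a changed value take effect without a restart.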
Referring to Fig. 1, the large-scale data processing device based on a stream processing framework of this embodiment includes:
A topology construction module, for building a data processing topology on Storm according to an XML file, and establishing the connections between the data processing topology and the data source and all storage media; this module is the basis on which the subsequent modules run;
A data reading module, for reading marked raw data from the data source and loading the corresponding logic configuration file according to the mark, to obtain data with the logic configuration attached; the data source is message middleware or a persistent storage medium; the message middleware includes Kafka for caching the raw data and Redis for caching the logic configuration file; the persistent storage medium includes the relational database MySQL and the index Solr; the mark is a field used to distinguish processing methods and may be, for example, the table name of the source database or the address of the source website; the logic configuration file includes processing logic, a processing method and storage logic;
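The lookup performed by the data reading module can be sketched as follows. The key prefix `logic:`, the JSON layout and the record shape are assumptions for illustration only, and a dictionary stands in for the Redis cache that holds the logic configuration.

```python
import json

# Stand-in for Redis: mark -> serialized logic configuration (assumed layout).
CONFIG_STORE = {
    "logic:Student": json.dumps({
        "processing_logic": "student_records",
        "processing_method": "parseStudent",
        "storage_logic": {
            "database_instance": "mysql_main",
            "table_name": "student",
            "insert_mode": "insert",
            "insert_fields": ["id", "name"],
        },
    }),
}


def attach_logic_config(record, store):
    """Return a copy of the record with the logic configuration for its mark attached."""
    mark = record["mark"]
    raw = store.get(f"logic:{mark}")
    if raw is None:
        raise KeyError(f"no logic configuration for mark {mark!r}")
    return {**record, "logic": json.loads(raw)}


# Example: the mark here is the source table name.
raw_record = {"mark": "Student", "payload": {"id": "7", "name": " Li Lei "}}
enriched = attach_logic_config(raw_record, CONFIG_STORE)
```

Because the configuration is looked up per record, an entry added to the store for a new mark would be picked up by the next record that carries it.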
A data processing module, for receiving the data with the logic configuration attached and dynamically calling the corresponding processing method according to the processing logic in the logic configuration (for example, if the configuration file specifies parseStudent() as the processing method for data from the Student table in MongoDB, the module dynamically calls this method to process that data), generating processing results and splitting them into streams according to the storage logic (for example, data that must be inserted into the same MySQL table with the same insert fields is sent to the same stream); because the raw data types and parsing requirements are not known in advance, the specific processing logic must be written by the user, and the processing logic corresponding to each type of data is specified in the logic configuration file; the storage logic includes: database instance, table name, insert mode and insert fields, of which the first three must be specified.
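The dynamic call and the stream splitting described for the data processing module can be sketched as below. A name-to-function registry is used here as one possible dispatch mechanism; the patent does not say how the dynamic call is implemented, and the body of `parseStudent`, the record layout and the `stream_key` helper are assumptions for this example.

```python
def parseStudent(payload):
    """User-written processing method (illustrative body)."""
    return {"id": int(payload["id"]), "name": payload["name"].strip()}


# Registry of user-written methods, looked up by the name in the configuration.
HANDLERS = {"parseStudent": parseStudent}


def stream_key(storage):
    """Results sharing instance, table, insert mode and fields go to the same stream."""
    return (
        storage["database_instance"],
        storage["table_name"],
        storage["insert_mode"],
        tuple(storage["insert_fields"]),
    )


def process(record):
    """Call the configured method by name and tag the result with its stream key."""
    logic = record["logic"]
    handler = HANDLERS[logic["processing_method"]]
    result = handler(record["payload"])
    return stream_key(logic["storage_logic"]), result


record = {
    "mark": "Student",
    "payload": {"id": "7", "name": " Li Lei "},
    "logic": {
        "processing_method": "parseStudent",
        "storage_logic": {
            "database_instance": "mysql_main",
            "table_name": "student",
            "insert_mode": "insert",
            "insert_fields": ["id", "name"],
        },
    },
}
key, result = process(record)
```

Two records whose storage logic differs only in the insert fields get different keys, which matches the example in the text of routing same-table, same-field data to one stream.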
An aggregation module, for receiving the split processing results and aggregating processing results with the same storage logic into the same thread; data carrying a priority mark is sent directly to the storage module, while other data is buffered and sent only once the configured timeout or quantity is reached;
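The buffering rule of the aggregation module (flush on quantity, flush on timeout, pass priority data straight through) can be sketched with a small class. The class name, method names and the injectable clock are assumptions for illustration; in a Storm topology the periodic check would be driven by the framework rather than called by hand.

```python
import time
from collections import defaultdict


class BatchAggregator:
    """Buffers results per stream key until a size or age threshold is reached."""

    def __init__(self, batch_size, timeout_s, clock=time.monotonic):
        self.batch_size = batch_size
        self.timeout_s = timeout_s
        self.clock = clock
        self.buffers = defaultdict(list)
        self.first_seen = {}  # key -> time the oldest buffered item arrived

    def add(self, key, item, priority=False):
        """Return a list of (key, batch) pairs that are ready to send."""
        if priority:
            # Priority data bypasses the buffer entirely.
            return [(key, [item])]
        buf = self.buffers[key]
        if not buf:
            self.first_seen[key] = self.clock()
        buf.append(item)
        if len(buf) >= self.batch_size:
            return [self._flush(key)]
        return []

    def tick(self):
        """Flush every buffer whose oldest item has waited at least timeout_s."""
        now = self.clock()
        expired = [k for k, t in self.first_seen.items() if now - t >= self.timeout_s]
        return [self._flush(k) for k in expired]

    def _flush(self, key):
        self.first_seen.pop(key, None)
        return key, self.buffers.pop(key)
```

Passing the clock in as a parameter keeps the timeout path easy to exercise without sleeping.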
A storage module, for receiving the aggregation results in batches and storing (inserting) them into the specified storage medium according to the storage logic; the storage medium is the relational database MySQL, the distributed document database MongoDB, or the index Solr.
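How a storage module might turn the storage logic into a batched insert can be sketched as follows. The insert-mode values `insert` and `replace` are assumptions (the patent names the insert mode but not its values), and the standard-library `sqlite3` module is used here as a stand-in for MySQL so that the example runs without a server.

```python
import re
import sqlite3

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_VERBS = {"insert": "INSERT", "replace": "REPLACE"}  # assumed insert modes


def build_statement(storage):
    """Build a parameterized statement from table name, insert mode and insert fields."""
    table = storage["table_name"]
    fields = list(storage["insert_fields"])
    # Table and column names cannot be bound as parameters, so validate them.
    for name in [table, *fields]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    verb = _VERBS[storage["insert_mode"]]
    columns = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    return f"{verb} INTO {table} ({columns}) VALUES ({placeholders})"


def store_batch(conn, storage, batch):
    """Insert one aggregated batch in a single transaction; return the row count."""
    sql = build_statement(storage)
    rows = [tuple(item[f] for f in storage["insert_fields"]) for item in batch]
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, rows)
    return len(rows)
```

Because the statement is derived from configuration at run time, changing the insert mode or fields for a table would not require touching the code, which is the point made about defect 3 in the background.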
The large-scale data processing device based on a stream processing framework of this embodiment is divided into five main modules, which are connected in series through Storm's stream mechanism to form a complete processing flow. A global configuration is loaded at startup; it specifies the modules to load, the number of threads for each module and the source of the raw data. The device has the following advantages:
(1) Data processing is fast. The device is based on a stream processing framework such as Storm, Spark, Samza or JStorm and can be deployed as a cluster on multiple servers, which speeds up processing and makes full use of server performance; if higher processing speed is needed, server resources only need to be added horizontally.
(2) Data is guaranteed not to be lost. Data is read from Kafka, which ensures that each piece of data is processed at least once.
(3) Extensibility. The data source can be message middleware or a persistent storage database and can be extended horizontally; the same holds for storage media: the existing storage targets are MySQL and Solr, and insertion into databases such as MongoDB can be added horizontally.
(4) Processing logic can be loaded dynamically. Newly added processing logic only needs to be configured in Redis to be called dynamically.
(5) The topology structure can be loaded dynamically. The bolts that process data are loaded via XML and can be loaded selectively for different data processing jobs, which reduces server pressure.
(6) A data aggregation mechanism is used so that data is inserted in batches, which reduces network overhead.
(7) The entire data processing flow is determined by configuration files; for a different business, only the configuration files need to be modified to meet the processing requirements.
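The at-least-once guarantee mentioned in advantage (2) can be illustrated with a toy consumer that advances its committed position only after a message has been handled. This is a general sketch of the idea, not the mechanism of Kafka or Storm themselves; the `Consumer` class and the in-memory list standing in for a Kafka partition are assumptions.

```python
class Consumer:
    """Reads a log from the last committed position; commits only after success."""

    def __init__(self, log):
        self.log = log
        self.committed = 0  # index of the next message to process

    def run(self, handle):
        while self.committed < len(self.log):
            handle(self.log[self.committed])  # may raise; position then stays put
            self.committed += 1               # commit after successful handling


log = ["a", "b", "c"]
seen = []
failed_once = []


def flaky(msg):
    """Handler that fails the first time it sees "b", simulating a crash."""
    seen.append(msg)
    if msg == "b" and not failed_once:
        failed_once.append(True)
        raise RuntimeError("simulated crash")


consumer = Consumer(log)
try:
    consumer.run(flaky)
except RuntimeError:
    pass  # after a restart, processing resumes from the committed position
consumer.run(flaky)
```

The message "b" is handled twice, so nothing is lost but duplicates are possible; a downstream insert that tolerates repeats (for example, an overwrite-style insert mode) would be one way to absorb them.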
Embodiment 2
Referring to Fig. 2, the large-scale data processing method based on a stream processing framework of this embodiment includes the following steps:
Step 1: the topology construction module builds a data processing topology on the stream processing framework according to an XML configuration file, and at the same time establishes the connections between the data processing topology and the data source and storage medium;
Step 2: the data reading module reads marked raw data from the data source, loads the corresponding logic configuration file according to the mark, obtains data with the logic configuration attached and sends it to the data processing module; the logic configuration file includes processing logic, a processing method and storage logic;
Step 3: the data processing module receives the data with the logic configuration attached, dynamically calls the corresponding processing method according to the processing logic in the logic configuration, generates processing results and splits them into streams according to the storage logic;
Step 4: the aggregation module receives the split processing results and aggregates them, obtains aggregation results and sends them to the storage module;
Step 5: the storage module receives the aggregation results and stores them in the specified storage medium according to the storage logic.
Preferably, the data source is message middleware or a persistent storage medium.
Preferably, the message middleware includes Kafka for caching the raw data and Redis for caching the logic configuration file; the persistent storage medium includes the relational database MySQL, the distributed document database MongoDB and the index Solr.
Preferably, the storage logic includes: database instance, table name, insert mode and insert fields.
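Steps 2 to 5 above can be sketched end to end as a chain of plain functions, with Step 1 reduced to wiring them together. Everything here is a simplified stand-in: the function names, the record layout, the reduced configuration (`method`, `table`) and the dictionary used as the storage medium are assumptions, and a real deployment would run each stage as a component of the stream processing framework.

```python
from collections import defaultdict


def read(source, configs):
    """Step 2: attach the logic configuration selected by each record's mark."""
    for record in source:
        yield {**record, "logic": configs[record["mark"]]}


def process(records, handlers):
    """Step 3: call the configured method and key the result by its target table."""
    for record in records:
        logic = record["logic"]
        result = handlers[logic["method"]](record["payload"])
        yield logic["table"], result


def aggregate(keyed_results):
    """Step 4: group results that share the same storage target."""
    batches = defaultdict(list)
    for key, result in keyed_results:
        batches[key].append(result)
    return dict(batches)


def store(batches, sink):
    """Step 5: write each batch to its target (a dict stands in for the medium)."""
    for table, rows in batches.items():
        sink.setdefault(table, []).extend(rows)
    return sink


def run_pipeline(source, configs, handlers):
    """Step 1, reduced to wiring the stages in order."""
    return store(aggregate(process(read(source, configs), handlers)), {})


source = [
    {"mark": "Student", "payload": {"id": "1", "name": " Ann "}},
    {"mark": "Student", "payload": {"id": "2", "name": "Bob"}},
]
configs = {"Student": {"method": "parseStudent", "table": "student"}}
handlers = {"parseStudent": lambda p: {"id": int(p["id"]), "name": p["name"].strip()}}
sink = run_pipeline(source, configs, handlers)
```

Supporting a new kind of data in this sketch means adding one entry to `configs` and one function to `handlers`, with no change to the stages themselves.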
Note that the above are only preferred embodiments of the present invention. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them; without departing from the inventive concept, it may include other equivalent embodiments, and the scope of the invention is determined by the scope of the appended claims.

Claims (10)

  1. A large-scale data processing device based on a stream processing framework, characterized by including:
    a topology construction module, for building a data processing topology according to an XML configuration file, and at the same time establishing the connections between the data processing topology and the data source and storage medium;
    a data reading module, for reading marked raw data from the data source and loading the corresponding logic configuration file according to the mark, to obtain data with the logic configuration attached; the logic configuration file includes processing logic, a processing method and storage logic;
    a data processing module, for receiving the data with the logic configuration attached, dynamically calling the corresponding processing method according to the processing logic in the logic configuration, generating processing results and splitting them into streams according to the storage logic;
    an aggregation module, for receiving the split processing results and aggregating them to obtain aggregation results;
    a storage module, for receiving the aggregation results and storing them in the specified storage medium according to the storage logic.
  2. The large-scale data processing device based on a stream processing framework according to claim 1, characterized in that the data source is message middleware or a persistent storage medium.
  3. The large-scale data processing device based on a stream processing framework according to claim 2, characterized in that the message middleware includes Kafka for caching the raw data and Redis for caching the logic configuration file, and the persistent storage medium includes the relational database MySQL and the index Solr.
  4. The large-scale data processing device based on a stream processing framework according to claim 1, characterized in that the storage medium also includes MongoDB.
  5. The large-scale data processing device based on a stream processing framework according to claim 1, characterized in that the storage logic includes: database instance, table name, insert mode and insert fields.
  6. A large-scale data processing method based on a stream processing framework, characterized by including the following steps:
    Step 1: the topology construction module builds a data processing topology according to an XML configuration file, and at the same time establishes the connections between the data processing topology and the data source and storage medium;
    Step 2: the data reading module reads marked raw data from the data source, loads the corresponding logic configuration file according to the mark, obtains data with the logic configuration attached and sends it to the data processing module; the logic configuration file includes processing logic, a processing method and storage logic;
    Step 3: the data processing module receives the data with the logic configuration attached, dynamically calls the corresponding processing method according to the processing logic in the logic configuration, generates processing results and splits them into streams according to the storage logic;
    Step 4: the aggregation module receives the split processing results and aggregates them, obtains aggregation results and sends them to the storage module;
    Step 5: the storage module receives the aggregation results and stores them in the specified storage medium according to the storage logic.
  7. The large-scale data processing method based on a stream processing framework according to claim 6, characterized in that the data source is message middleware or a persistent storage medium.
  8. The large-scale data processing method based on a stream processing framework according to claim 6, characterized in that the message middleware includes Kafka for caching the raw data and Redis for caching the logic configuration file, and the persistent storage medium includes the relational database MySQL and the index Solr.
  9. The large-scale data processing method based on a stream processing framework according to claim 8, characterized in that the storage medium also includes MongoDB.
  10. The large-scale data processing device based on a stream processing framework according to claim 6, characterized in that the storage logic includes: database instance, table name, insert mode and insert fields.
CN201710835168.8A 2017-06-26 2017-09-15 A kind of large-scale data processing unit and method based on Stream Processing framework Pending CN107506482A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2017104911873 2017-06-26
CN201710491187.3A CN107229747A (en) 2017-06-26 2017-06-26 A kind of large-scale data processing unit and method based on Stream Processing framework

Publications (1)

Publication Number Publication Date
CN107506482A true CN107506482A (en) 2017-12-22

Family

ID=59935281

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710491187.3A Pending CN107229747A (en) 2017-06-26 2017-06-26 A kind of large-scale data processing unit and method based on Stream Processing framework
CN201710835168.8A Pending CN107506482A (en) 2017-06-26 2017-09-15 A kind of large-scale data processing unit and method based on Stream Processing framework

Country Status (1)

Country Link
CN (2) CN107229747A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857524A (en) * 2019-01-25 2019-06-07 深圳前海微众银行股份有限公司 Streaming computing method, apparatus, equipment and computer readable storage medium

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107678852B (en) * 2017-10-26 2021-06-22 携程旅游网络技术(上海)有限公司 Method, system, equipment and storage medium based on stream data real-time calculation
CN110007967B (en) * 2017-12-29 2022-05-06 杭州海康威视数字技术股份有限公司 Data processing method, device and equipment based on streaming framework
CN108334380A (en) * 2018-01-19 2018-07-27 新智云数据服务有限公司 A kind of configuration item management method, device, terminal and computer readable storage medium
CN108958789B (en) * 2018-05-20 2021-07-09 湖北九州云仓科技发展有限公司 Parallel stream type computing method, electronic equipment, storage medium and system
CN109145023B (en) * 2018-08-30 2020-11-27 北京百度网讯科技有限公司 Method and apparatus for processing data
CN112181522A (en) * 2020-09-28 2021-01-05 亚信科技(中国)有限公司 Data processing method and device and electronic equipment
CN113438124B (en) * 2021-06-07 2022-05-06 清华大学 Network measurement method and device based on intention driving

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050261A (en) * 2014-06-16 2014-09-17 深圳先进技术研究院 Stormed-based variable logic general data processing system and method
CN105512162A (en) * 2015-09-28 2016-04-20 杭州圆橙科技有限公司 Real-time intelligent processing framework based on storm streaming data
CN105574082A (en) * 2015-12-08 2016-05-11 曙光信息产业(北京)有限公司 Storm based stream processing method and system
CN105959151A (en) * 2016-06-22 2016-09-21 中国工商银行股份有限公司 High availability stream processing system and method
CN106599120A (en) * 2016-12-01 2017-04-26 中国联合网络通信集团有限公司 Stream processing framework-based data processing method and apparatus
CN106789377A (en) * 2017-03-24 2017-05-31 聚好看科技股份有限公司 The service parameter update method of network element cluster

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857524A (en) * 2019-01-25 2019-06-07 深圳前海微众银行股份有限公司 Streaming computing method, apparatus, equipment and computer readable storage medium
CN109857524B (en) * 2019-01-25 2024-02-27 深圳前海微众银行股份有限公司 Stream computing method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN107229747A (en) 2017-10-03

Similar Documents

Publication Publication Date Title
CN107506482A (en) A kind of large-scale data processing unit and method based on Stream Processing framework
CN112860695B (en) Monitoring data query method, device, equipment, storage medium and program product
CN102930062B (en) The method of the quick horizontal extension of a kind of database
US20140156586A1 (en) Big-fast data connector between in-memory database system and data warehouse system
CN106250226B (en) Method for scheduling task and system based on consistency hash algorithm
JP7313382B2 (en) Frequent Pattern Analysis of Distributed Systems
CN107436813A (en) A kind of method and system of meta data server dynamic load leveling
CN110674152B (en) Data synchronization method and device, storage medium and electronic equipment
WO2017028394A1 (en) Example-based distributed data recovery method and apparatus
TWI686703B (en) Method and device for data storage and business processing
CN105550246A (en) System and method for loading network picture under Android platform
CN107665246A (en) Dynamic date migration method and chart database cluster based on chart database
CN111723161A (en) Data processing method, device and equipment
CN107562804A (en) Data buffer service system and method, terminal
US20080208920A1 (en) Efficient detection of deleted objects against a stateless content directory service
CN105872082B (en) Fine granularity resource response system based on container cluster load-balancing algorithm
CN107203437A (en) The methods, devices and systems for preventing internal storage data from losing
US20230342062A1 (en) Live data migration in document stores
CN106909460B (en) Data buffering method, device and storage medium
CN105930104B (en) Date storage method and device
CN110019085A (en) A kind of distributed time series database based on HBase
CN109981726A (en) A kind of distribution method of memory node, server and system
CN113010373B (en) Data monitoring method and device, electronic equipment and storage medium
JP7375173B2 (en) Data processing method and related equipment, and computer program
CN105389368A (en) Method for managing metadata of database cluster of MPP architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171222