CN109684377A - General-purpose real-time big data processing development platform and its data processing method


Info

Publication number
CN109684377A
Authority
CN
China
Prior art keywords
data
module
real time
development platform
batch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811528297.3A
Other languages
Chinese (zh)
Inventor
赵晓炳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Di Di Information Technology Ltd By Share Ltd
Original Assignee
Shenzhen Di Di Information Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Di Di Information Technology Ltd By Share Ltd
Priority to CN201811528297.3A
Publication of CN109684377A
Legal status: Pending


Abstract

The invention discloses a general-purpose real-time big data processing development platform and its data processing method. The platform comprises: a data acquisition module for acquiring data of multiple heterogeneous data sources from databases; a data transmission module for publishing the multiple heterogeneous data sources; a data processing module for processing the batch data and the streaming data in the multiple heterogeneous data sources separately and calling the corresponding preset library to build business applications; a storage module for storing the processed batch data and streaming data; an output query module for querying the stored batch data and streaming data; and an application coordination service module for monitoring the processing events of each module and, when a processing event is abnormal, calling the corresponding service to handle the corresponding data. The technical solution of the invention can improve the development efficiency of data processing systems and reduce development cost.

Description

General-purpose real-time big data processing development platform and its data processing method
Technical field
The present invention relates to the technical field of data processing, and in particular to a general-purpose real-time big data processing development platform and its data processing method.
Background art
At present, the data processing field has many solutions that differ in detail but are essentially similar. Their data processing flow generally comprises four steps: data acquisition, data transmission, data processing, and data storage, and each step can serve as an independent module. When designing a data processing system, each of these independent modules can be developed in-house or assembled from open-source components. Small and medium-sized technology companies cannot avoid data processing in their own business. If they take the in-house route, every new business requirement may mean redeveloping these data processing systems from start to finish. This workload takes up more than half of the project development time, and because the finished system is coupled to business characteristics, dedicated staff must be assigned to maintain each business system, which is a serious waste of company resources. If they take an open-source or hybrid route, they must select technologies for each business from among numerous open-source components, and this technology selection involves a certain amount of trial-and-error difficulty.
In view of this, it is necessary to further improve current data processing systems.
Summary of the invention
To solve at least one of the above technical problems, the main object of the present invention is to provide a general-purpose real-time big data processing development platform and its data processing method.
To achieve the above object, one technical solution adopted by the present invention is to provide a general-purpose real-time big data processing development platform, comprising:
a data acquisition module for acquiring data of multiple heterogeneous data sources from databases;
a data transmission module, electrically connected to the data acquisition module, for publishing the multiple heterogeneous data sources;
a data processing module, electrically connected to the data transmission module, for processing the batch data and the streaming data in the multiple heterogeneous data sources separately and calling the corresponding preset library to build business applications, wherein the data processing module is preconfigured with application libraries;
a storage module, electrically connected to the data processing module, for storing the processed batch data and streaming data;
an output query module, electrically connected to the storage module, for querying the stored batch data and streaming data;
an application coordination service module, electrically connected to the data transmission module, the data processing module, the storage module, and the output query module respectively, for monitoring the processing events of each module and, when a processing event is abnormal, calling the corresponding service to handle the corresponding data.
Wherein, the data types collected by the data acquisition module include at least one of MySQL, Oracle, HDFS, Hive, OceanBase, HBase, OTS, and ODPS, and the data acquisition module is specifically Alibaba's open-source DataX.
Wherein, the data transmission module uses a Kafka cluster to publish the multiple heterogeneous data sources in a distributed manner.
Wherein, the data processing module is specifically a cluster of the general-purpose real-time computing engine Spark, which builds multiple parallel applications, calls the corresponding application libraries, and processes the batch data and the streaming data in the multiple heterogeneous data sources in parallel.
Wherein, the application libraries include at least one of SQL, DataFrames, MLlib, GraphX, and Spark Streaming.
Wherein, the storage module is specifically a columnar storage Kudu cluster, which stores the batch data and the streaming data that have been processed in parallel, respectively.
Wherein, the output query module is specifically a cluster of the distributed query engine Impala, which queries the stored batch data and streaming data concurrently.
Wherein, the application coordination service module is specifically the distributed application coordination service ZooKeeper.
To achieve the above object, another technical solution adopted by the present invention is to provide a data processing method for the general-purpose real-time big data processing development platform, comprising:
S10, acquiring data of multiple heterogeneous data sources from databases;
S20, publishing the multiple heterogeneous data sources;
S30, processing the batch data and the streaming data in the multiple heterogeneous data sources separately and calling the corresponding preset library to build business applications;
S40, storing the processed batch data and streaming data;
S50, querying the stored batch data and streaming data;
S60, monitoring the processing events in which the heterogeneous data sources are handled in steps S20-S50 and, when a processing event is abnormal, calling the corresponding service to handle the corresponding data.
The technical solution of the present invention is a general-purpose real-time big data processing development platform composed mainly of a data acquisition module, a data transmission module, a data processing module, a storage module, an output query module, and an application coordination service module. Through the cooperation of these modules, the platform provides general-purpose distributed real-time computation and analysis capability together with a set of highly available, high-performance, highly scalable infrastructure. It gives different businesses a common development platform and reduces inefficient, repeated development of shared components, which helps improve the development efficiency of data processing systems and reduce development cost.
Brief description of the drawings
Fig. 1 is a block diagram of the general-purpose real-time big data processing development platform according to an embodiment of the present invention;
Fig. 2 is a diagram of the operating environment of the general-purpose real-time big data processing development platform of the present invention;
Fig. 3 is a flow chart of the data processing method of the general-purpose real-time big data processing development platform according to an embodiment of the present invention.
The realization of the object, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that descriptions involving "first", "second", and the like in the present invention are for descriptive purposes only and should not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only where the combination can be realized by those of ordinary skill in the art. When a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and to fall outside the protection scope claimed by the present invention.
Referring to Fig. 1, Fig. 1 is a block diagram of the general-purpose real-time big data processing development platform according to an embodiment of the present invention. In this embodiment, the general-purpose real-time big data processing development platform comprises:
a data acquisition module 10 for acquiring data of multiple heterogeneous data sources from databases;
a data transmission module 20, electrically connected to the data acquisition module 10, for publishing the multiple heterogeneous data sources;
a data processing module 30, electrically connected to the data transmission module 20, for processing the batch data and the streaming data in the multiple heterogeneous data sources separately and calling the corresponding preset library to build business applications, wherein the data processing module 30 is preconfigured with application libraries;
a storage module 40, electrically connected to the data processing module 30, for storing the processed batch data and streaming data;
an output query module 50, electrically connected to the storage module 40, for querying the stored batch data and streaming data;
an application coordination service module 60, electrically connected to the data transmission module 20, the data processing module 30, the storage module 40, and the output query module 50 respectively, for monitoring the processing events of each module and, when a processing event is abnormal, calling the corresponding service to handle the corresponding data.
In this embodiment, the data acquisition module 10 can acquire multiple heterogeneous data sources from different databases and supports offline data exchange between arbitrary heterogeneous data systems. The data transmission module 20 can publish the multiple heterogeneous data sources in a distributed manner so as to transmit them to the data processing module 30. The data processing module 30 can process the batch data and the streaming data in the multiple heterogeneous data sources and, through interaction with the user, call the corresponding preset library to build the required business application. The storage module 40 can store the processed batch data and streaming data. The output query module 50 queries the stored batch data and streaming data. The application coordination service module 60 can coordinate the data transmission module 20, the data processing module 30, the storage module 40, and the output query module 50 and, when a processing event of any module is abnormal, call the corresponding service to handle the corresponding data, thereby guaranteeing the high availability and consistency of the platform.
The technical solution of the present invention is a general-purpose real-time big data processing development platform composed mainly of the data acquisition module 10, the data transmission module 20, the data processing module 30, the storage module 40, the output query module 50, and the application coordination service module 60. Through the cooperation of these modules, the platform provides general-purpose distributed real-time computation and analysis capability together with a set of highly available, high-performance, highly scalable infrastructure. It gives different businesses a common development platform and reduces inefficient, repeated development of shared components, which helps improve the development efficiency of data processing systems and reduce development cost.
In a specific embodiment, the data types collected by the data acquisition module 10 include at least one of MySQL, Oracle, HDFS, Hive, OceanBase, HBase, OTS, and ODPS, and the data acquisition module 10 is specifically Alibaba's open-source DataX. In this embodiment, the raw data to be processed may be stored in at least one of the MySQL, Oracle, HDFS, Hive, OceanBase, HBase, OTS, and ODPS data stores. Acquiring the data through Alibaba's open-source DataX enables efficient data synchronization between various heterogeneous data sources. DataX uses a framework-plus-plugin architecture, which makes it easy to handle the differences between data sources and gives it good extensibility.
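As an illustration of the framework-plus-plugin idea, the sketch below builds a DataX-style job description in which a reader plugin and a writer plugin are named side by side. It is not taken from the patent. The layout (`job.setting.speed.channel`, `content[].reader`, `content[].writer`) and the plugin names `mysqlreader` and `streamwriter` follow the DataX project documentation as understood here and should be verified against it; the JDBC URL, table, columns, and credentials are placeholders, and `build_datax_job` is a hypothetical helper.

```python
import json


def build_datax_job(jdbc_url, table, columns, username, password, channels=1):
    """Return a DataX-style job description as a plain dict (illustrative)."""
    return {
        "job": {
            # Number of parallel channels used for the transfer.
            "setting": {"speed": {"channel": channels}},
            "content": [
                {
                    # Reader plugin: where the raw data comes from.
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "username": username,
                            "password": password,
                            "column": list(columns),
                            "connection": [
                                {"table": [table], "jdbcUrl": [jdbc_url]}
                            ],
                        },
                    },
                    # Writer plugin: where the data goes. Swapping this entry
                    # is how a different target system would be selected.
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"print": True},
                    },
                }
            ],
        }
    }


job = build_datax_job(
    jdbc_url="jdbc:mysql://127.0.0.1:3306/demo",  # placeholder address
    table="orders",                               # placeholder table
    columns=["id", "amount"],
    username="reader",
    password="secret",
)
job_json = json.dumps(job, indent=2)
```

Only the `reader` and `writer` entries change when the source or target system changes, which is the property the paragraph above attributes to the plugin design.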
In a specific embodiment, the data transmission module 20 uses a Kafka cluster to publish the multiple heterogeneous data sources in a distributed manner. The Kafka cluster is a distributed stream processing platform. It provides publish-subscribe messaging similar to a message queue or an enterprise messaging system, persists messages with fault tolerance, and can process messages in real time. It is used here to publish the multiple heterogeneous data sources in a distributed manner.
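A minimal sketch of the publishing step follows, assuming records arrive as Python dicts and are sent as JSON. It is not part of the patent. `publish_records` is a hypothetical helper that accepts any producer object offering `send(topic, value=...)` and `flush()`, which is the shape of `KafkaProducer` in the kafka-python client; `InMemoryProducer` and the topic name `raw-orders` are stand-ins so that the example runs without a broker.

```python
import json


def publish_records(producer, topic, records):
    """Serialize each record as JSON, send it to `topic`, return the count."""
    sent = 0
    for record in records:
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        producer.send(topic, value=payload)
        sent += 1
    # Block until buffered messages have been handed over.
    producer.flush()
    return sent


class InMemoryProducer:
    """Stand-in that records messages instead of contacting a broker."""

    def __init__(self):
        self.messages = []
        self.flushed = False

    def send(self, topic, value=None):
        self.messages.append((topic, value))

    def flush(self):
        self.flushed = True


producer = InMemoryProducer()
count = publish_records(producer, "raw-orders", [{"id": 1, "amount": 9.5}])
```

With a real cluster, the same function could be given a `KafkaProducer(bootstrap_servers=...)` instance instead of the stand-in.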
In a specific embodiment, the data processing module 30 is specifically a cluster of the general-purpose real-time computing engine Spark, which builds multiple parallel applications, calls the corresponding application libraries, and processes the batch data and the streaming data in the multiple heterogeneous data sources in parallel. In this embodiment, the Spark cluster has a built-in directed acyclic graph (DAG) scheduler, query optimizer, and physical execution engine, which enable high-performance processing of both batch data and streaming data. It also provides more than 80 high-level operators that make it easy to build parallel applications, and it can be used interactively from the Scala, Python, R, and SQL shells. In addition, the Spark cluster comes with application libraries, specifically including at least one of SQL, DataFrames, MLlib, GraphX, and Spark Streaming, for the convenience of developers.
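The idea of handling batch data and streaming data with one engine can be sketched in plain Python. This is a stand-in, not Spark code and not the patent's implementation: it applies one aggregation function either to a whole data set at once or to a stream cut into small batches, which is how Spark Streaming treats a stream. The names `total_by_key` and `micro_batches` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import islice


def total_by_key(records, totals=None):
    """Add each record's amount to a running total per key."""
    totals = defaultdict(float) if totals is None else totals
    for key, amount in records:
        totals[key] += amount
    return totals


def micro_batches(stream, size):
    """Cut an iterator into lists of at most `size` records."""
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


records = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0)]

# Batch mode: the whole data set is processed in one call.
batch_result = dict(total_by_key(records))

# Streaming mode: the same function is applied to each small batch in turn,
# carrying the running totals from one batch to the next.
stream_totals = None
for chunk in micro_batches(iter(records), size=2):
    stream_totals = total_by_key(chunk, stream_totals)
stream_result = dict(stream_totals)
```

Because the transformation is shared, the business logic is written once and reused for both kinds of input.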
In a specific embodiment, the storage module 40 is specifically a columnar storage Kudu cluster, which stores the batch data and the streaming data that have been processed in parallel, respectively. The Kudu cluster provides fast insert and update operations together with efficient column scans. It is designed for analytics on frequently updated data and significantly reduces the query latency of the Spark cluster and of the downstream Impala cluster.
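The combination of keyed updates and column scans can be pictured with a toy in-memory table. This is only a conceptual sketch under simplifying assumptions (one process, no persistence, no distribution); it does not use or imitate the Kudu client API, and `ColumnTable` is a hypothetical name.

```python
class ColumnTable:
    """Toy column-oriented table addressed by a primary key."""

    def __init__(self, columns):
        # One list per column, so a scan touches only the column it needs.
        self.columns = {name: [] for name in columns}
        self.row_of_key = {}  # primary key -> row position

    def upsert(self, key, row):
        """Insert a new row, or overwrite the row that already has this key."""
        position = self.row_of_key.get(key)
        if position is None:
            self.row_of_key[key] = len(self.row_of_key)
            for name, values in self.columns.items():
                values.append(row[name])
        else:
            for name, values in self.columns.items():
                values[position] = row[name]

    def scan(self, column):
        """Return a copy of one column without reading the others."""
        return list(self.columns[column])


table = ColumnTable(["id", "amount"])
table.upsert(1, {"id": 1, "amount": 10.0})
table.upsert(2, {"id": 2, "amount": 20.0})
table.upsert(1, {"id": 1, "amount": 15.0})  # update in place, no new row
amounts = table.scan("amount")
```

The third call changes an existing row rather than appending one, so a later scan of `amount` sees the updated value immediately.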
In a specific embodiment, the output query module 50 is specifically a cluster of the distributed query engine Impala, which queries the stored batch data and streaming data concurrently. In this embodiment, the Impala cluster provides low-latency, high-concurrency queries and has good linear scalability in concurrent environments. Further, the application coordination service module 60 is specifically the distributed application coordination service ZooKeeper. In this embodiment, ZooKeeper can guarantee the high availability and consistency of the distributed system and can also provide functions such as configuration maintenance, naming service, distributed synchronization, and group services.
Referring to Fig. 2, Fig. 2 is a diagram of the operating environment of the general-purpose real-time big data processing development platform of the present invention. In the platform, the data acquisition module 10 operates at the data collection layer, the data transmission module 20 at the data transmission layer, the data processing module 30 at the data analysis layer, the storage module 40 at the data storage layer, and the output query module 50 at the data query layer. The source data layer makes it convenient to import the raw data supplied by developers.
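The layer-to-module assignment can be written down as a small lookup table. The module numbers and layer names come from the description; the component names are those given in the specific embodiments, with Kudu paired with storage module 40 on the basis of that module's storage role. The top-to-bottom ordering of the list is an assumption based on the order of steps S10 to S50, and `Layer` and `layer_of` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Layer(NamedTuple):
    name: str
    module: Optional[int]      # reference numeral, None if no module is named
    component: Optional[str]   # component from the specific embodiments


LAYERS = [
    Layer("source data layer", None, None),
    Layer("data collection layer", 10, "DataX"),
    Layer("data transmission layer", 20, "Kafka"),
    Layer("data analysis layer", 30, "Spark"),
    Layer("data storage layer", 40, "Kudu"),
    Layer("data query layer", 50, "Impala"),
]


def layer_of(module):
    """Return the layer in which the given module number operates."""
    for layer in LAYERS:
        if layer.module == module:
            return layer
    raise KeyError(module)


storage_layer = layer_of(40)
```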
Referring to Fig. 3, Fig. 3 is a flow chart of the data processing method of the general-purpose real-time big data processing development platform according to an embodiment of the present invention. In this embodiment, the data processing method of the general-purpose real-time big data processing development platform comprises:
S10, acquiring data of multiple heterogeneous data sources from databases;
S20, publishing the multiple heterogeneous data sources;
S30, processing the batch data and the streaming data in the multiple heterogeneous data sources separately and calling the corresponding preset library to build business applications;
S40, storing the processed batch data and streaming data;
S50, querying the stored batch data and streaming data;
S60, monitoring the processing events in which the heterogeneous data sources are handled in steps S20-S50 and, when a processing event is abnormal, calling the corresponding service to handle the corresponding data.
In this embodiment, the multiple heterogeneous data sources are first acquired from databases to obtain the raw data to be processed. The multiple heterogeneous data sources are then published. Next, the batch data and the streaming data in them are processed and, through interaction with the user, the corresponding preset library is called to build the required business application. The processed batch data and streaming data are then stored, and the stored batch data and streaming data are queried. Steps S20-S50 are also monitored and coordinated: when a processing event of any module is abnormal, the corresponding service is called to handle the corresponding data, thereby guaranteeing the high availability and consistency of the platform.
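The flow of steps S10 to S60 can be sketched as a small monitored pipeline. This is a single-process illustration under stated assumptions, not the patented implementation: each step is a plain function, `run_pipeline` plays the role of S60 by recording one event per step, and `fallback` is a hypothetical recovery service that, for an abnormal S30 event, skips rows that cannot be parsed.

```python
def acquire(sources):          # S10: gather rows from several sources
    return [row for source in sources for row in source]


def publish(rows):             # S20: hand the rows to downstream consumers
    return list(rows)


def parse(row):
    return {"id": row["id"], "amount": float(row["amount"])}


def process(rows):             # S30: normalize every row
    return [parse(row) for row in rows]


def store(rows):               # S40: keep rows addressable by id
    return {row["id"]: row for row in rows}


def query(table):              # S50: answer a question over stored rows
    return sum(row["amount"] for row in table.values())


def fallback(name, data, exc):
    """Hypothetical recovery service for an abnormal processing event."""
    if name != "S30":
        raise exc
    good = []
    for row in data:
        try:
            good.append(parse(row))
        except (TypeError, ValueError):
            continue  # drop the row that caused the abnormal event
    return good


def run_pipeline(steps, data, on_error):
    """S60: run the steps in order and record one event per step."""
    events = []
    for name, step in steps:
        try:
            data = step(data)
            events.append((name, "ok"))
        except Exception as exc:
            events.append((name, "abnormal"))
            data = on_error(name, data, exc)
    return data, events


sources = [
    [{"id": 1, "amount": "10"}],
    [{"id": 2, "amount": "oops"}, {"id": 3, "amount": "2.5"}],
]
steps = [("S20", publish), ("S30", process), ("S40", store), ("S50", query)]
result, events = run_pipeline(steps, acquire(sources), fallback)
```

In this run the malformed row makes S30 abnormal, the recovery service supplies the parsable rows, and S40 and S50 continue with them.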
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structural transformation made using the contents of the description and drawings of the present invention under its inventive concept, or any direct or indirect application in other related technical fields, is included in the patent protection scope of the present invention.

Claims (9)

1. A general-purpose real-time big data processing development platform, characterized in that the general-purpose real-time big data processing development platform comprises:
a data acquisition module for acquiring data of multiple heterogeneous data sources from databases;
a data transmission module, electrically connected to the data acquisition module, for publishing the multiple heterogeneous data sources;
a data processing module, electrically connected to the data transmission module, for processing the batch data and the streaming data in the multiple heterogeneous data sources separately and calling the corresponding preset library to build business applications, wherein the data processing module is preconfigured with application libraries;
a storage module, electrically connected to the data processing module, for storing the processed batch data and streaming data;
an output query module, electrically connected to the storage module, for querying the stored batch data and streaming data;
an application coordination service module, electrically connected to the data transmission module, the data processing module, the storage module, and the output query module respectively, for monitoring the processing events of each module and, when a processing event is abnormal, calling the corresponding service to handle the corresponding data.
2. The general-purpose real-time big data processing development platform according to claim 1, characterized in that the data types collected by the data acquisition module include at least one of MySQL, Oracle, HDFS, Hive, OceanBase, HBase, OTS, and ODPS, and the data acquisition module is specifically Alibaba's open-source DataX.
3. The general-purpose real-time big data processing development platform according to claim 2, characterized in that the data transmission module uses a Kafka cluster to publish the multiple heterogeneous data sources in a distributed manner.
4. The general-purpose real-time big data processing development platform according to claim 3, characterized in that the data processing module is specifically a cluster of the general-purpose real-time computing engine Spark, which builds multiple parallel applications, calls the corresponding application libraries, and processes the batch data and the streaming data in the multiple heterogeneous data sources in parallel.
5. The general-purpose real-time big data processing development platform according to claim 4, characterized in that the application libraries include at least one of SQL, DataFrames, MLlib, GraphX, and Spark Streaming.
6. The general-purpose real-time big data processing development platform according to claim 5, characterized in that the storage module is specifically a columnar storage Kudu cluster, which stores the batch data and the streaming data that have been processed in parallel, respectively.
7. The general-purpose real-time big data processing development platform according to claim 6, characterized in that the output query module is specifically a cluster of the distributed query engine Impala, which queries the stored batch data and streaming data concurrently.
8. The general-purpose real-time big data processing development platform according to claim 7, characterized in that the application coordination service module is specifically the distributed application coordination service ZooKeeper.
9. A data processing method for a general-purpose real-time big data processing development platform, characterized in that the data processing method of the general-purpose real-time big data processing development platform comprises:
S10, acquiring data of multiple heterogeneous data sources from databases;
S20, publishing the multiple heterogeneous data sources;
S30, processing the batch data and the streaming data in the multiple heterogeneous data sources separately and calling the corresponding preset library to build business applications;
S40, storing the processed batch data and streaming data;
S50, querying the stored batch data and streaming data;
S60, monitoring the processing events in which the heterogeneous data sources are handled in steps S20-S50 and, when a processing event is abnormal, calling the corresponding service to handle the corresponding data.
CN201811528297.3A 2018-12-13 2018-12-13 General-purpose real-time big data processing development platform and its data processing method Pending CN109684377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811528297.3A CN109684377A (en) 2018-12-13 2018-12-13 General big data handles development platform and its data processing method in real time

Publications (1)

Publication Number Publication Date
CN109684377A true CN109684377A (en) 2019-04-26

Family

ID=66187655

Country Status (1)

Country Link
CN (1) CN109684377A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7497370B2 (en) * 2005-01-27 2009-03-03 Microsoft Corporation Supply chain visibility solution architecture
CN104683445A (en) * 2015-01-26 2015-06-03 北京邮电大学 Distributed real-time data fusion system
CN106651633A (en) * 2016-10-09 2017-05-10 国网浙江省电力公司信息通信分公司 Power utilization information acquisition system and method based on big data technology
CN106873945A (en) * 2016-12-29 2017-06-20 中山大学 Data processing architecture and data processing method based on batch processing and Stream Processing
CN107784098A (en) * 2017-10-24 2018-03-09 百味云科技股份有限公司 Real-time data warehouse platform
CN108874982A (en) * 2018-06-11 2018-11-23 华南理工大学 A method of based on the offline real-time processing data of Spark big data frame

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395365A (en) * 2019-08-14 2021-02-23 北京海致星图科技有限公司 Knowledge graph batch offline query solution
CN111126963A (en) * 2019-12-24 2020-05-08 微创(上海)网络技术股份有限公司 Dynamic integration method of heterogeneous multi-source data
CN111414363A (en) * 2020-03-13 2020-07-14 上海银赛计算机科技有限公司 parallel heterogeneous method, system, medium and device suitable for client data in MySQL
CN111414363B (en) * 2020-03-13 2023-04-14 上海银赛计算机科技有限公司 Parallel heterogeneous method, system, medium and equipment suitable for client data in MySQL
CN111625218A (en) * 2020-05-14 2020-09-04 中电工业互联网有限公司 Big data processing method and system for custom library development
CN111625218B (en) * 2020-05-14 2024-01-09 中电工业互联网有限公司 Big data processing method and system for custom library development
CN112100265A (en) * 2020-09-17 2020-12-18 博雅正链(北京)科技有限公司 Multi-source data processing method and device for big data architecture and block chain
CN115208875A (en) * 2022-07-14 2022-10-18 中国银行股份有限公司 Information integration system of multi-transmission middleware
CN115599524A (en) * 2022-10-27 2023-01-13 中国兵器工业计算机应用技术研究所(Cn) Data lake system based on cooperative scheduling processing of streaming data and batch data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190426
