CN110647575B - Distributed heterogeneous processing framework construction method and system - Google Patents


Info

Publication number
CN110647575B
Authority
CN
China
Prior art keywords
module
message
data
service scheduling
intermediate file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810590137.5A
Other languages
Chinese (zh)
Other versions
CN110647575A (en)
Inventor
曹亮 (Cao Liang)
陈冲 (Chen Chong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN201810590137.5A
Publication of CN110647575A
Application granted
Publication of CN110647575B
Legal status: Active

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Abstract

The invention discloses a distributed heterogeneous processing framework construction method comprising the following steps: the publishing module sends request data and blocks its current thread by subscribing to a first message; the service scheduling module receives the request data, packages it into an intermediate file packet, writes the packet to the cache module, and blocks its current thread by subscribing to a second message; the calculation engine module reads the intermediate file packet, loads and processes the corresponding data according to the configuration data, and stores the processed final file packet in the cache module; the calculation engine module publishes the second message to the service scheduling module; the service scheduling module receives the second message, recognizes the second end mark, and cancels its subscription to the second message; the service scheduling module publishes the first message to the publishing module; the publishing module receives the first message, recognizes the first end mark, and cancels its subscription to the first message; the publishing module then reads the final file packet from the cache module to obtain the data processing result. The invention improves real-time on-line data processing efficiency and message storage capacity.

Description

Distributed heterogeneous processing framework construction method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a distributed heterogeneous processing framework construction method and a distributed heterogeneous processing framework construction system.
Background
With the continuous development and growth of the internet industry, the volume of data a website must process and store each day grows geometrically, and the scale of websites keeps expanding. The traditional vertical application architecture has become increasingly bloated: it is difficult to maintain, has high latency, and has low throughput, so it can hardly meet the demands placed on the underlying architecture in the era of big data. Adopting a distributed architecture for data storage and computation is therefore an inevitable trend of the internet era, and an architecture that offers efficient real-time on-line data processing together with message storage capability is needed to fill the current gap in the market.
Disclosure of Invention
In order to solve the above problems, the invention provides a construction method and a construction system based on a distributed heterogeneous processing framework.
Specifically, the construction method based on the distributed heterogeneous processing framework comprises the following steps:
S1, sending request data to a service scheduling module through a publishing module, wherein the request data comprises a first message;
S2, the publishing module blocks its current thread by subscribing to the first message;
S3, receiving the request data through the service scheduling module, packaging the request data into an intermediate file packet, and writing the intermediate file packet into a cache module, wherein the intermediate file packet comprises a second message and configuration data;
S4, the service scheduling module blocks its current thread by subscribing to the second message;
S5, reading the intermediate file packet through a calculation engine module, loading and processing the corresponding data according to the configuration data in the intermediate file packet through the calculation engine module, and storing the processed final file packet in the cache module;
S6, the calculation engine module publishes the second message and a second end mark to the service scheduling module;
S7, the service scheduling module receives the second message, recognizes the second end mark, and cancels its subscription to the second message;
S8, the service scheduling module publishes the first message and a first end mark to the publishing module;
S9, the publishing module receives the first message, recognizes the first end mark, and cancels its subscription to the first message;
and S10, the publishing module reads the final file packet from the cache module to obtain the data processing result.
Further, step S1 is specifically implemented as follows:
S11, converting the data to be sent, including the first message, into a json character string and packaging it into the request data;
and S12, the publishing module sends the request data to the service scheduling module in an asynchronous mode.
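Steps S11 and S12 can be sketched as follows. This is an illustrative Python simulation rather than the patent's Java/GSON implementation: the field names (`firstMessage`, `data`), the channel name, and the stand-in `send_to_scheduler` function are assumptions, and a thread pool stands in for the Netty client's asynchronous channel.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def build_request(first_message_channel, payload):
    # S11: convert the data to be sent, including the first-message
    # channel name, into a json character string (the request data).
    # The field names are illustrative; the patent fixes no schema.
    return json.dumps({"firstMessage": first_message_channel, "data": payload})

def send_to_scheduler(request_json):
    # Stand-in for the Netty round-trip; a real client would write the
    # bytes to a socket channel and receive a future for the reply.
    return "ack:" + json.loads(request_json)["firstMessage"]

# S12: submit the send without blocking the caller (asynchronous mode).
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(send_to_scheduler,
                         build_request("reply:client-42", {"sparkJar": "job.jar"}))
    ack = future.result()  # rendezvous only when the result is needed
```

The asynchronous submit keeps the publishing thread free; in the patent the actual blocking happens later, via the first-message subscription of step S2, not on the send itself.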
Further, step S3 is specifically implemented as follows:
S31, the service scheduling module converts the received request data into Map format;
S32, the configuration data in the Map-format request data is extracted and packaged, together with the second message, into a json-format intermediate file packet;
and S33, the intermediate file packet is written into the cache module.
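A minimal sketch of steps S31 to S33 in Python (the patent's implementation uses Java with GSON; the key names, the channel name, and the dict standing in for the Redis cache module are assumptions):

```python
import json

cache = {}  # stand-in for the cache module (Redis in the patent)

def schedule(request_json, second_message_channel, cache_key):
    # S31: convert the received request data into a map.
    request_map = json.loads(request_json)
    # S32: extract the configuration data and package it, together with
    # the second-message channel, into a json intermediate file packet.
    packet = json.dumps({
        "secondMessage": second_message_channel,
        "config": request_map["data"],
    })
    # S33: write the intermediate file packet into the cache module.
    cache[cache_key] = packet
    return packet

schedule('{"firstMessage": "reply:client-42", "data": {"sparkJar": "job.jar"}}',
         "done:job-7", "intermediate:job-7")
```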
Further, step S4 is specifically implemented as follows:
S41, the service scheduling module starts a thread and invokes the calculation engine module by calling a processing function;
and S42, the service scheduling module blocks the current thread by subscribing to the second message.
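The whole S1 to S10 handshake can be simulated end to end. The sketch below is a deliberately simplified in-memory stand-in for the Redis publish/subscribe channels and the three modules; note that real Redis pub/sub does not buffer messages for absent subscribers, so a production version must establish each subscription before the matching publish, which the queue-backed stand-in papers over.

```python
import queue
import threading

class MiniPubSub:
    """In-memory stand-in for the publish/subscribe channels:
    subscribe() blocks the calling thread until a message arrives."""
    def __init__(self):
        self._channels = {}
        self._lock = threading.Lock()
    def _queue(self, channel):
        with self._lock:
            return self._channels.setdefault(channel, queue.Queue())
    def subscribe(self, channel):
        return self._queue(channel).get()   # blocks (steps S2 and S4)
    def publish(self, channel, message):
        self._queue(channel).put(message)

bus, cache, log = MiniPubSub(), {}, []

def compute_engine():
    # S5-S6: process the data, store the final packet, then publish
    # the second message carrying its end mark.
    cache["final"] = "processed-result"
    bus.publish("second", {"flag": True})

def scheduler():
    # S41: start a thread that invokes the calculation engine...
    threading.Thread(target=compute_engine).start()
    # S42/S4: ...then block on the second message.
    msg = bus.subscribe("second")
    if msg["flag"]:                              # S7: end mark recognized
        log.append("scheduler unblocked")
        bus.publish("first", {"flag": True})     # S8

t = threading.Thread(target=scheduler)
t.start()
reply = bus.subscribe("first")                   # S2: publisher blocks here
result = cache["final"] if reply["flag"] else None   # S9-S10
t.join()
```

The two nested subscribe/publish pairs reproduce the patent's double blocking: the publisher waits on the first message while the scheduler waits on the second.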
Specifically, the distributed heterogeneous processing framework system comprises a publishing module, a service scheduling module, a calculation engine module, a cache module and a data source module, wherein the publishing module is connected with the cache module and the service scheduling module respectively, the service scheduling module is connected with the cache module and the calculation engine module respectively, the cache module is connected with the calculation engine module, and the calculation engine module is connected with the data source module;
the publishing module is used for sending request data to the service scheduling module and for thread-blocking control by subscribing to a first message;
the service scheduling module is used for receiving the request data, extracting the configuration data in the request data, packaging it together with a second message into an intermediate file packet, storing the packet in the cache module, blocking its thread by subscribing to the second message, invoking the calculation engine module by calling a processing function, and publishing the first message to the publishing module;
the calculation engine module is used for reading the intermediate file packet, loading the corresponding data from the data source module according to the configuration data of the intermediate file packet, processing the loaded data, storing the processed data in the cache module, and publishing the second message to the service scheduling module;
the cache module is used for storing the intermediate file packet and the final file packet;
the data source module is used for providing the associated data and files required for data processing.
Further, the publishing module is an application client.
Further, the service scheduling module is a Netty server.
Further, the calculation engine module is a Spark cluster.
Further, the cache module is a Redis database.
Further, the data source module comprises relational databases, non-relational databases and file systems.
The invention has the following beneficial effects: based on an architectural design concept of distributed heterogeneous data processing, the system framework is built from a distributed calculation engine module, a service scheduling module, a publishing module and a cache module, realizing distributed computation and storage of data and effectively improving real-time on-line data processing efficiency and message storage capacity.
Drawings
FIG. 1 is a method flow diagram of a distributed heterogeneous processing framework based construction method of the present invention;
fig. 2 is a schematic structural diagram of the distributed heterogeneous processing framework-based system of the present invention.
Detailed Description
To make the technical features, objects and effects of the present invention more clearly understood, embodiments of the invention are described below with reference to the accompanying drawings.
As shown in fig. 1, the construction method based on the distributed heterogeneous processing framework relies on the distributed heterogeneous processing framework system, but can also be implemented independently. It comprises the following steps:
S1, starting the Netty server in a Linux environment, listening on a designated port and waiting for Socket connections;
the Web application sends data to the Netty server;
the request data, which comprises the first message, is sent to the service scheduling module through the publishing module, specifically as follows:
S11, the data to be sent, including the first message, is formatted into a json character string through GSON, the json class library of Java, and packaged into the request data;
S12, the Netty client is started and sends the request data to the Netty server in an asynchronous mode.
S2, the Netty client uses Jedis, through the publish/subscribe mode provided by Redis, to subscribe to the first message and thereby block the publishing module's current thread; the first message is the one packaged into the json-format request data in step S11.
S3, the request data is received through the Netty server and packaged into an intermediate file packet, and the intermediate file packet is written into Redis; the intermediate file packet comprises the first message, the second message and the configuration data. The specific process is as follows:
S31, the Netty server converts the received json-format request data into Map format using GSON and encapsulates the second message in the Map-format data;
S32, the parameters required for Spark cluster job processing, which include the configuration data in the request data, are extracted from the Map-format data of step S31, and the required parameters and the second message are packaged into a json-format intermediate file packet;
S33, the intermediate file packet is written into Redis.
S4, the Netty server subscribes to the second message using the publish/subscribe mode provided by Redis, thereby blocking its current thread. The specific process is as follows:
S41, the Netty server starts a thread and invokes a formatted Linux command of the form spark-submit <spark jar name>, where the Spark jar name is read from the Map data of step S31, so as to invoke the Spark cluster;
S42, the Netty server blocks its current thread by subscribing to the second message with Jedis, according to the publish/subscribe mode provided by Redis.
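Step S41's formatted command can be sketched as below. The patent only fixes the spark-submit <spark jar name> shape; the `--class` and `--master` options are standard spark-submit flags added here as an assumption, and `shlex.quote` guards against shell-unsafe names.

```python
import shlex

def build_spark_submit(jar_name, main_class, master):
    # Format the Linux command the scheduler shells out to in step S41.
    # Only the "spark-submit <jar name>" shape comes from the patent;
    # the flags and their values are illustrative.
    return " ".join([
        "spark-submit",
        "--class", shlex.quote(main_class),
        "--master", shlex.quote(master),
        shlex.quote(jar_name),
    ])

cmd = build_spark_submit("spark-job.jar", "com.example.Job", "yarn")
```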
S5, the Spark cluster receives the command invocation of step S41; according to the spark jar file packet, it reads the intermediate file packet from Redis using Jedis, loads the corresponding data from the data source module with Spark SQL according to the configuration data in the intermediate file packet, filters and processes the loaded data with Spark RDD operations, and stores the processed final file data packet, in Dataset<Row> format, into Redis through the Pipeline of Jedis; the data source module comprises relational databases, non-relational databases and file systems.
S6, the Spark cluster obtains, through the spark jar file, the second message subscribed to by the Netty server, and publishes the second message in json form, together with the second end mark, to the Netty server using Jedis in the publish/subscribe mode provided by Redis;
S7, the Netty server receives the second message and determines whether it has ended by reading flag = true/false; when the Netty server receives the flag = true identifier, it cancels its subscription to the second message, and the blocking of the Netty server's current thread ends;
S8, the Netty server obtains the first message encapsulated in the intermediate file packet and publishes the first message and the first end mark to the Netty client, using Jedis in the publish/subscribe mode provided by Redis;
S9, the Netty client receives the first message and determines whether it has ended by reading flag = true/false; when the Netty client receives the flag = true identifier, it cancels its subscription to the first message, and the blocking of the Netty client's current thread ends;
and S10, depending on the data volume, the Netty client uses a concurrent queue, a blocking queue or a Pipeline to read the final file packet processed by the Spark cluster from Redis, obtaining the data processing result.
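The flag = true/false end-mark check of steps S7 and S9 amounts to a small predicate. A sketch follows; the JSON field name `flag` comes from the text above, while the surrounding message shape is an assumption:

```python
import json

def is_end_mark(message_json):
    # S7/S9: read flag = true/false from the received json message; only
    # flag = true cancels the subscription and releases the blocked thread.
    return json.loads(message_json).get("flag") is True

done = is_end_mark('{"flag": true, "body": "second message"}')
still_blocked = not is_end_mark('{"flag": false}')
```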
As shown in fig. 2, the distributed heterogeneous processing framework system comprises a publishing module, a service scheduling module, a calculation engine module, a cache module and a data source module, wherein the publishing module is connected with the cache module and the service scheduling module respectively, the service scheduling module is connected with the cache module and the calculation engine module respectively, the cache module is connected with the calculation engine module, and the calculation engine module is connected with the data source module;
the publishing module is used for sending request data to the service scheduling module and for thread-blocking control by subscribing to the first message;
the service scheduling module is used for receiving the request data, extracting the configuration data in the request data, packaging it together with the second message into an intermediate file packet, storing the packet in the cache module, blocking its thread by subscribing to the second message, invoking the calculation engine module by calling a processing function, and publishing the first message to the publishing module;
the calculation engine module is used for reading the intermediate file packet, loading the corresponding data from the data source module according to the configuration data of the intermediate file packet, processing the loaded data, storing the processed data in the cache module, and publishing the second message to the service scheduling module;
the cache module is used for storing the intermediate file packet and the final file packet;
the data source module is used for providing the associated data and files required for data processing.
Further, the publishing module is an application client.
Further, the service scheduling module is a Netty server.
Further, the calculation engine module is a Spark cluster.
Further, the cache module is a Redis database.
Further, the data source module comprises relational databases, non-relational databases and file systems.
It should be noted that, for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and elements referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a ROM, a RAM, etc.
The above disclosure presents only preferred embodiments of the present invention and is not intended to limit its scope; the scope of the invention is defined by the appended claims.

Claims (7)

1. A construction method based on a distributed heterogeneous processing framework, characterized by comprising the following steps:
S1, sending request data to a service scheduling module through a publishing module, wherein the request data comprises a first message;
S2, the publishing module blocks its current thread by subscribing to the first message;
S3, receiving the request data through the service scheduling module, packaging the request data into an intermediate file packet, and writing the intermediate file packet into a cache module, wherein the intermediate file packet comprises a second message and configuration data;
S4, the service scheduling module blocks its current thread by subscribing to the second message;
S5, reading the intermediate file packet through a calculation engine module, loading and processing the corresponding data according to the configuration data in the intermediate file packet through the calculation engine module, and storing the processed final file packet in the cache module;
S6, the calculation engine module publishes the second message and a second end mark to the service scheduling module;
S7, the service scheduling module receives the second message, recognizes the second end mark, and cancels its subscription to the second message;
S8, the service scheduling module publishes the first message and a first end mark to the publishing module;
S9, the publishing module receives the first message, recognizes the first end mark, and cancels its subscription to the first message;
S10, the publishing module reads the final file packet from the cache module to obtain the data processing result;
step S1 is specifically implemented as follows:
S11, converting the data to be sent, including the first message, into a json character string and packaging it into the request data;
S12, the publishing module sends the request data to the service scheduling module in an asynchronous mode;
step S3 is specifically implemented as follows:
S31, the service scheduling module converts the received request data into Map format;
S32, the configuration data in the Map-format request data is extracted and packaged, together with the second message, into a json-format intermediate file packet;
S33, the intermediate file packet is written into the cache module;
step S4 is specifically implemented as follows:
S41, the service scheduling module starts a thread and invokes the calculation engine module by calling a processing function;
and S42, the service scheduling module blocks the current thread by subscribing to the second message.
2. A distributed heterogeneous processing framework system, characterized by comprising a publishing module, a service scheduling module, a calculation engine module, a cache module and a data source module, wherein the publishing module is connected with the cache module and the service scheduling module respectively;
the publishing module is used for sending request data to the service scheduling module and for thread-blocking control by subscribing to a first message;
the service scheduling module is used for receiving the request data, extracting the configuration data in the request data, packaging it together with a second message into an intermediate file packet, storing the packet in the cache module, blocking its thread by subscribing to the second message, invoking the calculation engine module by calling a processing function, and publishing the first message to the publishing module;
the calculation engine module is used for reading the intermediate file packet, loading the corresponding data from the data source module according to the configuration data of the intermediate file packet, processing the loaded data, storing the processed data in the cache module, and publishing the second message to the service scheduling module;
the cache module is used for storing the intermediate file packet and the final file packet;
the data source module is used for providing the associated data and files required for data processing.
3. The distributed heterogeneous processing framework system of claim 2, wherein the publishing module is an application client.
4. The distributed heterogeneous processing framework system of claim 2, wherein the service scheduling module is a Netty server.
5. The distributed heterogeneous processing framework system of claim 2, wherein the calculation engine module is a Spark cluster.
6. The distributed heterogeneous processing framework system of claim 2, wherein the cache module is a Redis database.
7. The distributed heterogeneous processing framework system of claim 2, wherein the data source module comprises relational databases, non-relational databases, and file systems.
CN201810590137.5A 2018-06-08 2018-06-08 Distributed heterogeneous processing framework construction method and system Active CN110647575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810590137.5A CN110647575B (en) 2018-06-08 2018-06-08 Distributed heterogeneous processing framework construction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810590137.5A CN110647575B (en) 2018-06-08 2018-06-08 Distributed heterogeneous processing framework construction method and system

Publications (2)

Publication Number Publication Date
CN110647575A CN110647575A (en) 2020-01-03
CN110647575B true CN110647575B (en) 2022-03-11

Family

ID=69008589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810590137.5A Active CN110647575B (en) 2018-06-08 2018-06-08 Distributed heterogeneous processing framework construction method and system

Country Status (1)

Country Link
CN (1) CN110647575B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862526B (en) * 2021-02-04 2024-01-12 深圳迅策科技有限公司 Real-time valuation method, device and readable medium for big data financial assets
CN113064920A (en) * 2021-02-26 2021-07-02 苏宁金融科技(南京)有限公司 Real-time computing method and device based on Flink, computer equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN104811459A (en) * 2014-01-23 2015-07-29 阿里巴巴集团控股有限公司 Processing method, processing device and system for message services and message service system

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US10701148B2 (en) * 2012-12-13 2020-06-30 Level 3 Communications, Llc Content delivery framework having storage services
US9705754B2 (en) * 2012-12-13 2017-07-11 Level 3 Communications, Llc Devices and methods supporting content delivery with rendezvous services
CN105472042B (en) * 2016-01-15 2018-09-21 中煤电气有限公司 The message-oriented middleware system and its data transferring method of WEB terminal control
CN107229639B (en) * 2016-03-24 2020-07-28 上海宝信软件股份有限公司 Storage system of distributed real-time database
EP3361700B1 (en) * 2016-05-11 2021-08-04 Oracle International Corporation Multi-tenant identity and data security management cloud service
CN106777029A (en) * 2016-12-08 2017-05-31 中国科学技术大学 A kind of distributed rule automotive engine system and its construction method
CN106648816B (en) * 2016-12-09 2020-03-17 武汉斗鱼网络科技有限公司 Multithreading system and method

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN104811459A (en) * 2014-01-23 2015-07-29 阿里巴巴集团控股有限公司 Processing method, processing device and system for message services and message service system

Non-Patent Citations (3)

Title
Kafka: a distributed messaging system for log processing; Jay Kreps et al.; Proceedings of the NetDB; 2011-11-30; vol. 11; pp. 1-7 *
Design and Implementation of Real-Time Middleware Based on the Publish-Subscribe Mechanism; Zheng Pengyi et al.; Computer Applications and Software; 2018-02-15; vol. 35, no. 2; pp. 44-47, 53 *
Research on Key Technologies of Message Data Subscription for Multi-Application and Multi-Tenant Scenarios; Fu Ge et al.; Netinfo Security; 2017-11-10; no. 11; pp. 44-49 *

Also Published As

Publication number Publication date
CN110647575A (en) 2020-01-03

Similar Documents

Publication Publication Date Title
CN112507029B (en) Data processing system and data real-time processing method
CN108897854B (en) Monitoring method and device for overtime task
CN110647575B (en) Distributed heterogeneous processing framework construction method and system
CN104391957A (en) Data interaction analysis method for hybrid big data processing system
CN102467412A (en) Method, device and business system for processing operation request
CN112035563A (en) Real-time database system based on shared storage
CN112559476A (en) Log storage method for improving performance of target system and related equipment thereof
CN116189330A (en) Processing method, storage medium and processor for working condition data of engineering vehicle
US8510426B2 (en) Communication and coordination between web services in a cloud-based computing environment
WO2023082681A1 (en) Data processing method and apparatus based on batch-stream integration, computer device, and medium
CN109683875B (en) Application framework system of MVC (model view controller) pattern in distributed environment and method thereof
CN110245043B (en) Tracking system for call relation between distributed systems
CN111709696A (en) Method and device for generating mail list based on SOA (service oriented architecture)
CN116069462A (en) Big data DAG task flow scheduling method, system and storage medium
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
CN114090409A (en) Message processing method and device
CN111401819B (en) Intersystem data pushing method and system
CN103514275A (en) User space event filtering-based method for increasing network program processing speed
CN113760986A (en) Data query method, device, equipment and storage medium
CN112685047A (en) Rapid analysis system based on large file
CN111159605A (en) Method for solving problem of slow page loading after intercepting and forwarding request of Android Webview
CN117453790A (en) Data exchange method and device based on cloud object storage, equipment and storage medium
CN117081961A (en) Method, system, device and server for monitoring quantity of accumulated messages
CN117435367A (en) User behavior processing method, device, equipment, storage medium and program product
CN114201291A (en) Robot and cloud communication method and hardware architecture system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant