CN110647575B - Distributed heterogeneous processing framework construction method and system - Google Patents


Info

Publication number
CN110647575B
Authority
CN
China
Prior art keywords
module
message
data
service scheduling
intermediate file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810590137.5A
Other languages
Chinese (zh)
Other versions
CN110647575A (en)
Inventor
曹亮 (Cao Liang)
陈冲 (Chen Chong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN201810590137.5A
Publication of CN110647575A
Application granted
Publication of CN110647575B
Legal status: Active

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Abstract

The invention discloses a distributed heterogeneous processing framework construction method comprising the following steps: the publishing module sends request data and blocks its current thread by subscribing to a first message; the service scheduling module receives the request data, packages it into an intermediate file packet, writes the packet to the cache module, and blocks its current thread by subscribing to a second message; the calculation engine module reads the intermediate file packet, loads and processes the corresponding data according to the configuration data, and stores the processed final file packet in the cache module; the calculation engine module publishes the second message to the service scheduling module; the service scheduling module receives the second message, recognizes the second end mark, and cancels its subscription to the second message; the service scheduling module publishes the first message to the publishing module; the publishing module receives the first message, recognizes the first end mark, and cancels its subscription to the first message; the publishing module then reads the final file packet from the cache module to obtain the data processing result. The invention improves real-time on-line data processing efficiency and message storage capacity.

Description

Distributed heterogeneous processing framework construction method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a distributed heterogeneous processing framework construction method and a distributed heterogeneous processing framework construction system.
Background
With the continuous development and growth of the internet industry, the volume of data a website must process and store each day grows geometrically, and the scale of websites keeps expanding. The traditional vertical application architecture has become increasingly bloated: it is difficult to maintain, has high latency, and has low throughput, so it can hardly meet the demands placed on the underlying architecture in the era of big data. Adopting a distributed architecture for data storage and computation is therefore an inevitable trend of the internet era, and an architecture that offers efficient real-time on-line data processing together with message storage capability is needed to fill the current gap in the market.
Disclosure of Invention
In order to solve the above problems, the invention provides a construction method and a construction system based on a distributed heterogeneous processing framework.
Specifically, the construction method based on the distributed heterogeneous processing framework comprises the following steps:
S1, sending request data to a service scheduling module through a publishing module, wherein the request data comprises a first message;
S2, the publishing module blocks its current thread by subscribing to the first message;
S3, receiving the request data through the service scheduling module, packaging the request data into an intermediate file packet, and writing the intermediate file packet into a cache module, wherein the intermediate file packet comprises a second message and configuration data;
S4, the service scheduling module blocks its current thread by subscribing to the second message;
S5, reading the intermediate file packet through a calculation engine module, loading and processing the corresponding data according to the configuration data in the intermediate file packet through the calculation engine module, and storing the processed final file packet in the cache module;
S6, the calculation engine module publishes the second message and a second end mark to the service scheduling module;
S7, the service scheduling module receives the second message, recognizes the second end mark, and cancels its subscription to the second message;
S8, the service scheduling module publishes the first message and a first end mark to the publishing module;
S9, the publishing module receives the first message, recognizes the first end mark, and cancels its subscription to the first message;
and S10, the publishing module reads the final file packet from the cache module to obtain the data processing result.
Further, step S1 is specifically implemented as follows:
S11, converting the data to be sent, including the first message, into a json character string and packaging it into the request data;
and S12, the publishing module sends the request data to the service scheduling module in an asynchronous mode.
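Steps S11 and S12 can be sketched as follows. This is an illustrative Python simulation rather than the patent's Java/GSON implementation: the field names (`firstMessage`, `data`), the channel name, and the stand-in `send_to_scheduler` function are assumptions, and a thread pool stands in for the Netty client's asynchronous channel.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def build_request(first_message_channel, payload):
    # S11: convert the data to be sent, including the first-message
    # channel name, into a json character string (the request data).
    # The field names are illustrative; the patent fixes no schema.
    return json.dumps({"firstMessage": first_message_channel, "data": payload})

def send_to_scheduler(request_json):
    # Stand-in for the Netty round-trip; a real client would write the
    # bytes to a socket channel and receive a future for the reply.
    return "ack:" + json.loads(request_json)["firstMessage"]

# S12: submit the send without blocking the caller (asynchronous mode).
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(send_to_scheduler,
                         build_request("reply:client-42", {"sparkJar": "job.jar"}))
    ack = future.result()  # rendezvous only when the result is needed
```

The asynchronous submit keeps the publishing thread free; in the patent the actual blocking happens later, via the first-message subscription of step S2, not on the send itself.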
Further, step S3 is specifically implemented as follows:
S31, the service scheduling module converts the received request data into Map format;
S32, the configuration data in the Map-format request data is extracted and packaged, together with the second message, into a json-format intermediate file packet;
and S33, the intermediate file packet is written into the cache module.
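A minimal sketch of steps S31 to S33 in Python (the patent's implementation uses Java with GSON; the key names, the channel name, and the dict standing in for the Redis cache module are assumptions):

```python
import json

cache = {}  # stand-in for the cache module (Redis in the patent)

def schedule(request_json, second_message_channel, cache_key):
    # S31: convert the received request data into a map.
    request_map = json.loads(request_json)
    # S32: extract the configuration data and package it, together with
    # the second-message channel, into a json intermediate file packet.
    packet = json.dumps({
        "secondMessage": second_message_channel,
        "config": request_map["data"],
    })
    # S33: write the intermediate file packet into the cache module.
    cache[cache_key] = packet
    return packet

schedule('{"firstMessage": "reply:client-42", "data": {"sparkJar": "job.jar"}}',
         "done:job-7", "intermediate:job-7")
```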
Further, step S4 is specifically implemented as follows:
S41, the service scheduling module starts a thread and invokes the calculation engine module by calling a processing function;
and S42, the service scheduling module blocks the current thread by subscribing to the second message.
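The whole S1 to S10 handshake can be simulated end to end. The sketch below is a deliberately simplified in-memory stand-in for the Redis publish/subscribe channels and the three modules; note that real Redis pub/sub does not buffer messages for absent subscribers, so a production version must establish each subscription before the matching publish, which the queue-backed stand-in papers over.

```python
import queue
import threading

class MiniPubSub:
    """In-memory stand-in for the publish/subscribe channels:
    subscribe() blocks the calling thread until a message arrives."""
    def __init__(self):
        self._channels = {}
        self._lock = threading.Lock()
    def _queue(self, channel):
        with self._lock:
            return self._channels.setdefault(channel, queue.Queue())
    def subscribe(self, channel):
        return self._queue(channel).get()   # blocks (steps S2 and S4)
    def publish(self, channel, message):
        self._queue(channel).put(message)

bus, cache, log = MiniPubSub(), {}, []

def compute_engine():
    # S5-S6: process the data, store the final packet, then publish
    # the second message carrying its end mark.
    cache["final"] = "processed-result"
    bus.publish("second", {"flag": True})

def scheduler():
    # S41: start a thread that invokes the calculation engine...
    threading.Thread(target=compute_engine).start()
    # S42/S4: ...then block on the second message.
    msg = bus.subscribe("second")
    if msg["flag"]:                              # S7: end mark recognized
        log.append("scheduler unblocked")
        bus.publish("first", {"flag": True})     # S8

t = threading.Thread(target=scheduler)
t.start()
reply = bus.subscribe("first")                   # S2: publisher blocks here
result = cache["final"] if reply["flag"] else None   # S9-S10
t.join()
```

The two nested subscribe/publish pairs reproduce the patent's double blocking: the publisher waits on the first message while the scheduler waits on the second.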
Specifically, the distributed heterogeneous processing framework system comprises a publishing module, a service scheduling module, a calculation engine module, a cache module and a data source module, wherein the publishing module is connected with the cache module and the service scheduling module respectively, the service scheduling module is connected with the cache module and the calculation engine module respectively, the cache module is connected with the calculation engine module, and the calculation engine module is connected with the data source module;
the publishing module is used for sending request data to the service scheduling module and for thread-blocking control by subscribing to a first message;
the service scheduling module is used for receiving the request data, extracting the configuration data in the request data, packaging it together with a second message into an intermediate file packet, storing the packet in the cache module, blocking its thread by subscribing to the second message, invoking the calculation engine module by calling a processing function, and publishing the first message to the publishing module;
the calculation engine module is used for reading the intermediate file packet, loading the corresponding data from the data source module according to the configuration data of the intermediate file packet, processing the loaded data, storing the processed data in the cache module, and publishing the second message to the service scheduling module;
the cache module is used for storing the intermediate file packet and the final file packet;
the data source module is used for providing the associated data and files required for data processing.
Further, the publishing module is an application client.
Further, the service scheduling module is a Netty server.
Further, the calculation engine module is a Spark cluster.
Further, the cache module is a Redis database.
Further, the data source module comprises relational databases, non-relational databases and file systems.
The invention has the following beneficial effects: based on an architectural design concept of distributed heterogeneous data processing, the system framework is built from a distributed calculation engine module, a service scheduling module, a publishing module and a cache module, realizing distributed computation and storage of data and effectively improving real-time on-line data processing efficiency and message storage capacity.
Drawings
FIG. 1 is a method flow diagram of a distributed heterogeneous processing framework based construction method of the present invention;
fig. 2 is a schematic structural diagram of the distributed heterogeneous processing framework-based system of the present invention.
Detailed Description
To make the technical features, objects and effects of the present invention more clearly understood, embodiments of the invention are described below with reference to the accompanying drawings.
As shown in fig. 1, the construction method based on the distributed heterogeneous processing framework relies on the distributed heterogeneous processing framework system, but can also be implemented independently. It comprises the following steps:
S1, starting the Netty server in a Linux environment, listening on a designated port and waiting for Socket connections;
the Web application sends data to the Netty server;
the request data, which comprises the first message, is sent to the service scheduling module through the publishing module, specifically as follows:
S11, the data to be sent, including the first message, is formatted into a json character string through GSON, the json class library of Java, and packaged into the request data;
S12, the Netty client is started and sends the request data to the Netty server in an asynchronous mode.
S2, the Netty client uses Jedis, through the publish/subscribe mode provided by Redis, to subscribe to the first message and thereby block the publishing module's current thread; the first message is the one packaged into the json-format request data in step S11.
S3, the request data is received through the Netty server and packaged into an intermediate file packet, and the intermediate file packet is written into Redis; the intermediate file packet comprises the first message, the second message and the configuration data. The specific process is as follows:
S31, the Netty server converts the received json-format request data into Map format using GSON and encapsulates the second message in the Map-format data;
S32, the parameters required for Spark cluster job processing, which include the configuration data in the request data, are extracted from the Map-format data of step S31, and the required parameters and the second message are packaged into a json-format intermediate file packet;
S33, the intermediate file packet is written into Redis.
S4, the Netty server subscribes to the second message using the publish/subscribe mode provided by Redis, thereby blocking its current thread. The specific process is as follows:
S41, the Netty server starts a thread and invokes a formatted Linux command of the form spark-submit <spark jar name>, where the Spark jar name is read from the Map data of step S31, so as to invoke the Spark cluster;
S42, the Netty server blocks its current thread by subscribing to the second message with Jedis, according to the publish/subscribe mode provided by Redis.
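Step S41's formatted command can be sketched as below. The patent only fixes the spark-submit <spark jar name> shape; the `--class` and `--master` options are standard spark-submit flags added here as an assumption, and `shlex.quote` guards against shell-unsafe names.

```python
import shlex

def build_spark_submit(jar_name, main_class, master):
    # Format the Linux command the scheduler shells out to in step S41.
    # Only the "spark-submit <jar name>" shape comes from the patent;
    # the flags and their values are illustrative.
    return " ".join([
        "spark-submit",
        "--class", shlex.quote(main_class),
        "--master", shlex.quote(master),
        shlex.quote(jar_name),
    ])

cmd = build_spark_submit("spark-job.jar", "com.example.Job", "yarn")
```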
S5, the Spark cluster receives the command invocation of step S41; according to the spark jar file packet, it reads the intermediate file packet from Redis using Jedis, loads the corresponding data from the data source module with Spark SQL according to the configuration data in the intermediate file packet, filters and processes the loaded data with Spark RDD operations, and stores the processed final file data packet, in Dataset<Row> format, into Redis through the Pipeline of Jedis; the data source module comprises relational databases, non-relational databases and file systems.
S6, the Spark cluster obtains, through the spark jar file, the second message subscribed to by the Netty server, and publishes the second message in json form, together with the second end mark, to the Netty server using Jedis in the publish/subscribe mode provided by Redis;
S7, the Netty server receives the second message and determines whether it has ended by reading flag = true/false; when the Netty server receives the flag = true identifier, it cancels its subscription to the second message, and the blocking of the Netty server's current thread ends;
S8, the Netty server obtains the first message encapsulated in the intermediate file packet and publishes the first message and the first end mark to the Netty client, using Jedis in the publish/subscribe mode provided by Redis;
S9, the Netty client receives the first message and determines whether it has ended by reading flag = true/false; when the Netty client receives the flag = true identifier, it cancels its subscription to the first message, and the blocking of the Netty client's current thread ends;
and S10, depending on the data volume, the Netty client uses a concurrent queue, a blocking queue or a Pipeline to read the final file packet processed by the Spark cluster from Redis, obtaining the data processing result.
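The flag = true/false end-mark check of steps S7 and S9 amounts to a small predicate. A sketch follows; the JSON field name `flag` comes from the text above, while the surrounding message shape is an assumption:

```python
import json

def is_end_mark(message_json):
    # S7/S9: read flag = true/false from the received json message; only
    # flag = true cancels the subscription and releases the blocked thread.
    return json.loads(message_json).get("flag") is True

done = is_end_mark('{"flag": true, "body": "second message"}')
still_blocked = not is_end_mark('{"flag": false}')
```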
As shown in fig. 2, the distributed heterogeneous processing framework system comprises a publishing module, a service scheduling module, a calculation engine module, a cache module and a data source module, wherein the publishing module is connected with the cache module and the service scheduling module respectively, the service scheduling module is connected with the cache module and the calculation engine module respectively, the cache module is connected with the calculation engine module, and the calculation engine module is connected with the data source module;
the publishing module is used for sending request data to the service scheduling module and for thread-blocking control by subscribing to the first message;
the service scheduling module is used for receiving the request data, extracting the configuration data in the request data, packaging it together with the second message into an intermediate file packet, storing the packet in the cache module, blocking its thread by subscribing to the second message, invoking the calculation engine module by calling a processing function, and publishing the first message to the publishing module;
the calculation engine module is used for reading the intermediate file packet, loading the corresponding data from the data source module according to the configuration data of the intermediate file packet, processing the loaded data, storing the processed data in the cache module, and publishing the second message to the service scheduling module;
the cache module is used for storing the intermediate file packet and the final file packet;
the data source module is used for providing the associated data and files required for data processing.
Further, the publishing module is an application client.
Further, the service scheduling module is a Netty server.
Further, the calculation engine module is a Spark cluster.
Further, the cache module is a Redis database.
Further, the data source module comprises relational databases, non-relational databases and file systems.
It should be noted that, for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and elements referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a ROM, a RAM, etc.
The above disclosure presents only preferred embodiments of the present invention and is not intended to limit its scope; the scope of the invention is defined by the appended claims.

Claims (7)

1. A construction method based on a distributed heterogeneous processing framework, characterized by comprising the following steps:
S1, sending request data to a service scheduling module through a publishing module, wherein the request data comprises a first message;
S2, the publishing module blocks its current thread by subscribing to the first message;
S3, receiving the request data through the service scheduling module, packaging the request data into an intermediate file packet, and writing the intermediate file packet into a cache module, wherein the intermediate file packet comprises a second message and configuration data;
S4, the service scheduling module blocks its current thread by subscribing to the second message;
S5, reading the intermediate file packet through a calculation engine module, loading and processing the corresponding data according to the configuration data in the intermediate file packet through the calculation engine module, and storing the processed final file packet in the cache module;
S6, the calculation engine module publishes the second message and a second end mark to the service scheduling module;
S7, the service scheduling module receives the second message, recognizes the second end mark, and cancels its subscription to the second message;
S8, the service scheduling module publishes the first message and a first end mark to the publishing module;
S9, the publishing module receives the first message, recognizes the first end mark, and cancels its subscription to the first message;
S10, the publishing module reads the final file packet from the cache module to obtain the data processing result;
step S1 is specifically implemented as follows:
S11, converting the data to be sent, including the first message, into a json character string and packaging it into the request data;
S12, the publishing module sends the request data to the service scheduling module in an asynchronous mode;
step S3 is specifically implemented as follows:
S31, the service scheduling module converts the received request data into Map format;
S32, the configuration data in the Map-format request data is extracted and packaged, together with the second message, into a json-format intermediate file packet;
S33, the intermediate file packet is written into the cache module;
step S4 is specifically implemented as follows:
S41, the service scheduling module starts a thread and invokes the calculation engine module by calling a processing function;
and S42, the service scheduling module blocks the current thread by subscribing to the second message.
2. A distributed heterogeneous processing framework system, characterized by comprising a publishing module, a service scheduling module, a calculation engine module, a cache module and a data source module, wherein the publishing module is connected with the cache module and the service scheduling module respectively;
the publishing module is used for sending request data to the service scheduling module and for thread-blocking control by subscribing to a first message;
the service scheduling module is used for receiving the request data, extracting the configuration data in the request data, packaging it together with a second message into an intermediate file packet, storing the packet in the cache module, blocking its thread by subscribing to the second message, invoking the calculation engine module by calling a processing function, and publishing the first message to the publishing module;
the calculation engine module is used for reading the intermediate file packet, loading the corresponding data from the data source module according to the configuration data of the intermediate file packet, processing the loaded data, storing the processed data in the cache module, and publishing the second message to the service scheduling module;
the cache module is used for storing the intermediate file packet and the final file packet;
the data source module is used for providing the associated data and files required for data processing.
3. The distributed heterogeneous processing framework system of claim 2, wherein the publishing module is an application client.
4. The distributed heterogeneous processing framework system of claim 2, wherein the service scheduling module is a Netty server.
5. The distributed heterogeneous processing framework system of claim 2, wherein the calculation engine module is a Spark cluster.
6. The distributed heterogeneous processing framework system of claim 2, wherein the cache module is a Redis database.
7. The distributed heterogeneous processing framework system of claim 2, wherein the data source module comprises relational databases, non-relational databases, and file systems.
CN201810590137.5A 2018-06-08 2018-06-08 Distributed heterogeneous processing framework construction method and system Active CN110647575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810590137.5A CN110647575B (en) 2018-06-08 2018-06-08 Distributed heterogeneous processing framework construction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810590137.5A CN110647575B (en) 2018-06-08 2018-06-08 Distributed heterogeneous processing framework construction method and system

Publications (2)

Publication Number Publication Date
CN110647575A CN110647575A (en) 2020-01-03
CN110647575B true CN110647575B (en) 2022-03-11

Family

ID=69008589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810590137.5A Active CN110647575B (en) 2018-06-08 2018-06-08 Distributed heterogeneous processing framework construction method and system

Country Status (1)

Country Link
CN (1) CN110647575B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862526B (en) * 2021-02-04 2024-01-12 深圳迅策科技有限公司 Real-time valuation method, device and readable medium for big data financial assets
CN113064920A (en) * 2021-02-26 2021-07-02 苏宁金融科技(南京)有限公司 Real-time computing method and device based on Flink, computer equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN104811459A (en) * 2014-01-23 2015-07-29 阿里巴巴集团控股有限公司 Processing method, processing device and system for message services and message service system

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US10701148B2 (en) * 2012-12-13 2020-06-30 Level 3 Communications, Llc Content delivery framework having storage services
US9705754B2 (en) * 2012-12-13 2017-07-11 Level 3 Communications, Llc Devices and methods supporting content delivery with rendezvous services
CN105472042B (en) * 2016-01-15 2018-09-21 中煤电气有限公司 The message-oriented middleware system and its data transferring method of WEB terminal control
CN107229639B (en) * 2016-03-24 2020-07-28 上海宝信软件股份有限公司 Storage system of distributed real-time database
EP3361700B1 (en) * 2016-05-11 2021-08-04 Oracle International Corporation Multi-tenant identity and data security management cloud service
CN106777029A (en) * 2016-12-08 2017-05-31 中国科学技术大学 A kind of distributed rule automotive engine system and its construction method
CN106648816B (en) * 2016-12-09 2020-03-17 武汉斗鱼网络科技有限公司 Multithreading system and method

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN104811459A (en) * 2014-01-23 2015-07-29 阿里巴巴集团控股有限公司 Processing method, processing device and system for message services and message service system

Non-Patent Citations (3)

Title
Kafka: a distributed messaging system for log processing; Jay Kreps et al.; Proceedings of the NetDB; 2011-11-30; vol. 11; pp. 1-7 *
Design and Implementation of Real-Time Middleware Based on the Publish-Subscribe Mechanism; Zheng Pengyi et al.; Computer Applications and Software; 2018-02-15; vol. 35, no. 2; pp. 44-47, 53 *
Research on Key Technologies of Message Data Subscription for Multi-Application and Multi-Tenant Scenarios; Fu Ge et al.; Netinfo Security; 2017-11-10; no. 11; pp. 44-49 *

Also Published As

Publication number Publication date
CN110647575A (en) 2020-01-03

Similar Documents

Publication Publication Date Title
CN112507029B (en) Data processing system and data real-time processing method
CN108897854B (en) Monitoring method and device for overtime task
CN110647575B (en) Distributed heterogeneous processing framework construction method and system
CN104391957A (en) Data interaction analysis method for hybrid big data processing system
CN102467412A (en) Method, device and business system for processing operation request
CN112035563A (en) Real-time database system based on shared storage
CN112559476A (en) Log storage method for improving performance of target system and related equipment thereof
CN116189330A (en) Processing method, storage medium and processor for working condition data of engineering vehicle
US8510426B2 (en) Communication and coordination between web services in a cloud-based computing environment
WO2023082681A1 (en) Data processing method and apparatus based on batch-stream integration, computer device, and medium
CN109683875B (en) Application framework system of MVC (model view controller) pattern in distributed environment and method thereof
CN110245043B (en) Tracking system for call relation between distributed systems
CN111709696A (en) Method and device for generating mail list based on SOA (service oriented architecture)
CN116069462A (en) Big data DAG task flow scheduling method, system and storage medium
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
CN114090409A (en) Message processing method and device
CN111401819B (en) Intersystem data pushing method and system
CN103514275A (en) User space event filtering-based method for increasing network program processing speed
CN113760986A (en) Data query method, device, equipment and storage medium
CN112685047A (en) Rapid analysis system based on large file
CN111159605A (en) Method for solving problem of slow page loading after intercepting and forwarding request of Android Webview
CN117453790A (en) Data exchange method and device based on cloud object storage, equipment and storage medium
CN117081961A (en) Method, system, device and server for monitoring quantity of accumulated messages
CN117435367A (en) User behavior processing method, device, equipment, storage medium and program product
CN114201291A (en) Robot and cloud communication method and hardware architecture system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant