CN114510471A - Method, server and storage medium for real-time state calculation of big data platform

Info

Publication number: CN114510471A
Application number: CN202210142177.XA
Authority: CN
Inventors: 周模; 戴帅夫; 刘丙双
Original assignee: Beijing Jiuqi Technology Co ltd
Current assignee: Beijing Jiuqi Technology Co ltd
Priority date: 2022-02-16
Filing date: 2022-02-16
Publication date: 2022-05-17
Anticipated expiration: 2042-02-16
Also published as: CN114510471B

Abstract

The invention discloses a method, a server and a storage medium for real-time state calculation of a big data platform. The framework is divided into two layers: the first layer is responsible for general data subscription and processing, and the second layer is responsible for state processing and distribution. Owing to this double-layer design, the final service plug-in is responsible only for the update logic of the state data and for outputting service data based on the state and the input data. The secondary plug-in arrangement greatly simplifies the complex business logic of state-based data processing in big data processing: developers need to implement only a few steps of the process, and the generic operations are handled by the two-layer framework.

Description

Method, server and storage medium for real-time state calculation of big data platform

Technical Field

The invention relates to the technical field of big data analysis and processing, in particular to a method, a server and a storage medium for calculating a real-time state of a big data platform.

Background

Real-time data processing is currently an important part of big data processing. An open-source computing framework makes distributed execution convenient, but developers must still write their own code for routine tasks such as data access, subscription and decoding, and this work is largely repetitive. The common parts therefore need to be extracted to reduce the repetitive workload; for relatively fixed scenarios and architectural designs, the work of data access, subscription and decoding can be fixed.

Once data access, subscription and decoding have been fixed on top of an open-source computing framework, a service-related processing framework layer can be built on it; this layer exposes interfaces, and service code is developed on it in the form of plug-ins. Plug-ins developed on such a framework only need to implement the data processing logic, and they can handle many tasks such as rule-based data cleaning and ETL, because each batch of output data is determined solely by the input data.

In a more complex class of state-based service scenarios, the one-layer framework described above is not sufficient to meet the requirements. State-based data processing means that the output data depends on both the input data and the current state; that is, the same input data does not necessarily produce the same output data in different situations, and the input data updates the state in addition to affecting the output data. Such service scenarios call for a more fully featured development framework.
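As a non-limiting illustration of the distinction drawn above, the following minimal Python sketch contrasts a stateless cleaning step with a state-based step. The names `clean_record` and `CellChangeDetector` and the record fields `user` and `cell` are hypothetical and do not appear in the original disclosure.

```python
from typing import Dict, Optional


def clean_record(record: dict) -> dict:
    """Stateless ETL step: the output depends only on the input record."""
    return {"user": record["user"].strip(), "cell": record["cell"].upper()}


class CellChangeDetector:
    """State-based step: the output depends on the input and the stored state."""

    def __init__(self) -> None:
        # Hypothetical state: the last cell seen for each user.
        self._last_cell: Dict[str, str] = {}

    def process(self, record: dict) -> Optional[dict]:
        user, cell = record["user"], record["cell"]
        previous = self._last_cell.get(user)
        # The input updates the state in addition to affecting the output.
        self._last_cell[user] = cell
        if previous is None or previous == cell:
            return None
        return {"user": user, "from": previous, "to": cell}


detector = CellChangeDetector()
same_input = {"user": "u1", "cell": "A"}
first = detector.process(same_input)            # no earlier state for u1
detector.process({"user": "u1", "cell": "B"})   # state for u1 becomes "B"
second = detector.process(same_input)           # same input, different state
```

Here `first` and `second` are produced from identical input yet differ, because the stored state changed between the two calls.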

Therefore, how to make state-based big data processing services more convenient to develop is a problem that those skilled in the art urgently need to solve.

Disclosure of Invention

In view of this, the present invention provides a method for real-time state calculation of a big data platform based on a layered, i.e. two-layer, framework. The top layer is a secondary plug-in framework, which solves the problem of state processing; it cooperates with the bottom-layer primary plug-in framework to make state-based big data processing services more convenient to develop, and a general-purpose real-time subscription program framework reduces the difficulty of processing state data. The first layer of the framework is responsible for general data subscription and processing, and the second layer is responsible for state processing and distribution. Owing to this double-layer design, the final service plug-in is responsible only for the update logic of the state data and for outputting service data based on the state and the input data. The secondary plug-in arrangement greatly simplifies the complex business logic of state-based data processing in big data processing: developers need to implement only a few steps of the process, and the generic operations are handled by the two-layer framework.

To achieve the above purpose, the invention adopts the following technical solution:

The invention provides a method for real-time state calculation of a big data platform, comprising the following steps:

Step 1: the real-time subscription program framework subscribes to data from a data bus in real time, deserializes and decodes the data according to a uniform requirement to obtain decoded data, provides a data processing interface for a primary plug-in, and passes the decoded data to the primary plug-in;

Step 2: the primary plug-in performs ETL data cleaning on the decoded data and submits the processed result data to a secondary plug-in framework. The primary plug-in is the upper-layer structure: all of its data input and output is handled by the real-time subscription program framework, which is the program that is ultimately executed, and the framework and the plug-in are designed on the principle of reducing development effort. The data submitted to the real-time subscription program framework is the data processing result; the real-time subscription program framework is responsible for the external interface and finally pushes the data to the data bus, which reduces the common development effort for primary plug-ins;

Step 3: the secondary plug-ins in the secondary plug-in framework, which is developed on the basis of the primary plug-in, redistribute the processed result data, partitioning it by user number, while the secondary plug-in framework provides a basic interface for updating state data. A plurality of secondary plug-ins receive the result data simultaneously and carry out the redistribution. Because the primary plug-in can complete only the most basic ETL work and cannot implement more complex state-based business logic, a secondary plug-in framework is used for data processing. The basic interface passes on the ETL information processed by the primary plug-in and provides a data redistribution (shuffle) service to the secondary plug-ins, so that data redistribution, the step that consumes the most computing resources, is executed only once for all the complex business logic that the secondary plug-ins need to run; multiple services do not each require a separate program, and a plurality of secondary plug-ins can run simultaneously within one primary plug-in framework;

Step 4: the secondary plug-in, developed on the basis of the secondary plug-in framework, updates its internal state data, outputs service data, and passes the service data to the primary plug-in; the output service data may be produced from the state and the input data, or may be output directly from the state at regular intervals;

Step 5: the primary plug-in passes the service data to the real-time subscription program framework, and the real-time subscription program framework outputs the service data to the big data platform.
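By way of a non-limiting sketch, the five steps above can be modelled in a few lines of Python. Everything here is an assumption made for illustration: the data bus is an in-memory list, JSON stands in for the unspecified serialization format, and the function names and the `user` field are hypothetical.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def subscribe_and_decode(bus: Iterable[bytes]) -> List[dict]:
    """Step 1: read raw messages and deserialize them (JSON is an assumption)."""
    return [json.loads(raw) for raw in bus]


def primary_etl(records: Iterable[dict]) -> List[dict]:
    """Step 2: ETL cleaning; here, drop records that lack a user number."""
    return [r for r in records if r.get("user")]


def redistribute(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Step 3: group cleaned records by user number (a stand-in for a shuffle)."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        groups[record["user"]].append(record)
    return dict(groups)


def count_events(state: dict, user: str, records: List[dict]) -> List[dict]:
    """Step 4: a toy secondary plug-in that keeps a per-user event count."""
    state[user] = state.get(user, 0) + len(records)
    return [{"user": user, "events": state[user]}]


def run_batch(bus: Iterable[bytes], state: dict, plugin: Callable) -> List[dict]:
    """Steps 1 to 5 for one batch; the returned list stands in for platform output."""
    groups = redistribute(primary_etl(subscribe_and_decode(bus)))
    output: List[dict] = []
    for user in sorted(groups):
        output.extend(plugin(state, user, groups[user]))
    return output  # Step 5: handed back for output to the platform


state: dict = {}
batch = [b'{"user": "u1", "pos": "A"}', b'{"pos": "B"}', b'{"user": "u1", "pos": "C"}']
result = run_batch(batch, state, count_events)
later = run_batch([b'{"user": "u1", "pos": "D"}'], state, count_events)
```

Only `count_events` contains service-specific logic in this sketch; the remaining functions play the role of the generic framework layers.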

Preferably, the data accessed by the real-time subscription program framework in step 1 can be acquired from any data analysis platform.

Preferably, the primary plug-in and the real-time subscription program framework in step 2 have a fixed interface, and the primary plug-in can be dedicated to data analysis.

Preferably, the secondary plug-in framework in step 3 has the identity of a primary plug-in, i.e. it is a primary plug-in that is itself written as a framework.

Preferably, a fixed interface is provided between the secondary plug-in and the secondary plug-in framework in step 4, and the secondary plug-in can be dedicated to data analysis services with states, so as to implement service logic processing based on states.
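As a non-limiting illustration of the fixed interfaces and of a secondary plug-in framework that has the identity of a primary plug-in, the following Python sketch uses abstract base classes. The class and method names (`PrimaryPlugin`, `SecondaryPlugin`, `process`, `on_user_data`) are hypothetical; the original disclosure does not specify the interface signatures.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class PrimaryPlugin(ABC):
    """Hypothetical fixed interface between the subscription framework and a primary plug-in."""

    @abstractmethod
    def process(self, decoded: List[dict]) -> List[dict]:
        """Receive decoded records and return the records to be output."""


class SecondaryPlugin(ABC):
    """Hypothetical fixed interface between the secondary framework and a secondary plug-in."""

    @abstractmethod
    def on_user_data(self, state: dict, records: List[dict]) -> List[dict]:
        """Update one user's state in place and return service data."""


class SecondaryPluginFramework(PrimaryPlugin):
    """A primary plug-in that is itself written as a framework."""

    def __init__(self, plugins: List[SecondaryPlugin]) -> None:
        self._plugins = plugins
        # One state dict per (plug-in index, user number).
        self._states: Dict[Tuple[int, str], dict] = {}

    def process(self, decoded: List[dict]) -> List[dict]:
        by_user: Dict[str, List[dict]] = {}
        for record in decoded:
            by_user.setdefault(record["user"], []).append(record)
        output: List[dict] = []
        for user, records in by_user.items():
            for index, plugin in enumerate(self._plugins):
                state = self._states.setdefault((index, user), {})
                output.extend(plugin.on_user_data(state, records))
        return output


class LastPosition(SecondaryPlugin):
    """Toy stateful service: remembers each user's most recent position."""

    def on_user_data(self, state: dict, records: List[dict]) -> List[dict]:
        state["pos"] = records[-1]["pos"]
        return [{"user": records[-1]["user"], "last_pos": state["pos"]}]


framework = SecondaryPluginFramework([LastPosition()])
output = framework.process([{"user": "u1", "pos": "A"}, {"user": "u1", "pos": "B"}])
```

Because `SecondaryPluginFramework` subclasses `PrimaryPlugin`, the lower layer can treat it like any other primary plug-in without knowing that state is involved.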

A server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the above method.

A computer-readable storage medium storing a computer program comprising instructions for the steps of the above method.

According to the above technical solution, compared with the prior art, the invention provides a method for real-time state calculation of a big data platform. The double-layer framework reduces the development effort spent on the more complicated routine work in complex state-based big data processing services: it not only covers the most basic functions of data access, decoding and data output, but also provides interfaces for state updating, state acquisition and output of state-based service data for the state processing part.

A service plug-in based on the invention only needs to implement the update of the current state for each user and to decide, during the state update, which data to output according to the service requirements. In addition, the function of accessing the state at regular intervals in the primary plug-in is available to all secondary plug-ins in the secondary plug-in framework, so that the content of the current state can be provided as service output data at regular intervals even when there is no input data.
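The timed output described above may be sketched as follows. This is only an illustration under stated assumptions: the class `TimedStateEmitter`, the `on_tick` callback and the explicit `now` timestamps are hypothetical, and a real framework would drive the tick from its own scheduler.

```python
from typing import Dict, List


class TimedStateEmitter:
    """Emits a snapshot of per-user state once a fixed interval has elapsed."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self.state: Dict[str, dict] = {}
        self._last_emit = 0.0

    def update(self, user: str, position: str, now: float) -> None:
        """Record the latest position and time for a user."""
        self.state[user] = {"position": position, "time": now}

    def on_tick(self, now: float) -> List[dict]:
        """Called periodically by the (hypothetical) framework, even with no input."""
        if now - self._last_emit < self.interval_s:
            return []
        self._last_emit = now
        return [{"user": user, **data} for user, data in sorted(self.state.items())]


emitter = TimedStateEmitter(interval_s=60.0)
emitter.update("u1", "A", now=10.0)
early = emitter.on_tick(now=30.0)   # interval not yet elapsed
due = emitter.on_tick(now=70.0)     # interval elapsed, no new input needed
```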

With the double-layer framework provided by the invention, the data topic to be subscribed to and the input data format within the topic can be defined, so that the framework completes subscription and decoding automatically and automatically completes state-based data distribution. The interface through which a service plug-in docks with the framework is the logic for processing the data of a single user, and the user's previous state data can be read or updated at any time during processing.
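One possible way to declare a topic and its input format so that decoding and per-user grouping happen automatically is sketched below. The `Subscription` class, the comma-separated message format and the field names are assumptions introduced for illustration only; the original disclosure does not define a configuration format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Subscription:
    """Hypothetical declaration of a topic and the layout of its messages."""

    topic: str
    fields: Tuple[str, ...]
    separator: str = ","
    key_field: str = "user"

    def decode(self, raw: bytes) -> Dict[str, str]:
        """Decode one delimited message into a field dictionary."""
        values = raw.decode("utf-8").strip().split(self.separator)
        if len(values) != len(self.fields):
            raise ValueError(f"expected {len(self.fields)} fields, got {len(values)}")
        return dict(zip(self.fields, values))


def decode_and_group(sub: Subscription, messages: Iterable[bytes]) -> Dict[str, List[dict]]:
    """Decode every message and group the records by the declared key field."""
    grouped: Dict[str, List[dict]] = {}
    for raw in messages:
        record = sub.decode(raw)
        grouped.setdefault(record[sub.key_field], []).append(record)
    return grouped


signaling = Subscription(topic="signaling", fields=("user", "position", "time"))
grouped = decode_and_group(signaling, [b"u1,A,100", b"u2,B,101", b"u1,C,105"])
```

With such a declaration, the service plug-in would receive only the records of a single user, for example `grouped["u1"]`.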

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.

FIG. 1 is a flow chart of the method for real-time state calculation of a big data platform provided by the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a method for calculating the real-time state of a big data platform, which comprises the following steps:

S1: the real-time subscription program framework subscribes to data from the data bus in real time, deserializes and decodes the data according to a uniform requirement to obtain decoded data, provides a data processing interface for the primary plug-in, and passes the decoded data to the primary plug-in;

S2: the primary plug-in performs data cleaning on the decoded data and submits the processed result data to the secondary plug-in framework;

S3: the secondary plug-in framework is developed on the basis of the primary plug-in; the secondary plug-ins in the secondary plug-in framework redistribute the processed result data, partitioning it by user number, while the secondary plug-in framework provides a basic interface for updating state data;

S4: the secondary plug-in is developed on the basis of the secondary plug-in framework and completes the core logic, namely updating internal state data and outputting service data, and passes the service data to the primary plug-in; the output service data may be produced from the state and the input data, or may be output directly from the state at regular intervals;

S5: the primary plug-in passes the service data to the real-time subscription program framework, and the real-time subscription program framework outputs the service data to the big data platform.
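The redistribution in S3 may, for example, be realized as a partitioning by user number that is performed once and then shared by several secondary plug-ins. The following Python sketch is illustrative only: the use of CRC32 as the partitioning hash, the partition count and the two toy plug-ins are assumptions, not details of the original disclosure.

```python
import zlib
from typing import Callable, Dict, List


def partition_for(user: str, num_partitions: int) -> int:
    """Deterministic partition index for a user number (CRC32 is an assumption)."""
    return zlib.crc32(user.encode("utf-8")) % num_partitions


def shuffle_once(records: List[dict], num_partitions: int) -> List[List[dict]]:
    """Redistribute records so that each user lands in exactly one partition."""
    partitions: List[List[dict]] = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[partition_for(record["user"], num_partitions)].append(record)
    return partitions


def run_plugins(partitions: List[List[dict]],
                plugins: Dict[str, Callable[[List[dict]], int]]) -> Dict[str, List[int]]:
    """Feed the same shuffled partitions to every secondary plug-in."""
    return {name: [plugin(p) for p in partitions] for name, plugin in plugins.items()}


records = [{"user": f"u{i % 3}", "seq": i} for i in range(9)]
partitions = shuffle_once(records, num_partitions=4)
results = run_plugins(partitions, {
    "record_count": len,
    "distinct_users": lambda p: len({r["user"] for r in p}),
})
```

Both toy plug-ins consume the output of a single `shuffle_once` call, which mirrors the idea that the costly redistribution need not be repeated for each service.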

In order to further optimize the above technical solution, the data accessed by the real-time subscription program framework in S1 may be obtained from any data analysis platform.

In order to further optimize the above technical solution, the primary plug-in in S2 and the real-time subscription program framework in S1 have a fixed interface between them, so that the plug-in can focus on data analysis.

In order to further optimize the above technical solution, the secondary plug-in framework in S3 has the identity of a primary plug-in, i.e. it is a primary plug-in that is itself written as a framework.

In order to further optimize the above technical solution, a fixed interface is provided between the secondary plug-in in S4 and the secondary plug-in framework, so that the secondary plug-in can be dedicated to stateful data analysis services.

The invention also provides a server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the above method.

The present invention also provides a computer-readable storage medium storing a computer program comprising instructions for the steps of the above method.

Examples

The process of the method provided by the present invention is described below using a trajectory calculation task based on the signaling data of a certain province:

1. the signaling data of the province is subscribed to and decoded through the primary plug-in framework, and information such as the user's number, position and time is extracted from the data and output; the real-time subscription program framework is responsible for the initial deserialization of the raw data, and the primary plug-in in the primary plug-in framework performs the ETL data cleaning;

2. the secondary plug-in framework redistributes all the data output by the primary plug-in framework according to user number, so that each service computing node processes the same batch of user numbers;

3. when processing a user's current input data (including position and time), the service plug-in developed on the basis of the secondary plug-in framework checks whether state data already exists for the user; if not, it updates its state. The choice of state data depends on the specific business logic; in trajectory calculation, the state data comprises the time and position at which the user last appeared;

if state information already exists for the user, the plug-in checks whether the current position differs from the previous position; if it does not, the state is not updated and no service data is output; if the current position differs from the previous position, trajectory data is output, comprising the previous position, the time and the dwell time, where the dwell time is the time in the current input data minus the time in the previous state;

4. once the result data output by the service has been obtained, the secondary plug-in framework passes it to the regular data path of the primary plug-in framework; in doing so, the secondary plug-in framework interacts with the primary plug-in framework in its capacity as a primary plug-in;

5. the primary plug-in framework outputs the result data to the big data platform, completing the whole service flow. The primary plug-in framework only needs to access and decode the data, send it to the layer above, and output that layer's result data to the big data platform; whether the intermediate processing involves state is transparent to the primary plug-in framework.
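The state logic of the trajectory example above may be sketched in Python as follows. The class name `TrajectoryPlugin`, the field names and the sample events are hypothetical. Note one assumption that the text does not state explicitly: after trajectory data is output, the state is moved to the current position and time so that the next dwell time is measured from the new position.

```python
from typing import Dict, Optional, Tuple


class TrajectoryPlugin:
    """Per-user state: the position and time at which the user last appeared."""

    def __init__(self) -> None:
        self.state: Dict[str, Tuple[str, int]] = {}

    def process(self, user: str, position: str, time: int) -> Optional[dict]:
        previous = self.state.get(user)
        if previous is None:
            # No earlier state: record the state, output nothing.
            self.state[user] = (position, time)
            return None
        prev_position, prev_time = previous
        if position == prev_position:
            # Position unchanged: no state update and no service data.
            return None
        # Assumption: the state advances to the new position after output.
        self.state[user] = (position, time)
        return {
            "user": user,
            "position": prev_position,
            "time": prev_time,
            "dwell": time - prev_time,  # current input time minus time in state
        }


plugin = TrajectoryPlugin()
events = [("u1", "A", 100), ("u1", "A", 160), ("u1", "B", 220), ("u1", "C", 250)]
tracks = [t for t in (plugin.process(*event) for event in events) if t is not None]
```

In this run the user stays at position A from time 100 until the move to B is observed at time 220, so the first trajectory record has a dwell time of 120.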

The invention is characterized in that the layered framework system clearly delineates the routine work in complex state-based big data processing services and completes the different aspects of routine processing through the division of labor and cooperation between the two framework layers. A service plug-in on this framework system only needs to attend to its own state logic, namely how each piece of data affects the state and how the state and the input data affect the output data; it does not need to attend to the decoding and distribution that follow the subscription of the raw data.

The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is brief, and the relevant details can be found in the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for real-time state calculation of a big data platform, characterized by comprising the following steps:

step 1: the real-time subscription program framework subscribes to data from a data bus in real time, performs deserialization decoding on the data to obtain decoded data, provides a data processing interface for a primary plug-in and transmits the decoded data to the primary plug-in;

step 2: the primary plug-in performs data cleaning on the decoded data, and the processed result data is submitted to a secondary plug-in framework;

step 3: the secondary plug-in in the secondary plug-in framework redistributes the processed result data, and meanwhile, the secondary plug-in framework provides a basic interface for updating state data;

step 4: the secondary plug-in completes the updating of the internal state data, outputs the service data and transmits the service data to the primary plug-in;

step 5: the primary plug-in transmits the service data to the real-time subscription program framework, and the real-time subscription program framework outputs the service data to a big data platform.

2. The method for big data platform real-time status calculation according to claim 1, wherein the data accessed by the real-time subscription program framework in step 1 is obtained from any data analysis platform.

3. The method for big data platform real-time status calculation according to claim 1, wherein the primary plug-in and the real-time subscription program framework in step 2 have fixed interfaces, and the primary plug-in is used for data analysis.

4. The method for big data platform real-time status calculation according to claim 1, wherein the secondary plug-in framework in step 3 has the identity of a primary plug-in.

5. The method for big data platform real-time status calculation according to claim 1, wherein in step 4, a fixed interface is provided between the secondary plug-in and the secondary plug-in framework, and the secondary plug-in is used for data analysis service with status.

6. A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 5.

7. A computer-readable storage medium, in which a computer program is stored, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 5.