CN114510471A - Method, server and storage medium for real-time state calculation of big data platform - Google Patents

Info

Publication number
CN114510471A
Authority
CN
China
Prior art keywords
data
plug
framework
real
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210142177.XA
Other languages
Chinese (zh)
Other versions
CN114510471B (en)
Inventor
周模
戴帅夫
刘丙双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiuqi Technology Co ltd
Original Assignee
Beijing Jiuqi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiuqi Technology Co ltd filed Critical Beijing Jiuqi Technology Co ltd
Priority to CN202210142177.XA priority Critical patent/CN114510471B/en
Publication of CN114510471A publication Critical patent/CN114510471A/en
Application granted granted Critical
Publication of CN114510471B publication Critical patent/CN114510471B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44521Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • G06F9/44526Plug-ins; Add-ons
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a method, a server and a storage medium for real-time state calculation on a big data platform. The framework is divided into two layers: the first-layer framework is responsible for general data subscription and processing, and the second-layer framework is responsible for state handling and data distribution. Owing to this double-layer design, the final service plug-in only needs to implement the logic that updates the state data and outputs service data based on the state and the input data. The introduction of the secondary plug-in greatly simplifies the complex business logic of state-based data processing in big data processing: developers only need to implement a few steps of the process, and the generic operations are carried out by the two-layer framework.

Description

Method, server and storage medium for real-time state calculation of big data platform
Technical Field
The invention relates to the technical field of big data analysis and processing, in particular to a method, a server and a storage medium for calculating a real-time state of a big data platform.
Background
Currently, real-time data processing is an important part of big data processing. An open-source computing framework makes distributed execution convenient, but conventional steps such as data access, subscription and decoding still require developers to write their own code, and this work is often repetitive. The generic parts therefore need to be extracted further to reduce the repetitive workload; for relatively fixed scenarios and architectural designs, the data access, subscription and decoding work can be fixed.
Once data access, subscription and decoding have been fixed on top of an open-source computing framework, a service-related processing framework layer can be built on it. This layer exposes interfaces, and service code is developed on it in the form of plug-ins. Plug-ins developed on such a framework only need to implement the data processing logic. They can handle much rule-based work such as data cleansing and ETL, because each batch of output data is determined solely by the input data.
In a more complex class of state-based service scenarios, the single-layer framework described above is not sufficient. State-based data processing means that the output data depends on both the input data and the current state: the same input data does not necessarily produce the same output data in different situations, and the input data, besides affecting the output data, also updates the state. Such service scenarios call for a development framework with more complete functionality.
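The distinction between rule-based and state-based processing can be sketched in a few lines of Python. The names `clean_record` and `LastCellTracker`, and the `user` and `cell` fields, are hypothetical and chosen only for illustration; they do not come from the patent.

```python
from typing import Dict, Optional, Tuple


def clean_record(record: dict) -> dict:
    """Stateless ETL step: the output depends only on the input record."""
    return {"user": record["user"].strip(), "cell": record["cell"].upper()}


class LastCellTracker:
    """Stateful step: the output depends on the input and on per-user state."""

    def __init__(self) -> None:
        self._last_cell: Dict[str, str] = {}  # user -> last seen cell

    def process(self, record: dict) -> Optional[Tuple[str, str, str]]:
        user, cell = record["user"], record["cell"]
        previous = self._last_cell.get(user)
        self._last_cell[user] = cell  # the input always updates the state
        if previous is None or previous == cell:
            return None  # first sighting or no move: nothing to report
        return (user, previous, cell)  # a move from `previous` to `cell`
```

Feeding the same record to `LastCellTracker.process` twice can give different results depending on what arrived in between, whereas `clean_record` always maps equal inputs to equal outputs.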
Therefore, how to make state-based big data processing services more convenient to develop is a problem that those skilled in the art urgently need to solve.
Disclosure of Invention
In view of this, the present invention provides a method for real-time state calculation on a big data platform. The method uses a layered, two-layer framework: the top layer is a secondary plug-in framework that handles state, and it cooperates with the bottom-layer primary plug-in framework to make state-based big data processing services more convenient to develop, while a general-purpose real-time subscription program framework reduces the difficulty of processing state data. The first-layer framework is responsible for general data subscription and processing, and the second-layer framework is responsible for state handling and data distribution. Owing to this double-layer design, the final service plug-in only needs to implement the logic that updates the state data and outputs service data based on the state and the input data. The introduction of the secondary plug-in greatly simplifies the complex business logic of state-based data processing in big data processing: developers only need to implement a few steps of the process, and the generic operations are carried out by the two-layer framework.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a method for calculating the real-time state of a big data platform, which comprises the following steps:
step 1: the real-time subscription program framework subscribes to data from a data bus in real time, deserializes and decodes the data according to a uniform specification to obtain decoded data, provides a data processing interface to a primary plug-in, and transmits the decoded data to the primary plug-in;
step 2: the primary plug-in performs ETL data cleaning on the decoded data and submits the processed result data to a secondary plug-in framework; the primary plug-in is built on top of the real-time subscription program framework, which handles all of its data input and output; the real-time subscription program framework is the final processing program, and the framework and the plug-in are designed on the principle of reducing the amount of development; the data submitted to the real-time subscription program framework is the data processing result; the real-time subscription program framework is responsible for the external interface and finally pushes the data to the data bus, which reduces the amount of common development work for the primary plug-in;
step 3: a secondary plug-in in the secondary plug-in framework, which is developed on the basis of the primary plug-in, redistributes the processed result data, partitioning it by user number, while the secondary plug-in framework provides a basic interface for updating state data; a plurality of secondary plug-ins receive the result data simultaneously and perform the redistribution; the primary plug-in can only complete the most basic ETL work and cannot complete more complex state-based business logic, so a secondary plug-in framework is used for the data processing; the basic interface passes on the ETL information processed by the primary plug-in and provides a data redistribution (shuffle) service to the secondary plug-ins, so that, for all the complex service logic the secondary plug-ins need to execute, the data redistribution stage, which consumes the most computing resources, is executed only once, and multiple services do not each need to run a separate program; a plurality of secondary plug-ins can thus be executed simultaneously within one primary plug-in framework;
step 4: the secondary plug-in, developed on the basis of the secondary plug-in framework, updates the internal state data, outputs service data, and transmits the service data to the primary plug-in; the output service data may be produced from the state and the input data, or may be output directly from the state at regular intervals;
step 5: the primary plug-in transmits the service data to the real-time subscription program framework, and the real-time subscription program framework outputs the service data to the big data platform.
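The five steps can be pictured as the following Python skeleton. It is a single-process sketch under stated assumptions, not the patented implementation: all class and function names are hypothetical, messages are assumed to be JSON with a `user` field, and the way the ETL rule and the secondary framework share the primary plug-in role is an interpretation of the text.

```python
import json
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List


class PrimaryPlugin(ABC):
    """Interface the subscription framework offers to primary plug-ins."""

    @abstractmethod
    def process(self, records: List[dict]) -> List[dict]:
        ...


class SecondaryPlugin(ABC):
    """Interface the secondary framework offers to stateful service plug-ins."""

    @abstractmethod
    def update(self, state: Dict[str, Any], record: dict) -> List[dict]:
        ...


class SecondaryFramework(PrimaryPlugin):
    """Acts as a primary plug-in while hosting one secondary plug-in."""

    def __init__(self, clean: Callable[[dict], dict], plugin: SecondaryPlugin) -> None:
        self.clean = clean
        self.plugin = plugin
        self.states: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def process(self, records: List[dict]) -> List[dict]:
        cleaned = [self.clean(r) for r in records]           # step 2: ETL cleaning
        by_user: Dict[str, List[dict]] = defaultdict(list)   # step 3: redistribute
        for record in cleaned:
            by_user[record["user"]].append(record)
        output: List[dict] = []
        for user, batch in by_user.items():                  # step 4: state update
            for record in batch:
                output.extend(self.plugin.update(self.states[user], record))
        return output


class SubscriptionFramework:
    """Decodes input, calls the primary plug-in, and encodes the output."""

    def __init__(self, plugin: PrimaryPlugin) -> None:
        self.plugin = plugin

    def run_batch(self, raw_messages: Iterable[bytes]) -> List[bytes]:
        decoded = [json.loads(m) for m in raw_messages]      # step 1: decode
        results = self.plugin.process(decoded)
        return [json.dumps(r).encode("utf-8") for r in results]  # step 5: output


class CountPlugin(SecondaryPlugin):
    """Toy service plug-in: emits a running record count per user."""

    def update(self, state: Dict[str, Any], record: dict) -> List[dict]:
        state["n"] = state.get("n", 0) + 1
        return [{"user": record["user"], "count": state["n"]}]


def strip_user(record: dict) -> dict:
    """Toy ETL rule: normalise the user field."""
    return {"user": str(record["user"]).strip()}
```

In this sketch the service author writes only `CountPlugin.update`; decoding, grouping by user and output encoding stay in the two framework classes.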
Preferably, the data accessed by the real-time subscription program framework in step 1 can be acquired from any data analysis platform.
Preferably, the primary plug-in and the real-time subscription program framework in step 2 have a fixed interface, and the primary plug-in can be dedicated to data analysis.
Preferably, the secondary plug-in framework in step 3 has the identity of a primary plug-in, i.e. it is a primary plug-in that is itself written as a framework.
Preferably, a fixed interface is provided between the secondary plug-in and the secondary plug-in framework in step 4, and the secondary plug-in can be dedicated to data analysis services with states, so as to implement service logic processing based on states.
A server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the above method.
A computer-readable storage medium storing a computer program comprising instructions for the steps of the above method.
According to the above technical scheme, compared with the prior art, the invention provides a method for real-time state calculation on a big data platform. In complex state-based big data processing services, the double-layer framework reduces the amount of development required for the more tedious routine work: it not only covers the most basic functions of data access, decoding and data output, but also provides, for the state-handling part, interfaces for state updating, state retrieval and output of state-based service data.
A service plug-in based on the invention only needs to implement, for each user, the update of the current state and to decide, during the state update, which data to output according to the service requirements. In addition, the timed state-access function in the primary plug-in is available to all secondary plug-ins in the secondary plug-in framework, so that the content of the current state can be provided as service output data at regular intervals even when there is no input data.
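The timed output of state in the absence of input might look like the sketch below. The class `TimedStateEmitter`, its `on_record` and `on_tick` methods and the interval parameter are hypothetical; the text does not specify how the timer is exposed, so the clock value is simply passed in by the caller.

```python
from typing import Any, Dict, List


class TimedStateEmitter:
    """Hypothetical helper: emits the current per-user state on a fixed interval."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self.last_emit = 0.0
        self.states: Dict[str, Dict[str, Any]] = {}

    def on_record(self, user: str, record: Dict[str, Any]) -> None:
        # Service-specific state update; here we only remember the latest record.
        self.states[user] = dict(record)

    def on_tick(self, now: float) -> List[Dict[str, Any]]:
        """Called with the current time, even when no input has arrived."""
        if now - self.last_emit < self.interval_s:
            return []  # interval not yet elapsed
        self.last_emit = now
        # Emit a snapshot of every user's state, in a stable order.
        return [{"user": u, **s} for u, s in sorted(self.states.items())]
```

Passing the time in explicitly keeps the sketch deterministic; a real framework would presumably drive `on_tick` from its own scheduler.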
With the double-layer framework provided by the invention, the data topic to be subscribed and the input data format within the topic can be defined, so that the framework completes subscription and decoding automatically and automatically completes state-based data distribution. The interface through which a service plug-in docks with the framework is the logic for processing the specific data of a single user, and the user's previous state data can be consulted or updated at any time during processing.
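One way to picture such a declaration is a small specification object from which decoding and the distribution key follow mechanically. Everything here is an assumption made for illustration: the `SubscriptionSpec` name, the pipe-delimited message format and the field names are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SubscriptionSpec:
    topic: str               # topic on the data bus (hypothetical name)
    fields: Tuple[str, ...]  # field order of each delimited message
    key_field: str           # field used to distribute records per user
    delimiter: str = "|"


def decode(spec: SubscriptionSpec, raw: bytes) -> Dict[str, str]:
    """Decode one raw message according to the declared format."""
    values = raw.decode("utf-8").rstrip("\n").split(spec.delimiter)
    if len(values) != len(spec.fields):
        raise ValueError(f"expected {len(spec.fields)} fields, got {len(values)}")
    return dict(zip(spec.fields, values))


def distribution_key(spec: SubscriptionSpec, record: Dict[str, str]) -> str:
    """Return the value the framework would partition on."""
    return record[spec.key_field]
```

With the format declared once, a service plug-in never touches raw bytes; it receives decoded records already grouped by the key field.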
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are only embodiments of the present invention; those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of the method for real-time state calculation of a big data platform according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a method for calculating the real-time state of a big data platform, which comprises the following steps:
S1: the real-time subscription program framework is responsible for subscribing to data from the data bus in real time, deserializing and decoding the data according to a uniform specification to obtain decoded data, providing a data processing interface to the primary plug-in, and transmitting the decoded data to the primary plug-in;
S2: the primary plug-in performs data cleaning on the decoded data and submits the processed result data to the secondary plug-in framework;
S3: the secondary plug-in framework is developed on the basis of the primary plug-in; a secondary plug-in in the secondary plug-in framework redistributes the processed result data, partitioning it by user number, while the secondary plug-in framework provides a basic interface for updating state data;
S4: the secondary plug-in is developed on the basis of the secondary plug-in framework and implements the core logic, namely updating the internal state data and outputting service data, and transmits the service data to the primary plug-in; the output service data may be produced from the state and the input data, or may be output directly from the state at regular intervals;
S5: the primary plug-in transmits the service data to the real-time subscription program framework, and the real-time subscription program framework outputs the service data to the big data platform.
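The redistribution in S3 can be sketched as key-based partitioning that is performed once and shared by several secondary plug-ins. This is a single-process illustration with hypothetical names (`partition_of`, `shuffle`, `run_plugins`); a distributed engine would move each partition to a different node, and the choice of CRC32 as the partitioning hash is an assumption.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, str]
Handler = Callable[[Record], List[Record]]


def partition_of(user: str, num_partitions: int) -> int:
    """Stable partition index; crc32 is used because hash() is salted per process."""
    return zlib.crc32(user.encode("utf-8")) % num_partitions


def shuffle(records: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Group records so that all records of one user land in one partition."""
    parts: Dict[int, List[Record]] = defaultdict(list)
    for record in records:
        parts[partition_of(record["user"], num_partitions)].append(record)
    return dict(parts)


def run_plugins(
    records: List[Record], handlers: Dict[str, Handler], num_partitions: int
) -> Dict[str, List[Record]]:
    """Shuffle once, then hand every record to every registered plug-in."""
    outputs: Dict[str, List[Record]] = {name: [] for name in handlers}
    for _, batch in sorted(shuffle(records, num_partitions).items()):
        for record in batch:
            for name, handle in handlers.items():
                outputs[name].extend(handle(record))
    return outputs
```

Because `shuffle` runs once per batch regardless of how many handlers are registered, adding another service plug-in does not add another redistribution pass.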
In order to further optimize the above technical solution, the data accessed by the real-time subscription program framework in S1 may be obtained from any data analysis platform.
In order to further optimize the above technical solution, the primary plug-in in S2 and the real-time subscription program framework in S1 have a fixed interface between them, so the plug-in can focus on data analysis.
In order to further optimize the above technical solution, the secondary plug-in framework in S3 has the identity of a primary plug-in, i.e. it is a primary plug-in that is itself written as a framework.
In order to further optimize the above technical solution, a fixed interface is provided between the secondary plug-in in S4 and the secondary plug-in framework, so the secondary plug-in can focus on stateful data analysis services.
The invention also provides a server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the above method.
The present invention also provides a computer-readable storage medium storing a computer program comprising instructions for the steps of the above method.
Examples
The following describes the flow of the method provided by the present invention, using a trajectory calculation task based on the signaling data of a certain province as an example:
1. the signaling data of the province is subscribed to and decoded through the primary plug-in framework, and information such as the user's number, position and time is extracted from the data and output; the real-time subscription program framework is responsible for deserializing the raw data, and the primary plug-in in the primary plug-in framework performs the ETL data cleaning;
2. the secondary plug-in framework redistributes the data output by all primary plug-in frameworks according to user number, so that each service computing node processes the same batch of user numbers;
3. when processing a user's current input data (including position and time), the service plug-in developed on the secondary plug-in framework checks whether state data already exists for the user; if not, it updates the user's state; the choice of state data depends on the specific business logic, and in trajectory calculation the state data comprises the time and position at which the user last appeared;
if state information already exists for the user, the plug-in checks whether the current position differs from the previous position; if it does not, the state is not updated and no service data is output; if it does, trajectory data is output, comprising the previous position, the time and the dwell time, where the dwell time is the time in the current input data minus the time in the previous state;
4. once the result data output by the service has been obtained, the secondary plug-in framework passes it to the regular data path of the primary plug-in framework; in doing so, the secondary plug-in framework interacts with the primary plug-in framework in the identity of a primary plug-in;
5. the primary plug-in framework outputs the result data to the big data platform, completing the whole service flow; the primary plug-in framework only needs to access and decode the data, send it to the layer above, and output that layer's result data to the big data platform; whether the intermediate processing involves state is transparent to the primary plug-in framework.
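The state logic of item 3 can be written out as a short Python sketch. The class and field names are hypothetical and times are assumed to be integer seconds. The text does not say what happens to the state after a trajectory is output; replacing it with the current position and time is an assumption made here so that the next dwell time is measured from the new position.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UserState:
    position: str
    time: int  # seconds; the unit is an assumption


@dataclass
class Trajectory:
    user: str
    position: str  # the previous position
    time: int      # the time stored in the previous state
    dwell: int     # current input time minus the time in the previous state


class TrajectoryPlugin:
    """Per-user trajectory logic as described in item 3 of the example."""

    def __init__(self) -> None:
        self.states: Dict[str, UserState] = {}

    def process(self, user: str, position: str, time: int) -> Optional[Trajectory]:
        prev = self.states.get(user)
        if prev is None:
            # No earlier state: record it and output nothing.
            self.states[user] = UserState(position, time)
            return None
        if prev.position == position:
            # Position unchanged: keep the state, output nothing.
            return None
        # Position changed: output the finished stay.
        # Assumption: the state is then replaced by the current input.
        self.states[user] = UserState(position, time)
        return Trajectory(user, prev.position, prev.time, time - prev.time)
```

For a user seen at position A at times 100 and 160 and then at position B at time 400, the third call yields a trajectory for A with a dwell time of 300, since the unchanged sighting at 160 leaves the stored time at 100.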
The invention is characterized in that the layered framework system clearly delineates the routine work in a complex state-based big data processing flow, and the two framework layers divide the work and cooperate to complete the routine processing at different levels. A service plug-in on this framework system only needs to attend to its own state logic, namely how each piece of data affects the state and how the state and the input data affect the output data; it does not need to attend to the decoding and distribution that follow the subscription of raw data.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A method for real-time state calculation of a big data platform, characterized by comprising the following steps:
step 1: the real-time subscription program framework subscribes to data from a data bus in real time, deserializes and decodes the data to obtain decoded data, provides a data processing interface to a primary plug-in, and transmits the decoded data to the primary plug-in;
step 2: the primary plug-in performs data cleaning on the decoded data and submits the processed result data to a secondary plug-in framework;
step 3: a secondary plug-in in the secondary plug-in framework redistributes the processed result data, while the secondary plug-in framework provides a basic interface for updating state data;
step 4: the secondary plug-in updates the internal state data, outputs service data, and transmits the service data to the primary plug-in;
step 5: the primary plug-in transmits the service data to the real-time subscription program framework, and the real-time subscription program framework outputs the service data to a big data platform.
2. The method for big data platform real-time status calculation according to claim 1, wherein the data accessed by the real-time subscription program framework in step 1 is obtained from any data analysis platform.
3. The method for big data platform real-time status calculation according to claim 1, wherein the primary plug-in and the real-time subscription program framework in step 2 have fixed interfaces, and the primary plug-in is used for data analysis.
4. The method for big data platform real-time status calculation according to claim 1, wherein, in step 3, the secondary plug-in framework has the identity of a primary plug-in.
5. The method for big data platform real-time status calculation according to claim 1, wherein in step 4, a fixed interface is provided between the secondary plug-in and the secondary plug-in framework, and the secondary plug-in is used for data analysis service with status.
6. A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 5.
7. A computer-readable storage medium, in which a computer program is stored, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 5.
CN202210142177.XA 2022-02-16 2022-02-16 Method, server and storage medium for real-time state calculation of big data platform Active CN114510471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210142177.XA CN114510471B (en) 2022-02-16 2022-02-16 Method, server and storage medium for real-time state calculation of big data platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210142177.XA CN114510471B (en) 2022-02-16 2022-02-16 Method, server and storage medium for real-time state calculation of big data platform

Publications (2)

Publication Number Publication Date
CN114510471A (en) 2022-05-17
CN114510471B (en) 2023-07-21

Family

ID=81551320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210142177.XA Active CN114510471B (en) 2022-02-16 2022-02-16 Method, server and storage medium for real-time state calculation of big data platform

Country Status (1)

Country Link
CN (1) CN114510471B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874027A (en) * 2016-12-25 2017-06-20 北京通途永久科技有限公司 A kind of transportation industry quality of data monitoring platform based on plug-in unit mode
US20180316777A1 (en) * 2017-04-26 2018-11-01 International Business Machines Corporation Invoking enhanced plug-ins and creating workflows having a series of enhanced plug-ins
WO2020223997A1 (en) * 2019-05-05 2020-11-12 东北大学 Data analysis software architecture design method capable of implementing global configuration of storage, calculation and display

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106059892A (en) * 2016-05-17 2016-10-26 中国科学院沈阳计算技术研究所有限公司 Message engine integrated with communication system
CN106097059A (en) * 2016-06-08 2016-11-09 百度在线网络技术(北京)有限公司 The processing method of a kind of closed loop of concluding the business and platform

Also Published As

Publication number Publication date
CN114510471B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN107491485B (en) Method for generating execution plan, plan unit device and distributed NewSQL database system
CN108280023B (en) Task execution method and device and server
CN109684057B (en) Task processing method and device and storage medium
CN102325191B (en) Fully automatic treatment method and frame without page refresh
CN111124379B (en) Page generation method and device, electronic equipment and storage medium
CN110580189A (en) method and device for generating front-end page, computer equipment and storage medium
CN111026493B (en) Interface rendering processing method and device
CN112052082B (en) Task attribute optimization method, device, server and storage medium
CN116360735A (en) Form generation method, device, equipment and medium
CN111523670A (en) Batch reasoning method, device and medium for improving deep learning reasoning equipment utilization rate
CN114697372A (en) Data transmission processing and storage method, system and medium in distributed system
CN111078573A (en) Test message generation method and device
CN114510471A (en) Method, server and storage medium for real-time state calculation of big data platform
US20090055202A1 (en) Framework for development of integration adapters that surface non-static, type-safe service contracts to lob systems
CN115525321A (en) Distributed task generation method, device, equipment and storage medium
CN115186738A (en) Model training method, device and storage medium
Rossi et al. Definition and validation of design metrics for distributed applications
CN113849161A (en) Application control method and device, storage medium and electronic equipment
CN111352940A (en) Data processing method and system
CN111026371A (en) Game development method and device, electronic equipment and storage medium
WO2024130795A1 (en) Optimization method for calling data execution on basis of distributed storage, and apparatus
CN112748980B (en) Message pushing method, device, equipment and computer readable storage medium
CN111208980B (en) Data analysis processing method and system
CN116954576A (en) Method and device for developing components, storage medium and electronic equipment
CN117555898A (en) Data construction method, system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method, server, and storage medium for real-time state calculation on big data platforms

Effective date of registration: 20230907

Granted publication date: 20230721

Pledgee: Industrial Bank Co.,Ltd. Beijing Dongcheng sub branch

Pledgor: Beijing Jiuqi Technology Co.,Ltd.

Registration number: Y2023980055648