CN112379884B - Method and system for realizing flow engine based on Spark and parallel memory calculation - Google Patents


Info

Publication number
CN112379884B
CN112379884B (application CN202011267074.3A)
Authority
CN
China
Prior art keywords
state
state machine
information
dataframe
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011267074.3A
Other languages
Chinese (zh)
Other versions
CN112379884A (en)
Inventor
李斌
李艳丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202011267074.3A priority Critical patent/CN112379884B/en
Publication of CN112379884A publication Critical patent/CN112379884A/en
Application granted granted Critical
Publication of CN112379884B publication Critical patent/CN112379884B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/21 Design, administration or maintenance of databases
    • G06F 16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/25 Integrating or interfacing systems involving database management systems
    • G06F 16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F 9/4498 Finite state machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a method and system for implementing a flow engine based on Spark and parallel in-memory computation, and relates to the technical field of process automation. The method comprises the following steps: obtaining graphical flow orchestration information, converting it into an orchestration script, and storing the script on a Spark platform; compiling the orchestration script into a state machine in the form of Java bytecode, where the state machine can be restored from the Spark platform into memory to run; controlling and monitoring the state change information of the state machine through a state controller, and storing the state change information on the Spark platform; and, according to the orchestration script, having the state controller call an external system through a call port to process the business. The invention can realize distributed data flow automation and business flow automation at the same time, combines high throughput with instant responsiveness, and is especially suitable for high-throughput scenarios.

Description

Method and system for realizing flow engine based on Spark and parallel memory calculation
Technical Field
The invention relates to the technical field of process automation, and in particular to a method and system for implementing a flow engine based on Spark and parallel in-memory computation.
Background
In the field of process automation, the mainstream solutions are traditional BPM systems (BPM, Business Process Management: a systematic method centered on standardized, structured, end-to-end business processes, aimed at continuously improving organizational performance) and open-source products represented by Activiti. These solutions are mainly applied to internal enterprise processes and are rarely applied to scenarios with large data volumes and high throughput. Moreover, such products are built on traditional relational databases: every change of flow state or data involves a database operation, so their applicable scenarios are severely limited by database performance.
The essence of a flow engine is state transition and process handling, and most flow engines currently on the market are implemented based on Petri nets or finite state machine (FSM) models. On top of a Petri net or FSM model, tokens are used to drive the transition of flow states, thereby realizing common business flows such as serial flows, parallel flows and sub-flows. On the one hand, the "middle platform" trend in enterprise IT systems is increasingly evident: the low-code, flexible characteristics of the flow engine are valued by enterprises, and once an enterprise system moves to the internet it directly interfaces with consumer-facing (C-end) business, which requires the flow engine to provide greater throughput. However, traditional flow engines handle high-throughput business poorly and cannot support automatic cooperation of flows across organizations, across regions and in distributed deployment environments; flow systems therefore cannot be deployed uniformly, causing data islands and harming the consistency and integrity of data.
On the other hand, the volume of data to be processed in every industry has grown enormously, and big data has attracted wide attention across society. In the big data age, everything is centered on data: much effective information that cannot be obtained by other means can be mined and analyzed from massive historical data, improving the accuracy of decision-making. The development of distributed computing provides an effective means for fully exploiting the value of data: distributed computing can use cheap computer clusters to perform rapid analysis of massive data, effectively reducing the cost of data analysis. In this environment, many distributed computing framework technologies have emerged, with innovation represented by Hadoop and Spark being especially active. Spark is a fast, general-purpose computing engine designed specifically for large-scale data processing; the intermediate output of a job can be kept in memory (as an in-memory distributed dataset), avoiding reads and writes to HDFS. Spark is implemented in Scala and uses Scala as its application framework. Akka is an Actor-model framework written in Scala, compatible with both Scala and Java, for writing highly available and highly scalable applications. It adopts an event-driven concurrent processing model, offers high performance and good usability, and is widely used in distributed computing frameworks.
In summary, how to provide, on the basis of the above prior art, a flow engine system that can simultaneously realize distributed data flow automation and business flow automation while combining high throughput and instant responsiveness is a technical problem currently to be solved.
Disclosure of Invention
The aim of the invention is to provide a method and system for implementing a flow engine based on Spark and parallel in-memory computation, overcoming the defects of the prior art. The flow engine provided by the invention can realize distributed data flow automation and business flow automation at the same time, has better elasticity than traditional database-based business flow automation, combines high throughput with instant responsiveness, and is especially suitable for high-throughput scenarios.
In order to achieve the above object, the present invention provides the following technical solutions:
A method for implementing a flow engine based on Spark and parallel in-memory computation comprises the following steps:
obtaining graphical flow orchestration information, converting the graphical flow orchestration information into an orchestration script, and storing the orchestration script on a Spark platform;
compiling the orchestration script into a state machine in the form of Java bytecode; when a preset condition is met, the state machine can be restored from the Spark platform into memory to run;
controlling and monitoring state change information of the state machine through a state controller, and storing the state change information on the Spark platform;
and the state controller calling an external system through a call port, according to the orchestration script, to process the business.
Further, the state machines and their corresponding state controllers are deployed into state machine containers as operating units, each state machine container serving as a distributed deployment unit in the cluster;
and when multi-service concurrent processing is performed, state machine containers are added via hot deployment, and the scheduling and triggering of state controllers triggers the multiple state machine containers to perform state control and monitoring respectively.
Further, the state machine container is provided with a call port that can be called by an external system.
Further, the data sets of the Spark platform comprise a repository DataFrame, a metadata DataFrame, a data stream DataFrame and a state DataFrame; the graphical flow orchestration information designed through a graphical designer is stored in the repository DataFrame, the syntax parsing information of a querier connected to the state controller is stored in the metadata DataFrame, the data in the data stream DataFrame are used for creating the state machine and the state controller, and the state management information of the state machine is stored in the state DataFrame;
three layers of management data are arranged corresponding to the state machine, comprising FSM in-memory state data, the state DataFrame and database tables, so that the state machine can be restored from the Spark platform to run in memory, realizing inverse persistence.
Further, state machine state information and state machine action information are acquired through the state controller, state management data are generated according to the state machine state information, action execution data are generated according to the state machine action information, and the state management data and the action execution data are stored on the Spark platform.
Further, the state controller is configured to perform the steps of,
step 1, starting a state machine;
step 2, waiting for a new event to trigger, and determining to execute an action and set a next state according to the new event and the current state of the state machine;
step 3, acquiring a new event and triggering action calling of the event, namely triggering and calling an external system to carry out service processing according to the execution action corresponding to the new event;
step 4, receiving a return result of the external system, and judging whether the state machine is finished;
step 5, when the state machine is judged not to be finished, returning to the step 2; otherwise, the control state machine ends.
The invention also provides a flow engine system based on Spark and parallel in-memory computation, comprising:
a Spark platform;
a stream processing and compiling module, used for acquiring graphical flow orchestration information, converting the graphical flow orchestration information into an orchestration script, storing the orchestration script on the Spark platform, and compiling the orchestration script into a state machine in the form of Java bytecode;
a state and context management module, used for controlling and monitoring state change information of the state machine through a state controller, and storing the state change information on the Spark platform;
a service execution module, used for triggering, according to the orchestration script, the corresponding state controller to call an external system through a call port for business processing;
the state and context management module is configured to restore the state machine from the Spark platform into memory to run when a preset condition is met.
Further, the state machines and their corresponding state controllers are deployed into state machine containers as operating units, each state machine container serving as a distributed deployment unit in the cluster;
the system further comprises a multi-service concurrent scheduling module, used for adding state machine containers via hot deployment when multi-service concurrent processing is performed, and for scheduling and triggering a plurality of state controllers so as to trigger the plurality of state machine containers to perform state control and monitoring respectively.
Further, the data sets of the Spark platform comprise a repository DataFrame, a metadata DataFrame, a data stream DataFrame and a state DataFrame; the graphical flow orchestration information designed through a graphical designer is stored in the repository DataFrame, the syntax parsing information of a querier connected to the state controller is stored in the metadata DataFrame, the data in the data stream DataFrame are used for creating the state machine and the state controller, and the state management information of the state machine is stored in the state DataFrame;
three layers of management data are arranged corresponding to the state machine, comprising FSM in-memory state data, the state DataFrame and database tables, so that the state machine can be restored from the Spark platform to run in memory, realizing inverse persistence.
Further, state machine state information and state machine action information are acquired through the state controller, state management data are generated according to the state machine state information, action execution data are generated according to the state machine action information, and the state management data and the action execution data are stored on the Spark platform;
the state controller is configured to perform the steps of:
step 1, starting the state machine;
step 2, waiting for a new event to trigger, and determining the action to execute and the next state according to the new event and the current state of the state machine;
step 3, acquiring the new event and triggering the action call of the event, that is, triggering a call to an external system for business processing according to the execution action corresponding to the new event;
step 4, receiving the return result of the external system, and judging whether the state machine has ended;
step 5, when it is judged that the state machine has not ended, returning to step 2; otherwise, controlling the state machine to end.
Compared with the prior art, the above technical scheme of the invention has the following advantages and positive effects: the flow engine provided by the invention can realize distributed data flow automation and business flow automation at the same time, has better elasticity than traditional database-based business flow automation, combines high throughput with instant responsiveness, and is especially suitable for high-throughput scenarios.
The invention provides a unified low-code solution for the data middle platform and the business middle platform. On the one hand, the flow engine of the invention is based on an Akka state machine and, combined with dynamic compilation, realizes the conversion and translation from business language to machine language, providing enterprises with a low-code, highly flexible way to implement logic. On the other hand, the FSM controller solves the problems of persisting the in-memory state machine to the Spark platform and of inverse persistence back into memory.
Drawings
Fig. 1 is a flowchart of a method for implementing a flow engine based on Spark and parallel memory computation according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an operating principle of a state machine according to an embodiment of the present invention.
Fig. 3 is a schematic flow chart of a single service scenario provided in an embodiment of the present invention.
Fig. 4 is a schematic flow chart of multi-service scenario concurrency provided in an embodiment of the present invention.
Fig. 5 is a flowchart of a state machine according to an embodiment of the present invention.
Fig. 6 is an information transmission diagram of a flow engine system according to an embodiment of the present invention.
Fig. 7 is a block diagram of a flow engine system according to an embodiment of the present invention.
Detailed Description
The method and the system for realizing the flow engine based on Spark and parallel memory computation disclosed by the invention are further described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the technical features or combinations of technical features described in the following embodiments should not be regarded as being isolated, and they may be combined with each other to achieve a better technical effect. In the drawings of the embodiments described below, like reference numerals appearing in the various drawings represent like features or components and are applicable to the various embodiments. Thus, once an item is defined in one drawing, no further discussion thereof is required in subsequent drawings.
It should be noted that the structures, proportions and sizes shown in the drawings are merely used in conjunction with the disclosure of the specification and are not intended to limit the applicable scope of the invention. The scope of the preferred embodiments of the invention includes additional implementations in which functions may be performed out of the order described or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as would be understood by those skilled in the art to which the embodiments of the invention pertain.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate. In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
Examples
Referring to fig. 1, a method for implementing a flow engine based on Spark and parallel memory computation is provided. The method comprises the following steps:
s100, acquiring graphical flow arrangement information, converting the graphical flow arrangement information into an arrangement script, and storing the arrangement script into a Spark platform.
The graphical designer is used for designing graphical flow programming information, and each flow programming comprises at least one flow step. Specifically, the process layout information may include process body information, context information, process step information (may include information of step names, step descriptions, step sequences, etc.), input data stream information, and the like.
The Spark platform serves as the stream processing platform, connecting the big data platform with the data flow and business flow information.
S200, compiling the orchestration script into a state machine in the form of Java bytecode; when a preset condition is met, the state machine can be restored from the Spark platform into memory to run.
The state machine runs by responding to a series of events. Each event falls within the control range of the transfer function belonging to the current node, where the range of the function is a subset of the nodes; the function returns the next node (which may be the same node). At least one of the nodes is a final state: when a final state is reached, the state machine stops. As a typical example, referring to fig. 2: when event A occurs, the state machine transitions from "state A" to "state B" and remains in "state B"; when event B occurs, the state machine transitions from "state A" to "state C", where "state C" may correspond to execution action one; when event C occurs, the state machine transitions from "state C" to "state D", which may correspond to execution action two.
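The Fig. 2 example can be written down as a plain transition table. The state and event names below are illustrative stand-ins taken from the example, not identifiers from the patent's implementation:

```python
# Hypothetical sketch of the Fig. 2 transitions: each (state, event) pair maps
# to the next state plus an optional action to execute on the transition.
TRANSITIONS = {
    ("A", "event_a"): ("B", None),        # event A: state A -> state B, no action
    ("A", "event_b"): ("C", "action_1"),  # event B: state A -> state C, action one
    ("C", "event_c"): ("D", "action_2"),  # event C: state C -> state D, action two
}

def step(state, event):
    """Return (next_state, action); unknown events keep the current state."""
    return TRANSITIONS.get((state, event), (state, None))
```

A table of this shape is what makes the state machine purely data-driven: adding a flow branch means adding a row, not changing code.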
In Akka, a state machine is implemented by an Actor. Preferably, the dynamic state machine in this embodiment is compiled by a compiler from a preset Schema similar to the flow definition (ProcessDiagram) in BPM, implementing the logic (the flow orchestration) of the ProcessDiagram in the form of a state machine. The compiler's output is Java bytecode (the Scala class of an Akka FSM Actor), so the dynamic state machine can run in an Akka environment once instantiated.
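To illustrate what such a compilation step conceptually does (the real product is Java bytecode for an Akka FSM Actor; this hypothetical Python sketch, with an assumed diagram format, only models the idea of turning a ProcessDiagram-like description into an executable transition function):

```python
def compile_diagram(diagram):
    """Compile a process-diagram description into a transition function.

    The `diagram` format here is an assumption for illustration: a list of
    steps, each naming a source state, triggering event, target state and an
    optional action, plus the set of final states.
    """
    table = {(s["from"], s["on"]): (s["to"], s.get("action"))
             for s in diagram["steps"]}
    final = set(diagram["final_states"])

    def fsm(state, event):
        # Unknown events leave the state unchanged and trigger no action.
        nxt, action = table.get((state, event), (state, None))
        return nxt, action, nxt in final   # (next state, action, finished?)

    return fsm
```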
The preset conditions can be set by default by the system, or can be set individually by the user according to the actual business processing requirements, and are not limited herein.
S300, controlling and monitoring state change information of the state machine through a state controller, and storing the state change information on the Spark platform.
In this embodiment, the dynamic state machine (dynamic FSM) is mainly responsible for executing the translated ProcessDiagram. Each flow instance requires a state controller to complete branch condition determination, state synchronization, action delivery and external invocation.
Specifically, the state controller may find the corresponding state holder, such as the context of a flow, data in a data table, or the API of an external system, according to the conditions defined in the ProcessDiagram, in order to evaluate branch conditions. The DataFrame of the Spark platform (a data set with a data-frame structure, corresponding to a data table) is used preferentially in the implementation, so the distributed computing and data-local computing capabilities of the Spark platform can be fully utilized.
In this embodiment, these state machines and their corresponding state controllers are deployed as operating units into state machine containers, each of which can be a distributed deployment unit in a service system cluster. Each state machine container is provided with a call port, so that the container can be invoked by external systems.
The data sets of the Spark platform may include a repository DataFrame, a metadata DataFrame, a data stream DataFrame and a state DataFrame. The graphical flow orchestration information designed by the graphical designer is stored in the repository DataFrame. The syntax parsing information of the querier connected to the state controller is stored in the metadata DataFrame. The data in the data stream DataFrame are used to create the state machine and the state controller. The state management information of the state machine is stored in the state DataFrame.
Considering that state and context change frequently while a flow is running, these changes would put significant pressure on the database if every change were written to the database immediately. In this embodiment, three layers of management data are therefore configured for the state machine: FSM in-memory state data, the state DataFrame, and database tables.
With the three-layer management of FSM in-memory state, Spark DataFrame (a memory mapping of the table) and database table, fast low-latency response is ensured along with the integrity and consistency of the data. The state machine is eventually persisted to the database, and can be restored on demand from the Spark DataFrame (realizing inverse persistence), so that every state change remains traceable.
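The three-layer arrangement can be pictured as a write-through hierarchy. In the sketch below the class and method names are hypothetical, and plain Python containers stand in for the Spark DataFrame and the database:

```python
class ThreeLayerState:
    """Illustrative sketch of the three-layer state management described above:
    live in-memory FSM state, a cached table standing in for the state
    DataFrame, and an append-only log standing in for the database."""

    def __init__(self):
        self.memory = {}      # layer 1: live FSM state, fastest access
        self.dataframe = {}   # layer 2: memory-mapped table (DataFrame stand-in)
        self.database = []    # layer 3: persisted history of every change

    def set_state(self, fsm_id, state):
        self.memory[fsm_id] = state
        self.dataframe[fsm_id] = state         # mirrored to the table layer
        self.database.append((fsm_id, state))  # appended for traceability

    def restore(self, fsm_id):
        """Inverse persistence: rebuild in-memory state from the table layer."""
        self.memory[fsm_id] = self.dataframe[fsm_id]
        return self.memory[fsm_id]
```

The append-only third layer is what makes each state change retraceable, while reads and writes on the hot path touch only the first two layers.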
In implementation, the database responsible for persisting flow states, flow definitions and metadata can preferentially be a high-throughput non-relational database, represented by MongoDB and HBase, both of which are well supported by the Spark platform.
S400, the state controller calls an external system through the call port, according to the orchestration script, to process the business.
In this step, the state machine may invoke external systems for business processing according to the orchestration script; this may include condition queries and the execution of actions, both accomplished by the state controller.
The processing of the single service scenario and the multi service scenario is described in detail below in connection with fig. 3-5.
Referring to fig. 3, first the business flow orchestrated in the page form is acquired; then the graphical flow orchestration information in the page is converted into an orchestration script, and the orchestration script is stored on the Spark platform.
The orchestration script is then compiled into a state machine in the form of Java bytecode, and the state machine and its state controller are created; when the preset condition is met, the state machine can be restored from the Spark platform to run in memory.
Next, the state change information of the state machine is controlled and monitored through the state controller, and the state change information is stored on the Spark platform. Specifically, state machine state information and state machine action information are acquired through the state controller, state management data are generated according to the state machine state information, action execution data are generated according to the state machine action information, and the state management data and the action execution data are stored on the Spark platform.
Finally, the state controller calls the external system through the call port, according to the orchestration script, to process the business.
In this embodiment, the state machine and its corresponding state controller are deployed into a state machine container that serves as a distributed deployment unit in the cluster, so new state machine containers can be added via hot deployment, supporting the multi-service concurrency of fig. 4. Referring to fig. 4, a multi-service concurrent processing scenario contains multiple state machine containers; by scheduling and triggering the state controllers, the multiple state machine containers are triggered to process, the operation of each container following fig. 3. With hot deployment, related programs do not need to be restarted when a state machine container is added; the system only needs to be refreshed after the modification is completed, which improves operating efficiency.
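The dispatch of incoming service events across state machine containers could be sketched as follows, under the assumption of a simple round-robin policy (the patent does not specify the scheduling strategy, and the names here are hypothetical):

```python
import itertools

class Scheduler:
    """Hypothetical sketch of the fig. 4 scheduling step: incoming service
    events are spread round-robin across state machine containers, each of
    which then processes its events as in fig. 3."""

    def __init__(self, containers):
        self._cycle = itertools.cycle(containers)

    def dispatch(self, event):
        container = next(self._cycle)
        container.append(event)   # hand the event to the chosen container
        return container
```

A new container added via hot deployment would simply be included in the cycle on the next rebuild of the scheduler, without restarting the others.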
Referring to fig. 5, the workflow of the state machine in this embodiment is illustrated, specifically as follows:
and step 1, starting a state machine.
And 2, waiting for triggering a new event, and determining to execute an action and set a next state according to the new event and the current state of the state machine.
And step 3, acquiring the new event and triggering action calling of the event, namely triggering and calling an external system to carry out service processing according to the execution action corresponding to the new event.
And step 4, receiving a return result of the external system, and judging whether the state machine is ended.
And 5, when the state machine is judged not to be ended, returning to the step 2, otherwise, controlling the state machine to be ended.
According to the above technical scheme, based on Spark and Akka, technologies mature in the big data field, the in-memory FSM (finite state machine) is used as the core and combined with dynamic compilation to convert graphical business information into executable machine language; the resulting distributed system runs on a Spark big data platform and provides event-based low-latency response in high-throughput scenarios. In practice, data or business events may enter the engine system via Spark Streaming through a high-throughput data stream represented by Kafka.
Further, considering that internet application scenarios may face instantaneous peak traffic (for example, when downstream RPC service throughput is limited, or the state query of a branch condition hits a bottleneck), such pressure needs to be sensed and adjusted at the data source entering the system. The reactive streaming computation (backpressure mechanism) of Akka Stream can be combined with the state of the in-memory FSM to perform preset operations on the data or the external system, realizing flow control over the whole link.
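A minimal sketch of such flow control uses a bounded buffer whose refusal signals the source to slow down. This only mimics the idea of backpressure; it is not Akka Stream's API, and the class and threshold here are assumptions:

```python
from collections import deque

class BackpressureGate:
    """Illustrative flow-control sketch: when the downstream buffer is full,
    `offer` returns False, signalling the upstream source to slow down, in the
    spirit of a reactive-streams backpressure signal."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()

    def offer(self, item):
        """Accept an item if there is room; False means apply backpressure."""
        if len(self.buffer) >= self.capacity:
            return False
        self.buffer.append(item)
        return True

    def poll(self):
        """Downstream consumption frees capacity for the source again."""
        return self.buffer.popleft() if self.buffer else None
```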
Furthermore, the rich algorithms and models of the Spark MLlib ecosystem can be used as condition judgments to make predictions and recommendations, and the models can be continuously trained, evaluated, and optimized while the flow runs. Thanks to the convenience of running on the Spark platform, the closed loop of data applications built on data collected through instrumentation of the business system is well supported.
By way of example and not limitation, the invention may be applied to micro-service automated testing, data-cleansing ETL, and inventory data synchronization. In the micro-service automated testing scenario, the flow body is a test scenario, the context links intermediate values between steps, the flow steps include RPC (remote procedure call) interface calls and asynchronous message sending/receiving, and the input data stream includes asynchronous messages and WebHook information. In the data-cleansing ETL scenario, the flow body is a client, the context is used for computing intermediate values, the flow steps include completing body data and metadata, deleting redundant data, extracting external data, computing missing values, and data aggregation, and the input data stream includes instrumentation data, transaction data, and third-party data. In the inventory data synchronization scenario, the flow body is a commodity SKU (stock keeping unit, which may be a piece, a box, a pallet, etc.; the SKU is the smallest physically indivisible stock unit), the context is the inventory, the flow steps include loading, selling, and unloading, and the input data stream includes warehousing, outbound, customer order, customer order-cancellation, and customer return data.
The invention also provides a flow engine system based on Spark and parallel memory calculation.
The flow engine system comprises a Spark platform, a flow processing compiling module, a state and context management module and a service execution module.
The Spark platform serves as the stream processing platform that connects the big data platform with the data flow and the business flow, as shown in fig. 6.
The stream processing compiling module is used for acquiring graphical flow orchestration information, converting it into an orchestration script, and storing the orchestration script on the Spark platform, and for compiling the orchestration script into a state machine in the form of Java bytecode.
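The "orchestration script to Java bytecode" step can be approximated with the JDK's standard javax.tools compiler API. The sketch below compiles an invented source string at runtime and loads the resulting class, standing in for the engine's script translation; the class name, source, and helper method are illustrative assumptions, not the patent's actual compiler.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: compile generated state-machine source to bytecode at runtime
// with the JDK compiler API, then load the class reflectively. The source
// string stands in for the output of orchestration-script translation.
public class DynamicCompile {
    public static Class<?> compileAndLoad(String className, String source) {
        try {
            Path dir = Files.createTempDirectory("fsm");
            Path src = dir.resolve(className + ".java");
            Files.writeString(src, source);

            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            int rc = compiler.run(null, null, null, "-d", dir.toString(), src.toString());
            if (rc != 0) throw new IllegalStateException("compile failed");

            URLClassLoader loader = new URLClassLoader(new URL[]{dir.toUri().toURL()});
            return loader.loadClass(className);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Convenience helper: invoke a static one-String-argument method,
    // wrapping reflection's checked exceptions.
    public static Object invokeStatic(Class<?> cls, String method, String arg) {
        try {
            return cls.getMethod(method, String.class).invoke(null, arg);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

In the engine itself the compiled class would implement the state machine interface and be handed to a state controller; this sketch only shows the compile-and-load mechanism.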
The state and context management module is used for controlling and monitoring state change information of the state machine through the state controller and storing the state change information on the Spark platform. Specifically, the state controller acquires state machine state information and state machine action information, generates state management data from the state information, and generates action execution data from the action information; the state management data and the action execution data are saved on the Spark platform.
The state and context management module is further configured to restore the state machine from the Spark platform into memory for operation when a preset condition is met.
The service execution module is used for triggering, according to the orchestration script, the corresponding state controller to call the external system through the call port for service processing.
Specifically, the state controller is configured to perform the following steps: step 1, start the state machine; step 2, wait for a new event to be triggered, and determine the action to execute and the next state according to the new event and the current state of the state machine; step 3, acquire the new event and trigger the action call for the event, i.e., invoke the external system to perform service processing according to the execution action corresponding to the new event; step 4, receive the result returned by the external system and judge whether the state machine has ended; step 5, if the state machine has not ended, return to step 2, otherwise control the state machine to end.
Referring to FIG. 7, in this embodiment the dynamic state machine (or dynamic FSM) is mainly responsible for translating the ProcessDiagram. Each flow instance requires a state controller to complete branch condition determination, state synchronization, action delivery, and external invocation.
Specifically, the state controller finds the corresponding state holder, such as the context in a flow, data in a data table, or an API of some external system, according to the condition defined in the ProcessDiagram, and uses it to determine the branch condition. Implementations preferentially use the DataFrame of the Spark platform, so that the distributed computing and data-locality capabilities of Spark can be fully utilized.
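A minimal sketch of this branch-condition lookup, assuming a toy condition syntax of the form `holder.key == value` and plain in-memory state holders in place of Spark DataFrames; the class name and condition format are illustrative, not defined by the patent.

```java
import java.util.Map;
import java.util.function.Function;

// Sketch: the controller resolves the "state holder" named in a
// ProcessDiagram condition (flow context, data table, external API --
// all represented here as simple lookup functions) and evaluates the
// branch predicate against the value it returns.
public class BranchResolver {
    // holder name -> lookup function (key -> current value)
    private final Map<String, Function<String, Object>> holders;

    public BranchResolver(Map<String, Function<String, Object>> holders) {
        this.holders = holders;
    }

    // Toy condition format assumed for illustration: "holder.key == value"
    public boolean evaluate(String condition) {
        String[] sides = condition.split(" == ");
        String[] ref = sides[0].split("\\.", 2);        // [holder, key]
        Object actual = holders.get(ref[0]).apply(ref[1]);
        return String.valueOf(actual).equals(sides[1]);
    }
}
```

In the described system the same lookup would be pushed down to a Spark DataFrame query so the comparison runs where the data lives.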
In this embodiment, these state machines and their corresponding state controllers are deployed as an operation unit into state machine containers; each container may serve as a deployment unit in the service system cluster. Each state machine container exposes a call port through which it can be invoked by external systems.
With continued reference to fig. 7, the dataset of the Spark platform may include a repository DataFrame, a metadata DataFrame, a data stream DataFrame, and a state DataFrame. The graphical flow orchestration information designed in the graphical designer is stored in the repository DataFrame; the syntax parsing information of the querier connected to the state controller is stored in the metadata DataFrame; the data in the data stream DataFrame are used to create state machines and state controllers; and the state management information of the state machines is stored in the state DataFrame.
Considering that the state and context change frequently while a flow is running, writing every change to the database immediately would put significant pressure on it. In this embodiment, preferably, three layers of management data are maintained for the state machine: FSM in-memory state data, the state DataFrame, and database tables. With this three-layer management of FSM memory state, Spark DataFrame (a memory mapping of the table), and database table, fast low-latency responses are ensured while the integrity and consistency of the data are preserved, and the in-memory state machine can finally be restored from the Spark DataFrame on demand (realizing inverse persistence).
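The three-layer arrangement can be sketched with plain maps standing in for the FSM memory state, the state DataFrame, and the database table; `flush()` plays the role of batched persistence off the hot path and `recover()` the "inverse persistence" described above. This is a simplified illustration under those stand-in assumptions, not Spark or database code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the three management layers: hot in-memory FSM state, a warm
// snapshot standing in for the state DataFrame, and a "database" map
// standing in for the persisted table. Writes hit memory first and are
// mirrored to the snapshot; persistence is batched; recovery rebuilds
// the in-memory layer from the snapshot.
public class ThreeTierStateStore {
    private final Map<String, String> memory = new HashMap<>();   // FSM memory state
    private final Map<String, String> snapshot = new HashMap<>(); // ~ state DataFrame
    private final Map<String, String> database = new HashMap<>(); // ~ database table

    public void put(String fsmId, String state) {
        memory.put(fsmId, state);    // low-latency write on the hot path
        snapshot.put(fsmId, state);  // mirrored to the snapshot layer
    }

    public void flush() {            // batched persistence, off the hot path
        database.putAll(snapshot);
    }

    public void recover() {          // "inverse persistence": rebuild memory
        memory.clear();
        memory.putAll(snapshot);
    }

    public String get(String fsmId) { return memory.get(fsmId); }
    public String persisted(String fsmId) { return database.get(fsmId); }
}
```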
In implementation, the database responsible for persisting flow states, flow definitions, and metadata can preferentially be a high-throughput non-relational database such as MongoDB or HBase, both of which are well supported by the Spark platform.
The flow engine system may further include a multi-service concurrency scheduling module in view of multi-service concurrency processing.
The multi-service concurrency scheduling module is used for adding state machine containers in a hot deployment mode during multi-service concurrent processing, and for scheduling and triggering a plurality of state controllers so that the plurality of state machine containers respectively perform state control and monitoring.
Other technical features are referred to the previous embodiments and will not be described here again.
In the above description, the disclosure of the present invention is not intended to be limited to these aspects. Rather, the components may be selectively and operatively combined in any number within the scope of the present disclosure. In addition, terms such as "comprising," "including," and "having" should by default be construed as inclusive or open-ended rather than exclusive or closed-ended, unless expressly defined to the contrary. All technical, scientific, or other terms used herein have the meaning commonly understood by one of ordinary skill in the art to which this invention belongs. Common terms found in dictionaries should not be interpreted in an overly idealized or unrealistic manner in the context of the related technical documents unless the present disclosure expressly so defines them. Any alterations and modifications made by those of ordinary skill in the art based on the above disclosure are intended to fall within the scope of the appended claims.

Claims (6)

1. A method for realizing a flow engine based on Spark and parallel memory computation is characterized by comprising the following steps:
obtaining graphical flow orchestration information, converting the graphical flow orchestration information into an orchestration script, and storing the orchestration script on a Spark platform;
compiling the orchestration script into a state machine in the form of Java bytecode, wherein, when a preset condition is met, the state machine can be restored from the Spark platform into memory for operation;
the state change information of the state machine is controlled and monitored through a state controller, and the state change information is stored on the Spark platform;
the state controller calls an external system through a call port to perform service processing according to the orchestration script;
the state machine and the corresponding state controller are deployed into a state machine container as an operation unit, and each state machine container serves as a distributed deployment unit in the cluster; when multi-service concurrent processing is performed, state machine containers are newly added in a hot deployment mode, and a plurality of state controllers are scheduled and triggered so that the plurality of state machine containers respectively perform state control and monitoring;
the dataset of the Spark platform comprises a repository DataFrame, a metadata DataFrame, a data stream DataFrame, and a state DataFrame; the graphical flow orchestration information designed through a graphical designer is stored in the repository DataFrame, the syntax parsing information of a querier connected to the state controller is stored in the metadata DataFrame, the data in the data stream DataFrame are used to create the state machine and the state controller, and the state management information of the state machine is stored in the state DataFrame; three layers of management data are provided for the state machine, comprising FSM in-memory state data, the state DataFrame, and database tables, so that the state machine can be restored from the Spark platform to run in memory, realizing inverse persistence.
2. The method according to claim 1, characterized in that: the state machine container is provided with a call port that can be called by an external system.
3. The method according to any one of claims 1-2, characterized in that: the state controller acquires state machine state information and state machine action information, generates state management data according to the state machine state information, generates action execution data according to the state machine action information, and stores the state management data and the action execution data on the Spark platform.
4. A method according to claim 3, characterized in that: the state controller is configured to perform the following steps,
step 1, start the state machine;
step 2, wait for a new event to be triggered, and determine the action to execute and the next state according to the new event and the current state of the state machine;
step 3, acquire the new event and trigger the action call for the event, i.e., invoke the external system to perform service processing according to the execution action corresponding to the new event;
step 4, receive the result returned by the external system and judge whether the state machine has ended;
and step 5, if the state machine has not ended, return to step 2, otherwise control the state machine to end.
5. A flow engine system based on Spark and parallel memory computation is characterized by comprising:
a Spark platform;
the stream processing compiling module is used for acquiring graphical flow orchestration information, converting the graphical flow orchestration information into an orchestration script, storing the orchestration script on the Spark platform, and compiling the orchestration script into a state machine in Java bytecode form;
the state and context management module is used for controlling and monitoring state change information of the state machine through the state controller and storing the state change information on the Spark platform;
the service execution module is used for triggering, according to the orchestration script, the corresponding state controller to call the external system through the call port for service processing;
the state and context management module is configured to restore the state machine from the Spark platform into memory for operation when a preset condition is met; the dataset of the Spark platform comprises a repository DataFrame, a metadata DataFrame, a data stream DataFrame, and a state DataFrame; the graphical flow orchestration information designed through a graphical designer is stored in the repository DataFrame, the syntax parsing information of a querier connected to the state controller is stored in the metadata DataFrame, the data in the data stream DataFrame are used to create the state machine and the state controller, and the state management information of the state machine is stored in the state DataFrame; three layers of management data are provided for the state machine, comprising FSM in-memory state data, the state DataFrame, and database tables, so that the state machine can be restored from the Spark platform to run in memory, realizing inverse persistence;
the state machine and the corresponding state controller are deployed into a state machine container as an operation unit, and each state machine container serves as a distributed deployment unit in the cluster;
the flow engine system further comprises a multi-service concurrency scheduling module, wherein the multi-service concurrency scheduling module is used for adding state machine containers in a hot deployment mode when multi-service concurrent processing is performed, and for scheduling and triggering a plurality of state controllers so that the plurality of state machine containers respectively perform state control and monitoring.
6. The flow engine system of claim 5, characterized in that: state machine state information and state machine action information are acquired through the state controller, state management data are generated according to the state machine state information, action execution data are generated according to the state machine action information, and the state management data and the action execution data are stored on the Spark platform;
the state controller is configured to perform the steps of:
step 1, start the state machine;
step 2, wait for a new event to be triggered, and determine the action to execute and the next state according to the new event and the current state of the state machine;
step 3, acquire the new event and trigger the action call for the event, i.e., invoke the external system to perform service processing according to the execution action corresponding to the new event;
step 4, receive the result returned by the external system and judge whether the state machine has ended;
and step 5, if the state machine has not ended, return to step 2, otherwise control the state machine to end.
CN202011267074.3A 2020-11-13 2020-11-13 Method and system for realizing flow engine based on Spark and parallel memory calculation Active CN112379884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011267074.3A CN112379884B (en) 2020-11-13 2020-11-13 Method and system for realizing flow engine based on Spark and parallel memory calculation

Publications (2)

Publication Number Publication Date
CN112379884A CN112379884A (en) 2021-02-19
CN112379884B (en) 2024-01-12

Family

ID=74583826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011267074.3A Active CN112379884B (en) 2020-11-13 2020-11-13 Method and system for realizing flow engine based on Spark and parallel memory calculation

Country Status (1)

Country Link
CN (1) CN112379884B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112486468A (en) * 2020-12-15 2021-03-12 恩亿科(北京)数据科技有限公司 Spark kernel-based task execution method and system and computer equipment
CN113553533A (en) * 2021-06-10 2021-10-26 国网安徽省电力有限公司 Index calculation method based on digital internal five-level market assessment system
CN116400987B (en) * 2023-06-06 2023-08-18 智者四海(北京)技术有限公司 Continuous integration method, device, electronic equipment and storage medium
CN116755804B (en) * 2023-07-03 2024-04-26 红有软件股份有限公司 Assembled integrated big data processing method and system
CN116795378B (en) * 2023-08-18 2023-11-21 宁波数益工联科技有限公司 Method and device for arranging and executing process based on code dynamic compiling

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108432208A (en) * 2016-12-15 2018-08-21 华为技术有限公司 A kind of arranging service method, apparatus and server
CN110580155A (en) * 2019-07-31 2019-12-17 苏宁云计算有限公司 Implementation method and device of state machine engine, computer equipment and storage medium
CN111142867A (en) * 2019-12-31 2020-05-12 谷云科技(广州)有限责任公司 Service visual arrangement system and method under micro-service architecture

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9384020B2 (en) * 2013-01-18 2016-07-05 Unisys Corporation Domain scripting language framework for service and system integration



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant