CN115357413A

CN115357413A - Control system for horizontally expanding rule engine instance

Info

Publication number: CN115357413A
Application number: CN202211055432.3A
Authority: CN
Inventors: 马奥博; 成少飞; 王帅
Original assignee: Shenyang Shurong Technology Co ltd
Current assignee: Shenyang Shurong Technology Co ltd
Priority date: 2022-08-31
Filing date: 2022-08-31
Publication date: 2022-11-18

Abstract

The invention discloses a control system for horizontally expanding a rule engine instance, which comprises an upstream event source, a downstream service system, an event sequence bus group, a time synchronization module, a rule engine cluster and a rule output bus group, and has the advantages that firstly, the error condition caused by downtime in single-point deployment and cluster expansion is checked by using a message transmission mode based on a request/response; 2. by using a transmission control mechanism based on rotation, a plurality of overlapped service loads contained in an event sequence are dispersed into a plurality of nodes in a cluster; 3. by using a rolling mechanism based on a time window, the error correction of processing result errors caused by other exceptions is realized; 4. three different schemes are provided, and different requirements on high performance, balance and strong consistency in the rule engine cluster expansion process are met respectively.

Description

Control system for horizontally expanding rule engine instance

Technical Field

The invention relates to the field of rule engine horizontal extension, in particular to a control system for enabling a rule engine instance to horizontally extend.

Background

For a complete service, the general common practice of the current conventional horizontal extension scheme is to "force stateless", that is: the complete service can be divided into several stages which can be cut, each stage is deployed in a horizontal expansion mode, when a new service flow starts, different service stages are called in sequence, and after the processing of each stage is completed, a form datum is serialized and processed by the next stage. But the key point of this solution is to find service data that can be serialized, and the modification of the service data in each service phase does not depend on the processing results of different service phases of other similar services.

In a typical business scenario of a rules engine, this is not possible, for example: after the temperature reported by the temperature sensor (T1) at the site A is too high (the reported temperature is T1), 2 or more smoke alarms in 3 smoke alarms (respectively marked as S1, S2 and S3) at the site A report that smoke is found, and then within 1 minute, the temperature T2 reported again by the T1 is higher than T1, so that the fire disaster at the site A is considered to occur. At this time, if T1 reports T2, if T1 reported by T1 is not acquired or more than 2 of S1, S2 and S3 are not acquired, correct processing cannot be performed, so that the service processing capability and the processing difficulty of the platform are greatly improved.

Disclosure of Invention

The invention aims to solve the problems and designs a control system which enables a rule engine instance to be horizontally expanded.

The technical scheme of the invention is that the control system for horizontally expanding the rule engine instance comprises an upstream event source, a downstream service system, an event sequence bus group, a time synchronization module, a rule engine cluster and a rule output bus group;

the upstream event source refers to any business system which obtains the business processing result together with the limiting condition corresponding to the processing result, and provides business processing events for all the businesses of the rule engine system;

the downstream business system is a system for performing next business processing by using a rule engine processing result;

and the time synchronization module synchronizes local time of all buses, modules and instances to ensure consistent time in the system.

The event sequence bus group is composed of a plurality of event sequence buses and is jointly responsible for providing event sequences for each node or node group in an instance cluster in the rule engine cluster, and each event sequence bus is composed of an event receiving module, an event message coding module, a rule configuration and analysis module, a dispersed load module (the dispersed load module comprises a plurality of dispersed load instances) and an event sending module.

The rule engine cluster comprises an instance encoding module and an instance cluster, wherein the instance cluster can select two strategies: when the mutual active-standby strategy is selected, the example cluster comprises a plurality of rule engine nodes; when an intra-group redundancy policy is selected, multiple rule engine node groups are included within the instance cluster.

The rule output bus group consists of a plurality of rule output buses and is jointly responsible for carrying out de-coincidence and sending on processing results obtained by the rule engine cluster. Each rule output bus comprises a repeated trigger merging module and a processing result sending module. The repeated triggering and combining module comprises a plurality of deduplication tasks, and each task is responsible for deduplication of one rule. And the processing result sending module is responsible for sending the processing result after de-duplication and combination to a downstream service system.

The cluster deployment strategy of the main and standby clusters comprises the following steps: all the rule engine instances form a mutual primary-backup relationship, namely if one instance fails, another instance in the cluster takes over the failed instance to perform business processing once.

The cluster deployment strategy of the intra-group redundancy is as follows: all the instances are logically divided into a plurality of groups, the number of the instances in each group is odd, all the instances in each group execute the same service, if a certain node fails, other nodes in the group cannot be influenced, and finally, the processing result of one service is based on the principle that most nodes of the nodes in the group reach the same result.

When the rule engine cluster selects the mutual active-standby strategy, each instance is a node in the cluster, and when the redundancy strategy in the group is selected, a plurality of instances form a node group in the cluster together.

The example coding module is responsible for coding the examples in the rule engine example cluster, and represents a certain example, the current working state of the example, the service processing condition and the processing result through the coding in the system.

According to the control system for horizontally expanding the rule engine instance, which is manufactured by the technical scheme of the invention, firstly, the condition that errors are caused by downtime in single-point deployment and cluster expansion is checked by using a message transmission mode based on request/response; 2. by using a transmission control mechanism based on rotation, a plurality of overlapped service loads contained in an event sequence are dispersed into a plurality of nodes in a cluster; 3. by using a rolling mechanism based on a time window, the error correction of processing result errors caused by other exceptions is realized; 4. three different schemes are provided, and different requirements on high performance, balance and strong consistency in the rule engine cluster expansion process are met respectively.

Drawings

FIG. 1 is a system overall block diagram of a control system for horizontally expanding an example of a rules engine according to the present invention;

FIG. 2 is a diagram of a rule engine cluster architecture for a control system that enables horizontal expansion of rule engine instances in accordance with the present invention.

FIG. 3 is a diagram of the structure and flow of an event sequence bus of a control system for horizontally expanding an instance of a rules engine according to the present invention.

FIG. 4 is a block diagram illustrating the structure and flow of an event receiver module of a control system for horizontally expanding an instance of a rules engine according to the present invention.

FIG. 5 is a block diagram illustrating the structure and flow of the event message encoding modules of the control system for horizontally expanding an instance of a rules engine according to the present invention.

FIG. 6 is a block diagram illustrating the structure and flow of a rule configuration and analysis module of a control system for horizontally expanding an instance of a rule engine according to the present invention.

FIG. 7 is a block diagram illustrating the structure and flow of a rule configuration and analysis module of a control system for horizontally expanding an instance of a rule engine according to the present invention.

FIG. 8 is a block diagram illustrating the structure and flow of an event delivery module of a control system for horizontally expanding an instance of a rules engine according to the present invention.

FIG. 9 is a block diagram illustrating the structure and flow of an event delivery module of a control system for horizontally expanding an instance of a rules engine according to the present invention.

Detailed Description

The present invention will be described in detail with reference to the accompanying drawings, and as shown in fig. 1 to 9, a control system for horizontally expanding a rule engine instance receives an event from an upstream event source, processes the event in the event, and sends the processed result to a downstream business system after the processing is finished. Where the upstream event source is the system that provides events for business processing for all of the business of the rules engine system. And the downstream business system is a system for performing the next business processing by using the rule engine processing result.

The interior of the whole control system is divided into 4 parts of time synchronization, an event sequence bus group, a rule engine cluster and a rule output bus group. Wherein:

the event sequence bus group consists of a plurality of event sequence buses, and each event sequence bus consists of an event receiving module, an event message coding module, a rule configuration and analysis module, a dispersed load module (the dispersed load module comprises a plurality of dispersed load examples) and an event sending module.

The rule output bus group consists of a plurality of rule output buses, each rule output bus comprises a repeated triggering and merging module and a processing result sending module, and the repeated triggering and merging module comprises a plurality of deduplication tasks.

The main innovation of the invention is the two parts of the event sequence bus group and the regular output bus group. The event sequence bus is responsible for arranging and controlling event sequences, sending event information to each instance in the rule engine cluster through messages, and processing the condition that the rule engine instance fails in the sending stage. And after receiving the output message, the rule output bus carries out sorting and merging, checks the conditions of false triggering and processing failure, and finally informs a downstream service system of a processing result. The two are matched with the rule engine cluster, and under the support of the time synchronization module, the dispersion of the service load under the condition that the rule engine processes the stateful computation can be realized.

The overall usage flow of the system in this embodiment, and the specific roles of the internal modules in this flow, and the flow relationships between the modules will now be described.

First, the deployment job is to deploy the required individual components. Some rule engine instances are deployed, an "instance coding module" records information of the instances, and after deployment is completed, a cluster strategy (mutual active-standby strategy and intra-group redundancy strategy) is selected. According to different selected cluster strategies, the instances are formed into nodes or node groups, and the nodes or the node groups are coded and stored by an instance coding module. Then, a plurality of event sequence buses are deployed to form an event sequence bus group, and a plurality of rule output buses are deployed to form a rule output bus group.

The second is the configuration work. An event (i.e. data which can be provided by an upstream event source and contains specific service information) is configured in an event receiving module of an event sequence bus by a user, and the event specifies the type, data field and receiving mode (active receiving or passive receiving) information to which the event belongs. After the user configures the event, the rule requiring processing is configured in the "rule analysis and configuration module" (the rule is that in the event sequence, whether an event combination meeting the condition exists or not is checked according to the configuration in the rule), then the rule is analyzed to obtain the service template to be deployed on each rule engine node or node group, the service templates to be deployed on different nodes or node groups are different, and the templates are deployed on the corresponding rule engine nodes or node groups.

The processing work of the event sequence is executed by an event sequence bus, and the event sequence is provided for the rule engine cluster. When the event and the rule are configured, the 'event receiving module' starts a task to receive the event, converts the event into an original event after receiving the event, and sends the original event to the 'event message coding module'. After receiving the original event, the event message coding module adds a sequential and unique code to the original event, sorts all the original events according to the code to obtain an original event sequence, and sends the original event sequence to the dispersed load module. The "distributed load module" internally initiates a plurality of "distributed load instances" respectively responsible for collecting event sequences for different rules. The collected event sequence is sent to the rule engine cluster by an event sending module.

The processing of the rules then takes place, executed by a node or group of nodes in the "rules Engine Cluster", to provide the "rule output bus" with the output of the rules prior to deduplication (i.e., the processing results of the rules). Each rule engine node or node group respectively processes the service template deployed for the rule configuration and analysis module, and after the processing is finished, the rule output module sends the rule to a rule output bus.

Finally, the de-duplication and combination work of the rule output is executed by a rule output bus, and the de-duplication and combination result is sent to a downstream business system. The repeated triggering and combining module starts a plurality of duplication-removing tasks, each task corresponds to a rule, and after successful duplication-removing and combining outputs a rule, the result is converted into a result and is sent to the processing result sending module and a downstream business system.

A time synchronization module: the method is used for guaranteeing the consistency of system time used by each process in the system, and the correctness and the effectiveness of service processing can be guaranteed only when the time is consistent. The time synchronization module guarantees the time synchronization level to be above millisecond level. The specific implementation of this module can be done using ntp or chrony services, and time synchronization of the host or service processes can be selected.

The rule engine cluster is as follows: and the method is responsible for actual business processing, namely finding out qualified event combinations from the event sequence according to the rule configuration. Fig. 2 is a structural diagram of a rule engine cluster, where the left and right parts respectively show the composition and interaction of each part in the rule engine cluster under the mutually active-standby policy and the intra-group redundancy policy, and the diagram is a detailed expanded view of the "rule engine cluster" part in fig. 1. The "rule engine instance cluster" is composed of a plurality of rule engine instances (each instance is a complex event processing middleware, the content in the text is explained based on Siddhi, and other products can be selected, and the correctness of the system in horizontal extension is not influenced), and the "instance coding module" is responsible for coding the instances.

Rule Engine instance Cluster: each instance receives the message sent by the 'event sequence bus', identifies the events in the event sequence according to the specific configuration of the rule, and sends the successfully identified event combination to the 'rule output bus' in an output form. Each instance in the cluster has no state, the deployment work is manually carried out by operation and maintenance personnel, and the address of each instance is stored in an instance deployment information table. The example of the rule engine cluster includes two different deployment strategies, namely a mutually active-standby strategy and a redundancy strategy in the group, and the mutually active-standby cluster deployment strategy, namely the mutually active-standby strategy, has the advantages of relatively less required resources, short response time, low error correction speed and suitability for projects with relatively small scale. The cluster deployment strategy of the intra-group redundancy, referred to as the intra-group redundancy strategy for short, has the advantages of short response time, high error correction speed and the defects of more required resources and is suitable for projects with larger scale.

Example encoding module: it is responsible for encoding the instances in the cluster of rule engine instances, with this encoding representing an instance in the system. And reading the information of each instance from the instance deployment information table, assigning a fixed number to the information and storing the information into the instance deployment information table. When using intra-group redundancy policies, it is also necessary to assign and save to which group this instance belongs, each group having a different number, and handling exactly the same traffic within the same group.

Event sequence bus: fig. 3 is a structure and a flow chart of the event sequence bus, namely a detailed development of the event sequence bus in fig. 1, illustrating the flow in the event sequence bus. The original events obtained from various event sources are required to be uniformly received by an event receiving module. And according to the actually received sequence, all original events are sequenced and coded by an event message coding module to form a determined event sequence. After forming the determined event sequence, the rule configuration and analysis module analyzes whether a rule uses the event, if so, the rule is retrieved and sequenced to form a stable event sequence aiming at the rule. Then, the 'distributed load module' determines the main processing node or node group of each event in a rule according to a sequential rotation mode, and the selected node or node group processes the event preferentially. After determining the processing nodes or node groups, the event sending module is used for converting the event into a message and sending the message to each node or node group. Furthermore, the event sequence bus itself is also horizontally extended, and a bus number needs to be assigned to each event sequence bus.

An event receiving module: currently, there are two ways for an event receiving module to receive an event, namely active receiving and passive receiving. Wherein: active reception means that a plurality of channels are assigned to the event receiving module, the channels are required by the service of the event receiving module, the channels are irrelevant to the service of the rule engine, the event receiving module can actively consume the information in the channels, useful information is collected as an original event, and the query of the information in the mainstream message middleware products in the market is supported at present. The passive receiving means that an upstream event source actively sends the message to an event receiving module in a mode designated by a rule engine, and the mode has two specific implementations, namely, the event receiving module assigns a plurality of topics in a message middleware through configuration, and the event receiving module assigns an API port, and an upstream service system sends the message content to the event receiving module through one of the two modes, so that the event receiving module passively receives the message pushing.

Fig. 4 is a structure and a flow chart of the event receiving module, namely, a detailed development of the event receiving module in fig. 3, which describes a flow of the event receiving module. Firstly, the configuration work of the event is edited on an event configuration page by a user, and the configured event is saved in a database through an event type configuration function. After the storage is finished, a task of receiving event function is started for each event, and the task is responsible for receiving the events. The received event is converted into an original event that can be used inside the system using an "original event conversion function". After the conversion is completed, the conversion is sent to the event message coding module by the original event sending function. The following describes these functions separately

Event type configuration function: and is responsible for configuring various types of events that the rules engine may process or match. The function comprises the steps of adding, deleting, modifying and checking the event classification, including id, classification name and sequence number, and storing into an event classification table; and (4) performing addition, deletion, modification and check on each event, wherein the events comprise id, event name, receiving mode, channel, original structure of the event and field list of event output, and storing the fields in an event information table. After the event creation is completed, the message is consumed or an HTTP request is received in a designated reception manner.

The receive event function: is responsible for receiving various events generated by various upstream event sources, wherein:

and when actively receiving the data required to be consumed in the topic, using a consumption group different from the original service. After the event creation is completed, a new consumption group is used, and the name of the consumption group is stored in the event information of the database (when the task is restarted, the stored consumption group is used for consumption). Each event sequence bus then opens a thread (or thread pool) to consume the messages therein.

When the passive receiving function uses the message middleware, the flow and the mode of the passive receiving function are the same as those of the active receiving function. When receiving using the API port, the channel is an HTTP interface address, generated and assigned by the system. The access to the interfaces of these addresses will all arrive in the API event message receiving interface, and the back end of this interface will maintain a real-time mirror image of the event information table in the database (the synchronization is ensured by periodically refreshing the whole table). After receiving the request, the interface will determine which event is according to the matching result according to the record in the request address matching table.

Original event conversion function: the system is responsible for identifying and converting the original message of the event received by the event receiving function, determining which event the event belongs to according to the configuration of an event information table in a database, and converting according to the following method:

(1) Taking out the original structure information of the event to obtain an event message format;

(2) Checking whether the received original message conforms to the configuration of the original structure field of the event, if not, abandoning the message and ending, and if so, carrying out the next step;

(3) And (4) extracting the information of the event output field list, extracting the corresponding field name and field value in the message, putting the field name and field value into a new object, adding an event id for the object, and completing splicing.

Original event sending function: the original event is sent to an event message encoding module. A local queue is created into which the objects generated by the primitive event transformation function are placed.

An event message encoding module: and the stability and consistency of the event sequence are guaranteed. In a distributed system, the actual occurrence sequence of requests is not determinable, as long as all nodes execute the requests in the same sequence, when an event sequence bus receives messages, the sequence requirements and checks are not made, as long as no loss is ensured, and when messages are sent, the sequence of events in all event sequences must be ensured to be consistent. Each event message is added with a globally unique and sequential number, the number is used for sequencing any two events, and the number is obtained by using a snowflake algorithm. The code is divided into three parts, the first part being the millisecond time stamp of the current time, the second part being the serial number, ranging from 000001 to 999999, and the third part being the node number of the current event sequence bus (e.g. from 01 to 99), eventually generating a large integer, which is added to the serialId field in the event message.

Fig. 5 is a detailed structure and flow of each event message encoding module, i.e., a detailed development of the event message encoding module in fig. 3. The serialId field is added to the original event by each event message coding module, and the processed event message is sent to different topics in the Kafka message middleware. These topics are all consumed by a globally unique node that uses the patroller tree algorithm to sort all event messages from one of all event sequence bus instances, and after completion, send them to topics named sortedEventSeries in kafka, leaving the decentralized load module to consume from them.

The rule configuration and analysis module: the module mainly has four functions, namely a rule configuration function, which configures rules needing to be processed in the whole system; after the configuration is completed, the rule analysis function automatically analyzes several types of events required by the rule, and the events of which types are in the initial positions; the 'receiving configuration function' generates different event receiving configurations for different processing nodes in the rule engine cluster; a "deployment offload function" is a function that provides support for the deployment and offload of rules. Fig. 6 is a structure and a flow of the rule configuration and analysis module, that is, a detailed development of the rule configuration and analysis module in fig. 3, and each function thereof is described in detail below.

The rule configuration function: configuring the information of the rule on the page, wherein the information comprises the name, the description, the configuration of the event sequence, the configuration of an output field, whether the matching time length with the event sequence occurs continuously or not (can be null), and generating an ID for the rule. And storing the final information in a rule information table of the database.

A rule analysis function: the event sequence configuration of the rules stored in the rule information table in the database is taken out, the types of events which are needed to be used by the rules are analyzed from the event sequence configuration, the types of events are positioned at the initial positions, and the analyzed information is used for receiving the configuration function.

Receiving a configuration function: and the system is responsible for generating corresponding business processing templates according to the configuration of the event sequence and the configuration of the output field in the rule, wherein the templates are specific business requirements deployed on the rule engine instance. Firstly, analyzing functions according to the rule in the last step to obtain several types of events at the initial position, which are called initial events; then, there are 2 different behaviors according to different example deployment strategies selected by the rule engine cluster, which is described in the following section specifically; and finally, generating different templates for different nodes in the rule engine cluster, and delivering the generated service processing template to a deployment unloading function for deployment.

When the strategy of mutually being a master-slave mode is used: and adding a direct filtering condition for the initial event, wherein the content is that the value of the processNodeCode field of the event message is the node number. Two fields are added to the output field of the rule, one is processNode, whose value is the number of the current node, and the other is maxserilId, whose value is the largest one of the values of the serialId fields of all events (actually all starting events) involved in the starting event unit, i.e., maxserialId = MAX (serialId). While when using a policy for intra-group redundancy: the above processNodeCode is replaced by processGroupCode, in which case the templates are the same for each instance in the same node group.

Deploying an uninstalling function: and the system is responsible for deploying and unloading the service processing template. When a user sends a deployment instruction, the template obtained by the last function is used for sequentially sending the instruction for deploying the corresponding template for the nodes in the rule engine cluster, and after deployment is finished, node information of successful deployment is stored in the database to prompt the user that deployment is finished. When the user sends an unloading instruction, the user sends an instruction for unloading the corresponding template to the corresponding node according to the record in the database, and the information of successful deployment in the database is deleted after unloading is finished, so that the user is prompted to finish unloading.

A dispersed load module: and the system is responsible for distributing the traffic load processed by the rules to each rule engine node or node group. The configuration of the event sequence in the rule may be regarded as being composed of a plurality of event units, each event unit receives a plurality of types of events, and has a starting state, an ending state, and a transition condition (which may be null) from the starting state to the ending state, and if an event unit is located at the head of the configuration of the event sequence, the starting state may be null. The end state of the former one is the same as the start state of the latter one in two adjacent event units in the sequence. The state transition condition of the event unit may be selected to filter the event information directly by the font size (direct filtering condition) or to filter the event information actually obtained in the event unit located in front of it (reference filtering condition). For example, after the event sequence requires that the class a sensor reports a reading (the first event unit is also the starting unit, and a is the starting event), the class b sensor also reports a reading (the second event unit), and the reading reported by the class b sensor is required to be less than 100 (the direct filtering condition) and greater than the reading reported by the class a sensor (the reference filtering condition). In addition, the rule also has the configuration of whether the event occurs continuously (i.e. whether some other events which do not conform to the configuration of the event sequence can be interpenetrated between two event units) and the matching duration of the event sequence (i.e. how long the event sequence matching can exist since the first event unit reaches the end condition).

There are various types of event units in the event sequence, and the event units that can be used as starting units are respectively described as follows:

a unitary event unit: and when an event meeting the transfer condition occurs, the matching is successful.

A conjunction binary event unit: the unit matches successfully when both events that satisfy the respective branch condition occur.

Extracting a binary event unit: when any one of two events meeting the respective transfer condition occurs, the unit is matched successfully.

Non-event units, there are 5 variants:

common non-event elements: within a specified time period, events that meet the transfer condition do not occur.

Non-event append and event Unit: on the basis of meeting the common non-event unit, the additional event meeting the transfer condition also occurs once, and the unit is matched successfully.

Non-event append or event unit: when the requirement of a common non-event unit is met or an additional event meeting the transfer condition occurs once, the unit is matched successfully.

Non-event append and non-event units: and if the requirement of the common non-event unit is met and the additional event meeting the transfer condition does not occur in the corresponding time period, the matching is successful.

Non-event appending or non-event elements: and if the requirement of the common non-event unit is met or the additional event meeting the transfer condition does not occur in the corresponding time period, the matching is successful.

The business processed by the rule engine system is a rule, namely, events in the infinite event sequence are matched from each position in the event sequence, and if the requirement of the rule is met, namely the matching is successful, the rule is triggered. And if different nodes are enabled to respectively start the matching work of the events from different positions, the distributed load is completed. The core idea of the decentralized load module is that service processing is performed by assigning different nodes (or node groups, the same below) to match starting from different start events in the event sequence by the mechanism described in the following sections.

The distributed load module consists of a plurality of distributed load examples, and one distributed load example corresponds to a rule of a state in deployment in the same process. The distributed load module is responsible for maintaining a consumer, the distributed load modules on different event sequence buses use the same consumption group to consume event sequences finished by the sequencing of the event message coding module, and convert consumed messages into memory objects and forward the memory objects to each distributed load instance in the same process, and the distributed load instances are responsible for distributed load. The following describes the mechanism used in the above method, which is divided into multiple cases according to the starting unit, the cluster deployment strategy, the different matching durations, and the processing requirements.

Mechanisms 1-4 are specific mechanisms when a common univariate event unit or disjunctive bivariate event unit is used as the starting unit.

The mechanism 1, under the mutual master-slave strategy, takes a common unary event unit as an initial unit:

(1) And obtaining an event type list L required by the rule and obtaining a starting event unit U.

(2) The method comprises the steps of obtaining the number n of rule engine nodes in a rule engine cluster and the number of each node, initializing an annular linked list R, wherein the initial length of the annular linked list R is n, the stored content in the linked list is rule engine node information, and the information comprises the codes c of the nodes, the failure times e (the initial value is 0) and the latest failure time t (the initial value is null).

(3) Two pointers p1 and p2 are prepared to point to elements in the circular linked list, the element pointed by p1 is a rule engine node which should perform event sequence matching from the current starting event, the element pointed by p2 is a rule engine node which actually performs event sequence matching from the last starting event, and the initial state points to the first element in the circular linked list.

(4) A variable T is initialized, which represents the next time there may be a rule engine instance node that is to resume normal function, and is initially null. A queue Q is additionally required to store failed rule engine instance node information

(5) Starting a loop, each decentralized load instance receives an event message e from the decentralized load module consumer, if the type of the event e is not in the event type list L, the message is discarded and the step is repeated, otherwise the next step is entered.

(6) And if the type of e is consistent with the event type required in the initial event unit U, entering the next step, and otherwise, entering the step (13).

(7) Adding a field processNodeCode to the event message, wherein the value of the field is the node code c currently pointed by the pointer p1, calling a message sending function (specifying the required response) in the event sending module, sending the message to each rule engine instance in the deployment, and starting the response of the instance to be processed.

(8) After the rule engine instance receives the message, if the processNodeCode value in the message is found to be consistent with the value appointed in the currently deployed service processing template, the rule engine instance immediately reports a response to the response receiving function in the event sending module, wherein the response comprises the serialId in the message. Whether sent or not, the rule engine instance continues processing according to its rule processing logic. (actually, only if the node number is equal to the median value of the message, the node number is regarded as an initial event, and if the node number is unequal, the node number is regarded as a non-initial event)

(9) Because each rule engine instance has different numbers and the sent processNodeCode value is the same as one of the numbers, the response receiving function only receives one response, but if the current rule engine instance is down or too busy, the response cannot be received or cannot be received on time; therefore, the distributed load instance calls a response receiving function of the event sending module to obtain a corresponding serial id response, when the duration of the response is w (w takes an RTT (Round Trip Time) which is 2 times as long as w, here, according to the actual application, it is set to 2ms, which is more appropriate), if the response is received, the step (10) is entered, otherwise, the pointer p1 is moved backward by one bit and the step (7) is repeated.

(10) Checking whether the pointing direction of the pointer p2 is consistent with the pointer p1, and if so, entering the next step; otherwise, adding 1 to the current pointing failure time e of the pointer p2, if the failure time e reaches f times (according to experience, f is more reasonable in value of 3), setting the failure time t as the current time, temporarily removing the current time from the annular linked list, and adding the removed elements into a queue Q; if the current global variable T is not empty, adding a Time interval g (the value of g is slightly larger than MTTR (Mean Time To repeater), assigning the result To T according To the fact that the MTTR is 30 seconds and the value of g is set To 40 seconds), moving the point of the pointer p2 backwards by one bit, and repeating the step.

(11) If the current T is not empty and the current time is greater than T, taking out an element i1 from the queue Q, reading a head element i2 after taking out, setting the failure times e of the i1 to be 0, clearing the failure time T, and inserting the example into a linked list after the pointer p1 points to the position; and if i2 is not empty, adding g to the failure time T of the I2, assigning the T to the T, and if i2 is empty, clearing the value of the T.

(12) And (5) moving the pointer p1 backward by one bit, and entering the step (5).

(13) And (5) calling a message sending function in the event sending module, sending the message of the event to all the rule engine processing nodes, and entering the step (5).

In the mechanism 2, a common unary event unit is used as an initial unit under the intra-group redundancy strategy:

(1) The same as the step (1) of the mechanism 1.

(2) The number m of the rule engine node groups in the rule engine cluster and the number of each node group are obtained, an array A is initialized, the initial length of the array A is m, and the stored content in the array is a rule engine node group code c.

(3) A pointer p is prepared to point to the element in the array A, the element pointed to by p is the rule engine node group which should carry out event sequence matching from the current starting event, and the pointer p points to the first element in the array A in the initial state.

(4) The same as the step (5) of the mechanism 1.

(5) And (4) if the type of the e is consistent with the type of the event required in the initial event unit U, entering the next step, and otherwise, entering the step (7).

(6) Adding a field processGroupCode to the event message, wherein the value of the field is a node group code c currently pointed by the pointer p, calling a message sending function in the event sending module, and sending the message to a rule engine example in each deployment; and (4) moving the pointing direction of the pointer p by one bit backwards, and entering the step (4).

(7) And (4) calling a message sending function in the event sending module, sending the message of the event to all the rule engine processing nodes, and entering the step (4).

Mechanism 3, under the mutual master-slave strategy, taking the disjunctive binary event unit as the starting unit: on the basis of the mechanism 1, the modification is carried out, wherein the step (6) is that if the type of e is the same as one of the two event types required in the starting event unit U, the next step is entered, otherwise, the step (13) is entered.

Mechanism 4, under the intra-group redundancy strategy, takes disjunctive binary event units as the starting units: on the basis of the mechanism 2, the modification is made in which the step (5) is "if the type of e is the same as one of the two event types required in the starting event unit U, proceed to the next step, otherwise proceed to the step (7)".

Mechanisms 5-16 are specific mechanisms when a conjunctive binary event unit is taken as the starting unit. At this time, three schemes of high availability, high consistency 1PC and high consistency 2PC are divided. Some sub-processes needed to be used in the high-consistency scheme are explained first, and a global queue is needed for a high-consistency mechanism, and the queue adopts a publish/describe mode. Meanwhile, the 1PC solution also needs to add a role of a coordinator, and the coordinator can be deployed independently, or can select one from a plurality of distributed load instances to play a role by election. The coordinator maintains a local queue for merging operations, the elements in the queue are operations, and the operations are classified into the following 2 types:

[ occupation ] operation: the parameter is the number of the message; the return values are "allow" which represents that the message is unoccupied, and "deny" which represents that the message is occupied, at which time all messages preceding the message need to be truncated.

[ CONVERSION ] operation: the parameters are the changed state and the specific information of the message; the return values are 'allow' and 'deny', the 'allow' represents that the queue state is the state that the message belongs to or the queue is empty, and the message needs to be sent and added into the global queue at the moment; "deny" represents that the queue status is not the status to which this message belongs and that the global queue is not empty.

Coordinator was determined using [ election ] procedure: when an instance I attempts to contact the coordinator but does not receive a response, it will attempt to initiate an [ election ] flow, asking to become the new coordinator, and send the serialId of the last start event currently received to the other instance. If the serviceId carried by the starting time received last by other instances is smaller than the value sent by I, I becomes a new coordinator, if other instances have a starting event with a value larger than the value sent by I, I gives up becoming a coordinator, and receives the request sent by other instances to become a new coordinator.

And the mechanism 5 takes a conjunctive binary event as an initial unit when the master and standby strategies are mutually matched, the availability is high, and the matching time duration is not specified by a rule:

(1) Two types of events a and B are needed to obtain the event type list L and the start event unit U.

(2) The same as the step (2) of the mechanism 1.

(3) Four pointers pa1, pa2, pb1 and pb2 are prepared, the pointers point to elements in the circular linked list, the element that pa1 points to is a rule engine node that should perform event sequence matching from the current class a start event, the element that pa2 points to is a rule engine node that actually performs event sequence matching from the last class a start event, the element that pb1 points to is a rule engine node that should perform event sequence matching from the current class B start event, the element that pb2 points to is a rule engine node that actually performs event sequence matching from the last class B start event, and the initial states of the four pointers all point to the first element in the circular linked list.

(4) The same as the step (4) of the mechanism 1.

(5) The same as the step (5) of the mechanism 1.

(6) And if the type of the e is one of the events of the A type and the B type, entering the next step, and otherwise, entering the step (13).

(7) Adding a field processNodeCode to the event message, wherein the value of the field depends on the type of e, if the field is A, the field is a node code ca currently pointed by a pointer pa1, and if the field is B, the field is a node code cb currently pointed by a pointer pb1, calling a message sending function (specifying that response is needed) in an event sending module, sending the message to each rule engine instance in deployment, and starting the response of the instance to be processed.

(8) The same as the step (8) of the mechanism 1.

(9) Since each rule engine instance has different numbers, and the sent processNodeCode value is only the same as one of the numbers, the response receiving function only receives one response, so the decentralized load instance will call the response receiving function of the event sending module to obtain the corresponding serialId response, when the duration of the response is w, if the response is received, the step (10) is entered, otherwise, the pointer is moved backward by one bit (if the type of e is A, the pointer pa1 is moved, otherwise, the pb1 is moved), and the step (7) is repeated.

(10) If the type of e is A, checking whether the pointing direction of the pointer pa2 is consistent with the pointer pa1, otherwise, checking whether the pointing direction of the pointer pb2 is consistent with the pointer pb1, and if so, entering the next step; otherwise, creating a local variable pointer p (if the type of e is A, the pointer p is pa2, otherwise, pb 2), adding 1 to the current failure frequency e of the pointer p, if the failure frequency e reaches f times, setting the failure time t as the current time, temporarily removing the current time from the circular linked list, adding the removed elements into the queue Q, simultaneously checking all 4 pointers, and if a certain pointer points to the element to be removed, moving the pointer one bit backwards; if the current global variable T is not null, the current time is added with the time interval g, the result is assigned to T, the pointer p is moved backwards by one bit, and the step is repeated.

(11) The same as the step (11) of the mechanism 1.

(12) If the type of e is A, the pa1 pointer is moved backward by one bit, otherwise the pb1 pointer is moved backward by one bit, and step (5) is entered.

Mechanism 6, taking a conjunction binary event as an initiating unit when the mutually master and slave policies, 1PC and rules do not specify matching time duration:

(2) The same as the step (2) of the mechanism 1.

(3) And preparing a pointer p to point to an element in the circular linked list, wherein the pointed element is a rule engine node which needs to perform event sequence matching from the current new initial event, and the pointer points to any one element in the circular linked list in the initial state.

(4) The same as the step (5) of the mechanism 1.

(5) If the type of e is one of the events of A or B, then the next step is entered, otherwise, the step (12) is entered.

(6) And (3) assuming that the type of e is A, trying to take out an event o from the global queue, if the type of o can be taken out, the type of o is certainly B, entering the next step, otherwise, the type of o cannot be taken out, and entering the step (10) (if the type of e is B, the taken out o is certainly A).

(7) Sending an [ occupation ] operation request to the coordinator, and if the coordinator returns 'refusal', abandoning the event o and returning to the previous step; if the coordinator returns 'allow', entering the next step; if the coordinator does not respond, initiating an [ election ] process, if the example becomes a new coordinator, entering the next step, and if the example does not become a new coordinator, waiting for time w (taking 100ms according to experience w is more appropriate), and repeating the step.

(8) Adding a field processNodeCode to the message of the event e, wherein the value of the field processNodeCode is the value of the processNodeCode of the event o, calling a message sending function in the event sending module, and sending the message to the rule engine instance in each deployment.

(9) After the rule engine instance receives the message, if the processNodeCode value in the message is found to be consistent with the value specified in the currently deployed service processing template, the rule engine instance will continue to process according to the own rule processing logic and return to the step (4). (in fact, only if the node number is equal to the median value of the message, the node number is regarded as an initial event, and if the node number is not equal, the node number is regarded as a non-initial event, and at the moment, two required initial events are both sent to the same rule engine example)

(10) Adding a field processNodeCode for the message of the event e, wherein the value is p and points to the node number c of the element, sending a 'conversion' operation request to the coordinator, if the coordinator returns 'refusing', abandoning the event o and returning to the step (6), and if the coordinator returns 'allowing', entering the next step.

(11) And (4) moving the position of the p backward by one bit, calling a message sending function in the event sending module, sending the message to each rule engine instance in the deployment, and returning to the step (4).

(12) And (4) calling a message sending function in the event sending module, sending the message of the event to all the rule engine processing nodes, and entering the step (4).

Mechanism 7, mutually master-slave strategy, 2PC, rule do not specify matching time duration, takes conjunctive binary event as starting unit, assuming the number in deployment is 2n + 1:

(2) The same as the step (2) of the mechanism 1.

(3) The same as the step (3) of the mechanism 6.

(4) Initializing a local queue Q with the length of l (generally, l is about 10 according to the business busy degree and the actual processing efficiency) for storing the serialId in the event message successfully occupied by the last l in the example (when the queue is full and a new event message needs to be queued, the head element of the queue is dequeued, and the tail element of the queue is queued); starting a thread, receiving the [ occupation ] requests sent by other distributed load nodes, returning 'permission' if the serialId of the message in the request is not in the queue Q, and otherwise, returning 'rejection'.

(5) Initializing a variable V, wherein the value of the variable V is A, B or null, the variable V represents that the node currently has an unprocessed A-type initial event, the variable V represents that the node currently has an unprocessed B-type initial event, and the variable V represents that the node currently has no unprocessed A-type or B-type event; starting a thread, receiving a [ conversion ] request sent by other distributed load nodes, returning to 'permit' if the value of V is null or the event type carried in the request is the same as the value of the variable V, and otherwise, returning to 'reject'.

(6) The same as the step (5) of the mechanism 1.

(7) If the type of e is one of the events of A or B, then go to the next step, otherwise go to step (14).

(8) And (3) if the type of e is A, trying to take out an event o from the global queue, if the event o can be taken out, the type of o is certainly B, then entering the next step, otherwise, the event o cannot be taken out, and entering the step (12) (if the type of e is B, the taken out o is certainly A).

(9) Sending [ occupation ] operation requests to all other dispersed load examples, and if the number of the examples returning to 'refusal' reaches n, giving up the event o and returning to the previous step; if the number of instances returning "allowed" reaches n, the next step is entered.

(10) The same as the step (8) of the mechanism 6.

(11) After the rule engine instance receives the message, if the processNodeCode value in the message is found to be consistent with the value specified in the currently deployed service processing template, the rule engine instance will continue to process according to the own rule processing logic and return to the step (6).

(12) Adding a field processNodeCode for the message of the event e, wherein the value is p and points to the node number c of the element, sending a 'conversion' operation request to all other scattered load instances, if the number of the instances returning 'refusal' reaches n, abandoning the event o and returning to the step (8), and if the number of the instances returning 'permission' reaches n, entering the next step.

(13) And (4) moving the position of the p backward by one bit, calling a message sending function in the event sending module, sending the message to each rule engine instance in the deployment, and returning to the step (6).

(14) And (6) calling a message sending function in the event sending module, sending the message of the event to all the rule engine processing nodes, and entering the step (6).

And a mechanism 8, taking a conjunction binary event as an initiating unit when the mutually master-slave strategy, high availability and rule specify and match the time length:

(1) Obtaining an event type list L and two types of events A and B required by an initial event unit U; and obtaining the matching duration D configured in the rule.

(2) The same as the step (2) of the mechanism 1.

(3) The same as the step (3) of the mechanism 1.

(4) Initializing a variable T, which represents the time when a rule engine instance node with normal functions can be recovered next time and is empty initially; initializing a queue Q for storing failed rule engine instance node information, wherein the queue Q is empty initially; initializing a queue M for storing the sending time sendtimemap and the processing node code processNodeCode in the event message sent to the rule engine instance by the distributed load instance.

(5) The same as the step (5) of the mechanism 1.

(6) If the type of e is one of the events A or B, then the next step is entered, otherwise, the step (16) is entered.

(7) And (4) judging whether the type of the e is the same as that of the head element of the queue M (the queue M is empty, the type is considered to be the same), if so, entering the next step, otherwise, entering the step (14).

(8) Adding a field processNodeCode to the event message, wherein the value of the field is the node code c currently pointed by the pointer p1, adding a field sendTimestamp to the event message, wherein the value of the field sendTimestamp is the current timestamp, calling a message sending function (for specifying the required response) in the event sending module, sending the message to each rule engine instance in the deployment, and starting the response of the waiting instance.

(9) The same as the step (8) of the mechanism 1.

(10) Because each rule engine instance has different numbers, and the sent processNodeCode value is only the same as one of the numbers, the response receiving function only receives one response, so the decentralized load instance calls the response receiving function of the event sending module to obtain the corresponding serial Id response, when the duration of the response is w, if the response is received, the event message sent this time is put at the tail of the queue M and enters the next step, otherwise, the pointer p1 is moved backwards by one bit and the step (8) is repeated.

(11) Checking whether the pointing directions of the pointers p1 and p2 are consistent, and if so, entering the next step; otherwise, adding 1 to the current pointing failure times e of the pointer p2, if the failure times e reach f times, setting the failure time t as the current time, temporarily removing the failure time from the annular linked list, and adding the removed elements into a queue Q; if the current global variable T is not null, the current time is added with the time interval g, the result is assigned to T, the pointer p2 is moved backwards by one bit, and the step is repeated.

(12) The same as the step (11) of the mechanism 1.

(13) And (4) moving the pointer p1 backward by one bit, and entering the step (7).

(14) And (3) taking out the head element event o from the queue M, if o is empty, entering the step (8), otherwise, dequeuing o and checking whether the difference value between the sendtimemap in o and the current time is greater than the matching time length D, if so, repeating the step, and if not, entering the next step.

(15) And (4) adding a field processNodecode to the message of the event e, wherein the value of the field processNodecode is the value of the processNodecode in the value o, calling a message sending function in the event sending module, sending the message to the rule engine example in each deployment, and returning to the step (5).

(16) And (5) calling a message sending function in the event sending module, sending the message of the event to all the rule engine processing nodes, and entering the step (5).

Mechanism 9, mutually master-slave strategy, 1PC, rule specify matching time duration to take a conjunctive binary event as an initiating unit:

(1) The same as the step (1) of the mechanism 8.

(2) The same as the step (2) of the mechanism 1.

(3) The same as the step (3) of the mechanism 6.

(4) The same as the step (5) of the mechanism 1.

(5) And if the type of the e is one of the events of the A type and the B type, entering the next step, and otherwise, entering the step (13).

(6) And (3) assuming that the type of e is A, trying to take out an event o from the global queue, if the type of o can be taken out, the type of o is certainly B, entering the next step, otherwise, the type of o cannot be taken out, and entering the step (11) (if the type of e is B, the taken out o is certainly A).

(7) And (4) checking whether the difference value of the sendtimemap in the step (o) and the current time is greater than the matching time length D, if so, giving up the step (6), and if not, entering the next step.

(8) Sending an [ occupation ] operation request to the coordinator, and if the coordinator returns 'refusal', abandoning the event o and returning to the previous step; if the coordinator returns 'allow', entering the next step; if the coordinator does not respond, initiating an [ election ] flow, if the example becomes a new coordinator, entering the next step, and if the example does not become a new coordinator, repeating the step after w.

(9) The same as the step (8) of the mechanism 6.

(10) After the rule engine instance receives the message, if the processNodeCode value in the message is found to be consistent with the value specified in the currently deployed service processing template, the rule engine instance will continue to process according to the own rule processing logic and return to the step (4).

(11) Adding a field processNodeCode with the value p pointing to the node number c of the element for the message of the event e, adding a field sendTimestamp with the value of the current timestamp for the message of the event e, sending a [ conversion ] operation request to the coordinator, if the coordinator returns 'reject', abandoning the event o and returning to the step (6), and if the coordinator returns 'permit', entering the next step.

(12) And (5) moving the position of the p backward by one bit, calling a message sending function in the event sending module, sending the message to the rule engine example in each deployment, and returning to the step (4).

(13) And (5) calling a message sending function in the event sending module, sending the message of the event to all the rule engine processing nodes, and entering the step (4).

Mechanism 10, mutually master-slave strategy, 2PC, rule specify matching time duration to take conjunctive binary events as the starting unit, assuming the number in deployment is 2n + 1:

(1) The same as the step (1) of the mechanism 8.

(2) The same as the step (2) of the mechanism 1.

(3) The same as the step (3) of the mechanism 6.

(4) The same as the step (4) of the mechanism 7.

(5) The same as the step (5) of the mechanism 7.

(6) The same as the step (5) of the mechanism 1.

(7) And if the type of the e is one of the events of the A type and the B type, entering the next step, and otherwise, entering the step (15).

(8) And (3) if the type of e is A, trying to take out an event o from the global queue, if the event o can be taken out, the type of o is certainly B, then entering the next step, otherwise, the event o cannot be taken out, and entering the step (13) (if the type of e is B, the taken out o is certainly A).

(9) And (4) checking whether the difference value of the sendtimemap in the step (o) and the current time is greater than the matching time length D, if so, giving up the step (8), and if not, entering the next step.

(10) The same as the step (9) of the mechanism 7.

(11) The same as the step (8) of the mechanism 6.

(12) After the rule engine instance receives the message, if the processNodeCode value in the message is found to be consistent with the value specified in the currently deployed service processing template, the rule engine instance continues to process according to the rule processing logic of the rule engine instance, and returns to the step (6).

(13) Adding a field processNodeCode for the message of the event e, wherein the value is p and points to the node number c of the element, sending a 'conversion' operation request to all other scattered load instances, if the number of the instances returning 'refusal' reaches n, abandoning the event o and returning to the step (8), and if the number of the instances returning 'permission' reaches n, entering the next step.

(14) And (4) moving the position of the p backward by one bit, calling a message sending function in the event sending module, sending the message to each rule engine instance in the deployment, and returning to the step (6).

(15) And (6) calling a message sending function in the event sending module, sending the message of the event to all the rule engine processing nodes, and entering the step (6).

Mechanism 11, redundancy policy in the group, high availability, rule do not specify matching time duration to take a conjunctive binary event as the starting unit:

(1) The same as the step (1) of the mechanism 5.

(2) The number m of the rule engine node groups and the serial numbers of the node groups in the rule engine cluster are obtained, an array S is initialized, the initial length of the array S is m, and the stored content in the array is a rule engine node group code c.

(3) Two pointers p1 and p2 are prepared to point to elements in the array S, wherein the element pointed by p1 is a rule engine node group which should perform event sequence matching from the current class A starting event, the element pointed by p1 is a rule engine node group which should perform event sequence matching from the current class B starting event, and the initial state points to the first element in the array S.

(4) The same as the step (5) of the mechanism 1.

(5) And (4) if the type of the e is one of the events of the A type and the B type, entering the next step, and otherwise, entering the step (7).

(6) Creating a local variable p, if e is of type A, the value of p is a pointer p1, if e is of type B, the value of p is a pointer p2, adding a field processGroupCode to the event message, using a node code c currently pointed by p for the value of the field, calling a message sending function in an event sending module, sending the message to a rule engine example in each deployment, moving the point of p backwards by one bit (namely moving the point of p1 or p2 backwards by one bit) after the message is sent, and returning to the step (4).

Mechanism 12, the intra-group redundancy policy, 1PC, and rule do not specify matching time duration to take a conjunctive binary event as an initiating unit: on the basis of the mechanism 6, all the fields of the processNodeCode are changed into fields of processGroupCode, and the node codes are changed into node group codes.

Mechanism 13, the intra-group redundancy policy, 1PC, and rule do not specify matching time duration to take a conjunctive binary event as the starting unit: on the basis of the mechanism 7, all the fields of the processNodeCode are replaced by fields of processGroupCode, and the node codes are replaced by node group codes.

Mechanism 14, intra-group redundancy policy, high availability, rule specify matching the time duration to take a conjunctive binary event as the starting unit:

(1) The same as the step (1) of the mechanism 8.

(2) The same as the step (2) of the mechanism 11.

(3) The same as the step (3) of the mechanism 11.

(4) Initializing a queue M for storing the sending time sendtimemap and the processing node group code processGroupCode in the event information sent to the rule engine instance by the present decentralized load instance.

(5) The same as the step (5) of the mechanism 1.

(6) And if the type of the e is one of the events of the A type and the B type, entering the next step, and otherwise, entering the step (11).

(7) And (4) judging whether the type of the e is the same as that of the head element of the queue M (if the queue M is empty, the type is considered to be the same), if so, entering the next step, otherwise, entering the step (9).

(8) Firstly, creating a local variable p, if e is of type A, the value of p is a pointer p1, if e is of type B, the value of p is a pointer p2, adding a field processGroupCode to the event message, and using a node group code c currently pointed by p for the value of the field; then adding a field sendtimetag to the event message, wherein the value is the current timestamp; and finally, calling a message sending function in the event sending module, sending the message to the rule engine instance in each deployment, moving the point of p backwards by one bit (namely moving the point of p1 or p2 backwards by one bit) after the message is sent, adding the event message to the tail of the queue M, and returning to the step (5).

(9) And (3) taking out the head element event o from the queue M, if o is empty, entering the step (8), otherwise, dequeuing o and checking whether the difference value between the sendtimemap in o and the current time is greater than the matching time length D, if so, repeating the step, and if not, entering the next step.

(10) And (3) adding a field processGroupCode to the message of the event e, wherein the value is the processGroupCode value in o, calling a message sending function in the event sending module, sending the message to each rule engine instance in the deployment, and returning to the step (5).

(11) And (5) calling a message sending function in the event sending module, sending the message of the event to all the rule engine processing nodes, and returning to the step (5).

Mechanism 15, in-group redundancy policy, 1PC, rule specify matching the time duration to take a conjunction binary event as the starting unit: on the basis of the mechanism 9, all the fields of the processNodeCode are replaced by fields of processGroupCode, and the node code is replaced by node group code.

Mechanism 16, the intra-group redundancy policy, 2PC, and rule specify that the matching time duration takes a conjunctive binary event as an initiating unit: on the basis of the mechanism 10, all the fields of the processNodeCode are replaced by fields of processGroupCode, and the node codes are replaced by node group codes.

The mechanism 17 is used when various variants of non-event units are used as starting units, when at least one type of event has to occur less times, it can be attempted to split it into two parts, the first part is called the starting part from the beginning to the first non-event unit in the event sequence required by the rule, and the second part is the remaining part, where the remaining part can be empty. If the rest part is not empty, adding a unary event unit R to the rest part to be placed at the head of the rest part, wherein the event received by R is a trigger event of the initial part (R is a logic event, and the ID of the event is-1 and is only used in the rule, different from the event stored in the database and configured by the event type configuration function), and completing horizontal extension by using a mechanism 1 and a mechanism 2. For the start part, requiring deployment on only one node in the rule engine cluster, the following mechanism is used:

(1) A key-value pair K is created in Redis, the key is startPart + serailId + rule id, and the value is 0. Starting a thread responsible for consuming data in topic named startPart + rule id in kafka

(2) The event sequence configuration of the rule is analyzed by using a rule analysis function, an event unit list W contained in the initial part is obtained, and an event type list L received by each unit in the W is obtained.

(3) The same as the step (5) of the mechanism 1.

(4) And (3) calling a message sending function (appointed target node) in the event sending module, sending the message to the rule engine node deploying the initial part, and returning to the step (3) after the sending is finished.

(4) When the rule engine node determines that all event units in W have satisfied the transition condition in turn, the rule in this initial part will trigger, and a trigger message will be sent to topic named startPart + rule id in kafka.

(5) When a logic event R is received, the scattered load instance reads the value v of the key-value pair K, adds the field serialId to the logic event R, and updates the key-value pair K with the new value after adding v by 1. And calling a message sending function in the event sending module, and sending the message to all rule engine instances. And (4) returning to the step (3).

An event sending module: and the system is responsible for communication between the event sequence bus and each node in the rule engine cluster, and comprises a step of sending a message to each rule engine node and a step of receiving a response sent back from each rule engine node. The system mainly comprises two functions, namely a message sending function and a response receiving function, which are communicated through a hash bucket M. Fig. 8 shows the structure and flow of the event sending module, i.e. a detailed development of the event sending module in fig. 3.

The message sending function: the details of this function, which is responsible for sending the message of the event from the event sequence bus to the rule engine instance in the rule engine cluster, are as follows:

[ CALLER ]: each dispersed load instance among the dispersed load modules;

[ cf. 1 ] is: when the event message is obtained, adding a field busCode to the message, wherein the value is a bus number;

[ CHEN 2 ] is as follows: whether response is needed or not can be null, and the null represents that no response is needed;

[ cf. 3 ] of: the appointed rule engine node number can be null, and represents that the event message is only sent to the node when the rule engine node number is not null;

[ Effect ] is: if no response is needed and the node number is null, broadcasting the message through topic named eventSequence + rule id in kafka; if no response is needed and the node number is not null, the response is sent to one node separately through topic named eventSequence + node (group) code + rule id in kafka; if a response is required, on the basis of the execution, the seriald in the event message is also taken out, a key value pair is added into the hash bucket M, key is the value of the seriald field, and value is a duplet which consists of the current timestamp and the state (initially false).

In response to the receiving function: starting a main thread E, scanning all key-value pairs in the hash bucket every 2 seconds, and if the timestamp in the binary of the value exceeds 2 seconds compared with the current time, deleting the key-value pair from M. And starting a main thread R, receiving a request for inquiring whether a response exists in each scattered load instance in the middleware of the event sequence, if the key is currently arranged in M and the state in the value duplet is true, returning the received response, and otherwise, returning the response which is not received. And starting a plurality of threads, receiving all messages with the topoc name of nodeResponse + rule id + bus number format in kafka, acquiring a serial Id field in the received messages, and updating the state of true in the value binary group if the key exists in M.

Rule output bus: when the intra-group redundancy strategy is used, the method is only responsible for the merging work of rule output, and when the mutually active and standby strategies are used, the method also needs to be responsible for the error correction function of merging and transferring loads. The rule output buses can be deployed in multiple numbers, one rule output bus is responsible for output deduplication tasks of a plurality of rules, and rules of different rule output buses which are responsible for deduplication do not overlap. The rule output bus includes a "repeated trigger merging module" and a "processing result sending module", and fig. 9 is a structural diagram and a flow of the rule output bus, that is, an expansion of the rule output bus in fig. 1.

And a repeated triggering and combining module: according to the different selected cluster expansion strategies, the following two processes are respectively adopted, wherein the process 1 is under the mutual active-standby strategy:

(1) A variable M is created, initialized to 0.

(2) Starting a message in topic named rulOutput + rule id in the interval consumption kafka of threads, receiving a batch of messages every 1 second, if no message duplication step is received, otherwise, entering the next step.

(3) A list L is initialized holding the maxserialId that appears in the batch.

(4) Traversing the batch of messages, fetching the maxserialId field value v in each message, discarding the message if v is less than M or already exists in the list L, otherwise, proceeding.

(5) And (3) replacing the value of M with the value of v, putting v into the list L, calling a processing result sending module, sending the message to a downstream business system, and continuing to traverse the batch of messages until the batch of messages completely traverse, and returning to the step (2).

The process 2 is under the intra-group redundancy strategy, and the number of instances in the node group is set to be 2n-1:

(1) Creating a hash bucket M, initializing to be empty, wherein the key of the hash bucket is the maximum starting event number, and the value is a duplet which comprises a timestamp and the receiving times.

(2) Starting a thread, scanning the value in the hash bucket once at intervals (the value is 2-10 times of RTT, and taking 2 seconds as appropriate according to experience), and deleting the key value pair if the time stamp in the value is found to exceed 2 seconds from the current time.

(3) And starting a thread to consume the message in topic named rulOutput + rule id in kafka, repeating the step if no message is received, and entering the next step if no message is received.

(4) Taking out the field value k of the maxserialId in each message, if the hash bucket M does not contain the key of k, adding the key of k into the hash bucket M, wherein the binary group of the value is the current timestamp, and the receiving times are 1; if the hash bucket M already contains the key of k, adding 1 to the number of times of receipt in the corresponding value binary group; and then proceed to the next step.

(5) Checking the receiving times in the current value binary group, if the receiving times are more than n, indicating that more than half of the group nodes consider that the rule should be triggered, deleting the key value pair at the moment, calling a processing result sending module, sending the message to a downstream service system, and returning to the step (3); if not, directly returning to the step (3).

A processing result sending module: the caller is a repeated triggering and merging module of a rule output bus and carries the result to be output to the downstream service system and the id of the rule during calling. When this module receives a call, the id of this rule is added to the ruleId field of the result message. The module supports mainstream message middleware and sends the processed result in a mode of transmitting the processed result to a message queue in the mainstream message middleware.

The technical solutions described above only represent the preferred technical solutions of the present invention, and some possible modifications to some parts of the technical solutions by those skilled in the art all represent the principles of the present invention, and fall within the protection scope of the present invention.

Claims

1. A control system for horizontally expanding a rule engine instance is characterized by comprising an upstream event source, a downstream business system, an event sequence bus group, a time synchronization module, a rule engine cluster and a rule output bus group;

the upstream event source refers to any business system which obtains the business processing result and the limiting condition corresponding to the processing result together, and provides business processing events for all businesses of the rule engine system;

2. The control system for horizontally expanding rule engine instances as claimed in claim 1, wherein the rule engine cluster unit is composed of a plurality of rule engine instances, each instance can be deployed in a single process mode or a mirror mode, and each instance has no state.

3. The control system for horizontally expanding rule engine instances according to claim 1, wherein the event sequence bus group is composed of a plurality of event sequence buses, and is jointly responsible for providing event sequences for each node or node group in the instance cluster in the rule engine cluster, and each event sequence bus is composed of an event receiving module, an event message coding module, a rule configuration and analysis module, a distributed load module (the distributed load module comprises a plurality of distributed load instances), and an event sending module.

4. The control system of claim 1, wherein the rule engine cluster comprises an instance encoding module and an instance cluster, and wherein the instance cluster can select two strategies: when the mutual active-standby strategy is selected, the example cluster comprises a plurality of rule engine nodes; when an intra-group redundancy policy is selected, multiple rule engine node groups are contained within the instance cluster.

5. The control system for horizontally expanding the rule engine instance according to claim 1, wherein the rule output bus group is composed of a plurality of rule output buses, and is jointly responsible for de-overlapping and sending the processing results obtained by the rule engine cluster; each rule output bus comprises a repeated triggering and merging module and a processing result sending module; the repeated triggering and combining module comprises a plurality of duplicate removal tasks, and each task is responsible for removing duplicate of one rule; and the processing result sending module is responsible for sending the processing result after de-duplication and combination to a downstream service system.

6. The control system of claim 4, wherein the inter-active-standby cluster deployment policy is: all rule engine instances form a mutual primary-backup relationship, that is, if one instance fails, another instance in the cluster takes over the failed instance to perform service processing once.

7. The control system for enabling a rules engine instance to scale horizontally as claimed in claim 1 wherein the cluster deployment policy for intra-group redundancy: all the instances are logically divided into a plurality of groups, the number of the instances in each group is odd, all the instances in each group execute the same service, if a certain node fails, other nodes in the group cannot be influenced, and finally, the processing result of one service is based on the principle that most nodes of the nodes in the group reach the same result.

8. The control system according to claim 7, wherein each instance is a node in the cluster when the rules engine cluster selects the mutually active-standby policy, and wherein a plurality of instances together form a node group in the cluster when the intra-group redundancy policy is selected.

9. The control system according to claim 7, wherein said instance coding module is responsible for coding the instances in said rule engine instance cluster in the above section, and the coding represents an instance, the current working state of the instance, the service processing condition, and the processing result in the system.