CN106649000B - Fault recovery method of real-time processing engine and corresponding server - Google Patents

Info

Publication number
CN106649000B
Authority
CN
China
Prior art keywords
real
time processing
application
server
processing application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710002127.0A
Other languages
Chinese (zh)
Other versions
CN106649000A (en)
Inventor
季钱飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Transwarp Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Transwarp Technology Shanghai Co Ltd
Priority to CN201710002127.0A
Publication of CN106649000A
Application granted
Publication of CN106649000B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2082 Data synchronisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application aims to provide a fault recovery scheme for a real-time processing engine. If a server acquires the synchronous lock when it starts, it becomes the main server and provides services to the outside. If a fault occurs while the main server is providing services, the main server releases the synchronous lock, which triggers the standby server to apply for the lock, so that the standby server can acquire it and become the new main server.

Description

Fault recovery method of real-time processing engine and corresponding server
Technical Field
The application relates to the technical field of information, in particular to an automatic fault recovery technology of a real-time processing engine.
Background
With the development of big data technology, enterprises can store and process larger volumes of data than ever, reaching the TB level and even the PB level. At present, enterprises mainly apply offline analysis to such mass data, and the interval from the generation of data to the generation of results is usually T+1 or even longer. For many industries with high real-time requirements, this does not meet their business needs. How to process data faster and feed back results in real time is a problem that urgently needs to be solved in the field of big data.
A real-time processing engine makes it possible for an enterprise to process big data in real time, and can help the enterprise carry out work such as ETL, real-time report analysis, and even real-time machine learning. Several distributed real-time processing engines are currently on the market, including Apache Flink and Spark Streaming; users can implement services with real-time requirements through the APIs these engines provide.
Real-time processing differs greatly from traditional batch processing. The most important difference is that a real-time processing service handles unbounded data, that is, the data arrives continuously, and customers generally require the service to run continuously, 7 × 24. However, a distributed system may stop serving for various reasons such as network and hardware failures. In such a case the failure must be detected in time, data must not be lost, and recovery must complete in the shortest possible time. At present, mainstream real-time processing engines such as Apache Flink and Spark Streaming provide mechanisms to ensure the reliability of data, but do not provide a complete automatic fault recovery service.
Summary of the Application
It is an object of the present application to provide an automatic failure recovery technique for a real-time processing engine.
In order to achieve the above object, the present application provides a method for recovering a failure of a real-time processing engine, the method comprising:
becoming a main server when the synchronous lock is acquired;
when the real-time processing application is executed, recording application information about the currently executed real-time processing application, so that when the standby server becomes a main server, the corresponding real-time processing application is continuously executed by acquiring the application information;
and when a fault occurs, releasing the synchronous lock to trigger the standby server to apply for the synchronous lock.
Further, the method further comprises:
becoming a standby server when the synchronous lock is not acquired;
applying for the synchronous lock when the main server releases the synchronous lock;
becoming a new main server when the synchronous lock released by the main server is acquired through the application;
and acquiring the application information, and continuously executing the corresponding real-time processing application according to the application information.
Further, in the real-time processing application, the processing operation of the real-time processing application is defined by an SQL statement.
Further, regarding the processing operation of the real-time processing application, the processing operation includes:
an operation of creating the real-time processing application;
and carrying out real-time processing operation on the data.
Further, the method further comprises:
when a creation request for a real-time processing application is acquired, persisting the real-time processing application to a database through a metadata store.
Further, executing the real-time processing application comprises:
acquiring the SQL statement of the real-time processing application;
acquiring, according to the SQL statement, the operators of the operation for processing data in real time;
submitting the operator to a computing cluster, and executing the operator by the computing cluster to realize real-time processing operation on data.
Further, acquiring the SQL statement of the real-time processing application includes:
obtaining the real-time processing application from the database through the metadata store;
and acquiring, from the real-time processing application, the SQL statement of the operation for processing data in real time.
Further, recording application information about the currently executing real-time processing application includes:
creating a recording node in the coordination service system, and writing application information about the currently executed real-time processing application into the recording node.
Further, the method further comprises:
and when the real-time processing application stops executing, deleting the application information and the recording node corresponding to the real-time processing application in the coordination service system.
Further, acquiring the application information includes:
reading the application information about the currently executed real-time processing application from the recording node of the coordination service system.
According to another aspect of the present application, there is also provided a fault recovery server for a real-time processing engine, the server including:
the switching device is used for enabling the server to become a main server when the synchronous lock is acquired, and releasing the synchronous lock when a fault occurs so as to trigger the standby server to apply for the synchronous lock;
and the real-time processing device is used for recording application information about the currently executed real-time processing application when the real-time processing application is executed, so that the standby server continues to execute the corresponding real-time processing application by acquiring the application information when the standby server becomes the main server.
Further, the switching device is further configured to enable the server to become a standby server when the synchronization lock is not acquired; when the main server releases the synchronous lock, applying for the synchronous lock; when the synchronous lock released by the main server is acquired through application, the server becomes a new main server;
and the real-time processing device is also used for acquiring the application information and continuously executing the corresponding real-time processing application according to the application information.
Further, in the real-time processing application, the processing operation of the real-time processing application is defined by an SQL statement.
Further, regarding the processing operation of the real-time processing application, the processing operation includes:
an operation of creating the real-time processing application;
and carrying out real-time processing operation on the data.
Further, the real-time processing device is further configured to, when a creation request of a real-time processing application is obtained, persist the real-time processing application to a database through metadata storage.
Further, the real-time processing device is configured to acquire the SQL statement of the real-time processing application; acquire, according to the SQL statement, the operators of the operation for processing data in real time; and submit the operators to a computing cluster, which executes them to realize the real-time processing operation on the data.
Further, the real-time processing device is configured to obtain the real-time processing application from the database through the metadata store, and to acquire, from the real-time processing application, the SQL statement of the operation for processing data in real time.
Further, the real-time processing device is configured to create a recording node in the coordination service system, and write application information about the currently executed real-time processing application into the recording node.
Further, the real-time processing apparatus is further configured to delete the application information and the record node corresponding to the real-time processing application in the coordination service system when the execution of the real-time processing application is stopped.
Further, the real-time processing device is used for reading the application information of the currently executed real-time processing application from the recording node of the coordination service system.
Compared with the prior art, in the scheme provided by the application, if a server acquires the synchronous lock when it starts, it becomes the main server and provides services to the outside. If a fault occurs while the main server is providing services, the main server releases the synchronous lock, which triggers the standby server to apply for the lock, so that the standby server can acquire it and become the new main server.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram illustrating Zookeeper-based startup in master/standby mode in an embodiment of the present application;
FIG. 2 is a dataflow diagram of a real-time processing task in an embodiment of the present application;
FIG. 3 is a schematic diagram of real-time processing application state tracking using Zookeeper in an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating Zookeeper-based fault recovery in an embodiment of the present application;
FIG. 5 is a flowchart illustrating a process of creating a real-time processing application by a server according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating a process of executing a real-time processing application by a main server according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a fault recovery server of a real-time processing engine provided in an embodiment of the present application;
the same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal and the devices of the service network each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The embodiment of the application provides a fault recovery method for a real-time processing engine. The real-time processing engine to which the method applies comprises a plurality of servers; when the engine is started, the servers may start at the same time and each apply for a synchronous lock (Lock).
When any server (Server) acquires the synchronous lock, it becomes the main server (Active Server). The main server obtains the resource usage right of the computing cluster and provides services to the outside; that is, the data processing function of the real-time processing engine is realized by the main server calling the resources of the computing cluster. Correspondingly, a server that does not acquire the synchronous lock becomes a standby server (Standby Server), which does not obtain the resource usage right of the computing cluster and provides no service to the outside.
In an actual scenario, the server may apply for the synchronous lock from Zookeeper, and the computing cluster may be a Spark Cluster. When the server obtains the resource usage right of the computing cluster, the corresponding real-time processing task can be completed through the Executors provided by the Spark Cluster, as shown in fig. 1. When the real-time processing engine has just started, two servers start at the same time and each applies to Zookeeper for the synchronous lock. The Server that obtains the lock becomes the Active Server, starts providing service, and obtains the resource usage right of the Spark Cluster; the other Server, which does not obtain the lock, becomes the Standby Server. It should be understood by those skilled in the art that the number of elements shown in the figures may be smaller than in an actual scenario (for example, there may be more than two servers, and all servers that do not acquire the synchronous lock become Standby Servers), but such omission is premised on a clear and sufficient disclosure of the present invention.
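The startup behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: `LockService` is a hypothetical in-memory stand-in for Zookeeper, and the names `Server`, `start_engine`, and the role strings are not taken from the application.

```python
import threading


class LockService:
    """Hypothetical in-memory stand-in for a coordination service such as Zookeeper."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.holder = None

    def try_acquire(self, server_name):
        # Atomically grant the lock to the first server that asks for it.
        with self._mutex:
            if self.holder is None:
                self.holder = server_name
                return True
            return False


class Server:
    def __init__(self, name, lock_service):
        self.name = name
        self.lock_service = lock_service
        self.role = "INIT"

    def start(self):
        # The server that wins the lock becomes Active; every other one is Standby.
        if self.lock_service.try_acquire(self.name):
            self.role = "ACTIVE"
        else:
            self.role = "STANDBY"
        return self.role


def start_engine(names):
    """Start one server per name against a shared lock service."""
    locks = LockService()
    servers = [Server(name, locks) for name in names]
    for server in servers:
        server.start()
    return locks, servers
```

With any number of servers, exactly one ends up in the ACTIVE role and the rest are STANDBY, which matches the note that more than two servers may be present.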
When the main server provides services to the outside, it completes the real-time processing task by using the resources of the computing cluster to execute a real-time processing application (Application). A basic data processing task at least comprises: defining a real-time data source, analyzing and processing the data from that source, and outputting the processing result to a designated storage. Fig. 2 is a data flow diagram of a real-time processing task according to an embodiment of the present application, in which the distributed publish-subscribe messaging system Kafka serves as the data source, real-time data is read from a partition (Partition) of Kafka, and the analysis mainly consists of writing error log information (log information containing ERROR) into a result table of a database. Specifically, the main server reads data from Kafka by starting a Receiver, then filters out the log information that does not contain ERROR through a Filter, and writes the remaining data into a result table of the Database through a Sink operation.
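The Receiver, Filter, and Sink stages of this data flow can be illustrated with a small sketch. It assumes a plain Python list in place of a Kafka partition and another list in place of the database result table; the function names are hypothetical.

```python
def receiver(partition):
    """Yield messages one at a time, as a Receiver reading a partition would."""
    for message in partition:
        yield message


def error_filter(messages):
    """Keep only the log entries that contain ERROR."""
    return (m for m in messages if "ERROR" in m)


def sink(messages, result_table):
    """Append each surviving entry to the result table and return the count."""
    count = 0
    for message in messages:
        result_table.append(message)
        count += 1
    return count


def run_pipeline(partition, result_table):
    # Receiver -> Filter -> Sink, evaluated lazily one message at a time.
    return sink(error_filter(receiver(partition)), result_table)
```

Because the stages are generators, each message flows through all three stages before the next one is read, which mirrors processing of unbounded data.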
In an actual scenario, when the main server fails, it releases the synchronous lock, which triggers the standby server to apply for the lock. The standby server applies for the synchronous lock once the main server releases it, and becomes the new main server when it acquires the released lock, so that automatic switching between the main server and the standby server is realized.
In order to ensure that the real-time processing task continues smoothly across the main/standby switch, in the scheme provided by the embodiment of the application, the main server records the application information of the currently executed real-time processing application while executing it. When the standby server becomes the new main server, it acquires this application information and continues to execute the corresponding real-time processing application accordingly.
In an embodiment of the present application, the main server may use a coordination service system (e.g., Zookeeper) to record application information about the currently executed real-time processing application. The specific processing procedure includes: creating a recording node in the coordination service system, and then writing the application information about the currently executed real-time processing application into the recording node. Fig. 3 is a schematic diagram of real-time processing application state tracking using Zookeeper: after a real-time processing application is started, the main server tracks its state, records the application information of the currently running real-time processing application in memory, and then creates a recording node [/Running/app] in Zookeeper to record the application information.
Further, when the real-time processing application stops executing, the main server may delete the application information and the recording node corresponding to the real-time processing application in the coordination service system. For the example shown in fig. 3, when the real-time processing application stops executing, the main server will delete the application information about the real-time processing application in the memory, and will delete the corresponding recording node [/Running/app ] on the Zookeeper.
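The lifecycle of a recording node can be sketched as below. `CoordinationStore` is a hypothetical in-memory stand-in for Zookeeper, the JSON encoding of the application information is an assumption, and the node path follows the [/Running/app] example with the application name as the last path segment.

```python
import json


class CoordinationStore:
    """Hypothetical in-memory stand-in for the coordination service."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data):
        if path in self.nodes:
            raise ValueError(f"node already exists: {path}")
        self.nodes[path] = data

    def delete(self, path):
        self.nodes.pop(path, None)

    def read(self, path):
        return self.nodes[path]


class ApplicationTracker:
    """Tracks running applications in memory and mirrors them to recording nodes."""

    def __init__(self, store):
        self.store = store
        self.running = {}

    @staticmethod
    def node_path(app_name):
        return f"/Running/{app_name}"

    def on_start(self, app_name, info):
        # Record in memory first, then create the recording node.
        self.running[app_name] = info
        self.store.create(self.node_path(app_name), json.dumps(info))

    def on_stop(self, app_name):
        # Remove both the in-memory record and the recording node.
        self.running.pop(app_name, None)
        self.store.delete(self.node_path(app_name))
```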
Accordingly, when the standby server becomes the new main server, it reads the application information about the currently executed real-time processing application from the recording node of the coordination service system. Because the main server records the application information of the currently executed real-time processing application, and the standby server reads it on becoming the new main server and continues executing the application, a complete automatic fault recovery mechanism can be realized. Fig. 4 shows the principle of fault recovery in the scheme of the embodiment of the present application. When the Active Server fails, the Standby Server can obtain the synchronous lock from Zookeeper in the coordination service system and thereby become the new Active Server. It then obtains the resource usage right of the computing cluster (Spark Cluster) and reads from Zookeeper the application information in the recording node [/Running/app], that is, the information about the real-time processing application that was still running when the original main server failed. The new Active Server then resubmits the real-time processing application to the Spark Cluster for execution, completing the previously unfinished real-time processing task and realizing automatic recovery from the fault.
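The recovery sequence of fig. 4 can be walked through with a minimal simulation. Everything here is a stand-in under stated assumptions: `Coordinator` plays the role of Zookeeper (one lock plus recording nodes), `ComputeClusterStub` plays the role of the Spark Cluster, and the lock-release notification is modelled as a direct callback rather than a real watch mechanism.

```python
class Coordinator:
    """Hypothetical stand-in for Zookeeper: one lock plus recording nodes."""

    def __init__(self):
        self.holder = None
        self.nodes = {}    # e.g. "/Running/app" -> application info
        self.waiters = []  # standby servers waiting for the lock

    def acquire(self, server):
        if self.holder is None:
            self.holder = server
            return True
        if server not in self.waiters:
            self.waiters.append(server)
        return False

    def release(self, server):
        if self.holder is not server:
            return
        self.holder = None
        # Releasing the lock triggers the waiting standby servers to apply for it.
        waiters, self.waiters = self.waiters, []
        for waiter in waiters:
            waiter.apply_for_lock()


class ComputeClusterStub:
    """Hypothetical stand-in for the computing cluster; only records submissions."""

    def __init__(self):
        self.submitted = []

    def submit(self, server_name, app_info):
        self.submitted.append((server_name, app_info))


class FailoverServer:
    def __init__(self, name, coordinator, cluster):
        self.name = name
        self.coordinator = coordinator
        self.cluster = cluster
        self.active = False

    def apply_for_lock(self):
        if self.coordinator.acquire(self):
            self.active = True
            self.recover()
        return self.active

    def run(self, app_name, app_info):
        # Record the application before submitting it, so a successor can find it.
        self.coordinator.nodes["/Running/" + app_name] = app_info
        self.cluster.submit(self.name, app_info)

    def recover(self):
        # Resubmit every application that was still running under the old main server.
        for path in sorted(self.coordinator.nodes):
            self.cluster.submit(self.name, self.coordinator.nodes[path])

    def fail(self):
        self.active = False
        self.coordinator.release(self)
```

In this sketch the new main server resubmits the whole application; how much already-processed data is replayed depends on the data reliability mechanism of the underlying engine, which the sketch does not model.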
Further, in the real-time processing application that implements real-time data processing according to the embodiment of the present application, the processing operations are defined by SQL statements. Specifically, these processing operations may include an operation of creating the real-time processing application, an operation of processing data in real time, and the like. By contrast, prior-art real-time processing engines such as Apache Flink and Spark Streaming require application code written in languages such as Java or Scala to define these operations. To define the processing operations of a real-time processing application, a user must build a programming environment from scratch, obtain the dependent SDK, package the program, and deploy it to a cluster for testing and use, and must also be familiar with various APIs and the logic of a distributed system, which is complex and inefficient. Defining the processing operations with SQL statements requires no programming environment and no SDK, simplifies configuration and modification, and makes management convenient.
In one embodiment of the present application, the related processing operations of the real-time processing application may be defined by SQL statements as follows. For example, the operation of creating the real-time processing application may be [ create application apps [ properties ("2") ] ], and the operations of processing data in real time may include: defining a real-time data source on Kafka: [ create stream sources ("source" = "kafka", "kafka.zookeeper" = "breaker:2181", "topic" = "source") ]; defining a filtering operation on the real-time data: [ create stream ERROR logs estimate from source where message like "% ERROR" ]; and writing the filtered data into a result table: [ insert in result select from errors ].
Because the related processing operation of the real-time processing application needs to be defined by the SQL statement, in the method provided by the embodiment of the present application, the specific creation mode of the real-time processing application is as follows: when the server obtains a creation request of the real-time processing application, the real-time processing application is stored in a database in a persistent mode through metadata storage.
The creation request may come from a client device, through which the user creates a real-time processing application with a specific function in order to complete a corresponding real-time processing task. When it receives the creation request, the Server stores the relevant information of the real-time processing application in a database through a metadata store (MetaStore); the database may be an SQL database such as MySQL. FIG. 5 illustrates a process flow for creating a real-time processing application in one embodiment of the present application. When a creation request is received, the creation request (create request) is sent to the MetaStore, and the MetaStore then sends a corresponding write request (write request) to MySQL, so that persistent storage is realized.
Specifically, the relevant information of the real-time processing application may include the following fields: identification information (ID), name (Name), creation time (CreateTime), latest modification time (LastModifyTime), and the corresponding SQL statement (Command) for executing tasks. The table structure of the real-time processing application in MySQL is shown in Table 1:
Field           Type          Primary key
ID              Bigint(20)    Yes
Name            Varchar(128)
CreateTime      Int(11)
LastModifyTime  Int(11)
Command         mediumtext
Table 1
In addition, when creating the real-time processing application, the user may specify configuration that applies when the application executes. Table 2 shows the table structure in MySQL for this configuration, which may include the following fields: identification information of the real-time processing application (APP_ID), parameter key (PARAM_KEY), parameter value (PARAM_VALUE), and the like.
Field        Type           Primary key
APP_ID       Bigint(20)     Yes
PARAM_KEY    Varchar(128)   Yes
PARAM_VALUE  Varchar(4000)
Table 2
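As an illustration of the persistence step, the structures of Tables 1 and 2 can be created and written through a small metadata-store wrapper. This sketch makes several assumptions: it uses Python's built-in `sqlite3` in place of MySQL so that it runs without a server, the table names `application` and `application_config` are hypothetical (the application does not name the tables), and the timestamps are assumed to be Unix seconds.

```python
import sqlite3
import time

# Column names and types follow Tables 1 and 2; table names are hypothetical.
SCHEMA = """
CREATE TABLE application (
    ID             BIGINT(20) PRIMARY KEY,
    Name           VARCHAR(128),
    CreateTime     INT(11),
    LastModifyTime INT(11),
    Command        MEDIUMTEXT
);
CREATE TABLE application_config (
    APP_ID      BIGINT(20),
    PARAM_KEY   VARCHAR(128),
    PARAM_VALUE VARCHAR(4000),
    PRIMARY KEY (APP_ID, PARAM_KEY)
);
"""


class MetaStore:
    """Minimal metadata store that persists applications to a database."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.executescript(SCHEMA)

    def create_application(self, app_id, name, command, config=None, now=None):
        now = int(time.time()) if now is None else now
        # One transaction: the application row and its configuration rows.
        with self.conn:
            self.conn.execute(
                "INSERT INTO application VALUES (?, ?, ?, ?, ?)",
                (app_id, name, now, now, command),
            )
            for key, value in (config or {}).items():
                self.conn.execute(
                    "INSERT INTO application_config VALUES (?, ?, ?)",
                    (app_id, key, value),
                )

    def get_command(self, name):
        row = self.conn.execute(
            "SELECT Command FROM application WHERE Name = ?", (name,)
        ).fetchone()
        return row[0] if row else None
```

The composite primary key on (APP_ID, PARAM_KEY) reflects Table 2, where both columns are marked as primary key, so each application can hold one value per parameter key.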
In the actual processing process, the user may use an SQL statement on the client device to execute a real-time processing application, for example [ start application app ]. After receiving the command for executing the real-time processing application, the main server executes the application through the following processing steps:
First, the main server obtains the SQL statement of the real-time processing application. In the scenario described in the foregoing example, the relevant information of the real-time processing application is persisted in MySQL through the MetaStore. To acquire the SQL statement, the main server obtains the real-time processing application from the database through the metadata store: after receiving a command for executing the real-time processing application, the main server sends a request to the MetaStore, the MetaStore fetches the relevant information about the application from the MySQL data table, and then returns that information to the main server. Because the relevant information contains the SQL statement, the main server thereby acquires the SQL statement of the operation for processing data in real time. For example, the main server may finally obtain the SQL statement of the real-time processing application for the operation of writing the filtered data into the result table: [ insert in result select from errors ].
Then, the main server acquires the operators of the operation for processing data in real time according to the SQL statement. In an embodiment of the present application, the main server may parse the SQL statements of a real-time processing application through an SQL compiler (Compiler) to generate an execution plan (Execution Plan), which includes several operators: ROp is the operator that reads data from Kafka, FOp is the operator that filters the data in between, and SOp is the operator that outputs the final results.
Finally, the main server submits the operators to the computing cluster, and the computing cluster executes them to realize the real-time processing operation on the data. Taking the scenario in the embodiment of the present application as an example, the main server submits the execution plan containing the operators to the Spark Cluster, where it is executed by the Executors; the specific execution flow is shown in fig. 6.
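The relationship between the execution plan and its operators can be sketched as follows. The operator names ROp, FOp, and SOp come from the description; everything else is an assumption. In particular, `build_plan` is a hypothetical placeholder for the SQL compiler and does not parse SQL, and `ComputeCluster` is a local stand-in that runs the operators in order rather than distributing them to Executors.

```python
class ROp:
    """Read operator: produces rows from the data source."""

    def __init__(self, source):
        self.source = source

    def apply(self, rows):
        return iter(self.source)


class FOp:
    """Filter operator: keeps rows containing the keyword."""

    def __init__(self, keyword):
        self.keyword = keyword

    def apply(self, rows):
        return (row for row in rows if self.keyword in row)


class SOp:
    """Sink operator: writes rows to the result table."""

    def __init__(self, result_table):
        self.result_table = result_table

    def apply(self, rows):
        for row in rows:
            self.result_table.append(row)
        return iter(())


def build_plan(source, keyword, result_table):
    # Placeholder for the SQL compiler: returns the operators in execution order.
    return [ROp(source), FOp(keyword), SOp(result_table)]


class ComputeCluster:
    """Local stand-in for the computing cluster."""

    def submit(self, plan):
        rows = iter(())
        for op in plan:
            rows = op.apply(rows)
        # Return the operator names so the caller can see what was executed.
        return [type(op).__name__ for op in plan]
```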
Based on the same inventive concept, the embodiment of the present application further provides a fault recovery server of the real-time processing engine, and the corresponding method of the fault recovery server is the fault recovery method of the real-time processing engine in the foregoing embodiment, and the principle of solving the problem is similar to the method.
Fig. 7 illustrates a fault recovery server of a real-time processing engine according to an embodiment of the present application, which includes a switching device 710 and a real-time processing device 720. The real-time processing engine comprises a plurality of servers; when the engine starts, these servers may start at the same time and each apply for the synchronization lock.
The switching device 710 of any server is used to make that server the main server (Active Server) when it acquires the synchronization lock. The main server obtains the right to use the resources of the computing cluster and provides services to the outside; that is, the data processing function of the real-time processing engine is realized by the main server calling the resources of the computing cluster. Correspondingly, when the synchronization lock is not acquired, the switching device 710 makes the server a standby server (Standby Server), which does not obtain the right to use the resources of the computing cluster and does not provide services to the outside.
In an actual scenario, the server may apply to Zookeeper for the synchronization lock, and the computing cluster may be Spark Cluster. Once the server obtains the right to use the resources of the computing cluster, it can complete the corresponding real-time processing tasks through the Executors provided by Spark Cluster, as shown in Fig. 1. When the real-time processing engine has just started, the two servers start at the same time and apply to Zookeeper for the synchronization lock. The server that obtains the lock becomes the Active Server, starts providing services, and obtains the right to use the resources of Spark Cluster; the other server, which does not obtain the lock, becomes the Standby Server. It should be understood by those skilled in the art that the number of elements shown in the figures may be smaller than the number of corresponding elements in an actual scenario (for example, there may be more than two servers, and all servers that do not acquire the synchronization lock become Standby Servers), but such omission is premised on not affecting a clear and sufficient disclosure of the present invention.
When the main server provides services to the outside, it completes the real-time processing task on the data by executing the real-time processing application with the resources of the computing cluster. A basic data processing task comprises at least: defining a real-time data source, analyzing and processing data from that source, and outputting the processing result to a designated storage. Fig. 2 is a data flow diagram of a real-time processing task according to an embodiment of the present application, in which the distributed publish-subscribe messaging system Kafka serves as the data source and real-time data is read from a Kafka message queue. The analysis mainly consists of writing error log information (log entries containing ERROR) into a result table of a database. Specifically, the main server starts a Receiver to read data from Kafka, uses a Filter to filter out log entries that do not contain ERROR, and writes the filtered data into the result table of the Database through a Sink operation.
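The Receiver, Filter and Sink flow of Fig. 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory list stands in for the Kafka message queue, another list stands in for the database result table, and the names `receiver`, `error_filter`, `sink` and the sample log lines are hypothetical and do not come from the described engine.

```python
from typing import Iterable, Iterator, List


def receiver(queue: Iterable[str]) -> Iterator[str]:
    """Stand-in for the Receiver: yields messages from the data source."""
    for message in queue:
        yield message


def error_filter(messages: Iterable[str]) -> Iterator[str]:
    """Stand-in for the Filter: drops log lines that do not contain ERROR."""
    return (m for m in messages if "ERROR" in m)


def sink(messages: Iterable[str], result_table: List[str]) -> None:
    """Stand-in for the Sink: appends the surviving lines to the result table."""
    result_table.extend(messages)


# Hypothetical log lines standing in for messages read from Kafka.
kafka_queue = [
    "2017-01-03 INFO service started",
    "2017-01-03 ERROR disk full",
    "2017-01-03 WARN slow response",
    "2017-01-03 ERROR connection lost",
]
result_table: List[str] = []
sink(error_filter(receiver(kafka_queue)), result_table)
```

After the last line runs, `result_table` holds only the two lines containing ERROR, in their original order.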
In practical scenarios, when the main server fails, its switching device 710 releases the synchronization lock to trigger the standby server to apply for it. On the standby server, the switching device 710 applies for the synchronization lock when the main server releases it, and makes the server the new main server once the released lock is obtained, thereby implementing automatic switching between the main and standby servers.
Therefore, to ensure that the real-time processing task continues smoothly across a main/standby switch, in the solution provided in this embodiment the real-time processing device 720 of the main server records application information about the currently executed real-time processing application while that application is executing, so that when the standby server becomes the main server it can continue to execute the corresponding application by acquiring the application information. That is, when the standby server becomes the new main server, its real-time processing device 720 acquires the application information and continues to execute the corresponding real-time processing application according to it.
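The lock-based switching described above can be sketched as follows. This is a minimal single-process simulation, not the patented implementation: `SyncLock` is a hypothetical in-memory stand-in for the synchronization lock held in the coordination service, and the class and method names are assumptions made for illustration. In a real Zookeeper deployment such a lock is commonly built on an ephemeral node, which is likewise an assumption here and not something the text specifies.

```python
from typing import Optional


class SyncLock:
    """In-memory stand-in for the synchronization lock in the coordination service."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def try_acquire(self, server_name: str) -> bool:
        # The lock is granted if it is free or already held by the caller.
        if self.holder in (None, server_name):
            self.holder = server_name
            return True
        return False

    def release(self, server_name: str) -> None:
        if self.holder == server_name:
            self.holder = None


class Server:
    def __init__(self, name: str, lock: SyncLock) -> None:
        self.name = name
        self.lock = lock
        self.role = "standby"

    def apply_for_lock(self) -> None:
        """Become active if the lock is obtained, otherwise remain standby."""
        self.role = "active" if self.lock.try_acquire(self.name) else "standby"

    def fail(self) -> None:
        """Simulate a fault: the active server releases the lock."""
        self.lock.release(self.name)
        self.role = "failed"


lock = SyncLock()
servers = [Server("server-1", lock), Server("server-2", lock)]

# Startup: every server applies for the lock; exactly one becomes active.
for s in servers:
    s.apply_for_lock()
roles_at_startup = [s.role for s in servers]

# Failover: the active server fails and the standby applies again.
servers[0].fail()
for s in servers:
    if s.role == "standby":
        s.apply_for_lock()
roles_after_failover = [s.role for s in servers]
```

In this run the first server to apply becomes active at startup, and after it fails the remaining standby server obtains the lock and becomes the new active server.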
In an embodiment of the present application, the main server may use a coordination service system (e.g., Zookeeper) to record application information about the currently executed real-time processing application. The specific processing performed by the real-time processing device 720 includes: creating a recording node in the coordination service system, and then writing the application information about the currently executed real-time processing application into that node. Fig. 3 is a schematic diagram of tracking the state of a real-time processing application with Zookeeper: after the real-time processing application is started, the main server tracks its state, records the application information of the currently running application in memory, and then creates (Create) a recording node [/Running/app] in Zookeeper to record the application information.
Further, when the real-time processing application stops executing, the real-time processing device 720 of the main server deletes the application information and the recording node corresponding to that application in the coordination service system. In the example shown in Fig. 3, when the real-time processing application stops executing, the main server deletes the application information about it from memory and deletes (Remove) the corresponding recording node [/Running/app] in Zookeeper.
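The create-on-start and remove-on-stop lifecycle of the recording node can be sketched as below. `CoordinationStore` is a hypothetical in-memory stand-in for the node tree of the coordination service, `AppTracker` is an illustrative name, and the JSON encoding of the application information and the `/Running/<name>` path layout are assumptions generalized from the single node [/Running/app] mentioned in the text.

```python
import json
from typing import Dict


class CoordinationStore:
    """In-memory stand-in for the coordination service's node tree."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {}

    def create(self, path: str, data: bytes) -> None:
        if path in self.nodes:
            raise KeyError(f"node already exists: {path}")
        self.nodes[path] = data

    def remove(self, path: str) -> None:
        del self.nodes[path]


class AppTracker:
    """Tracks running applications in memory and mirrors them to recording nodes."""

    def __init__(self, store: CoordinationStore) -> None:
        self.store = store
        self.running: Dict[str, dict] = {}

    def on_start(self, app_name: str, info: dict) -> None:
        # Record in memory first, then create the recording node.
        self.running[app_name] = info
        self.store.create(f"/Running/{app_name}", json.dumps(info).encode("utf-8"))

    def on_stop(self, app_name: str) -> None:
        # Drop both the in-memory record and the recording node.
        del self.running[app_name]
        self.store.remove(f"/Running/{app_name}")


store = CoordinationStore()
tracker = AppTracker(store)
tracker.on_start("app", {"name": "app"})
nodes_while_running = sorted(store.nodes)
tracker.on_stop("app")
nodes_after_stop = sorted(store.nodes)
```

While the application runs, the store contains the single node `/Running/app`; once it stops, both the node and the in-memory record are gone, so a later main server would find nothing to resume.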
Accordingly, when the standby server becomes the new main server, its real-time processing device 720 reads the application information about the currently executed real-time processing application from the recording node of the coordination service system. Because the main server records the application information of the currently executed real-time processing application and the standby server reads it on becoming the new main server, the related real-time processing application continues to execute, and a complete automatic fault recovery mechanism is realized. Fig. 4 shows the principle of fault recovery in the scheme of this embodiment: when the Active Server fails, the Standby Server can obtain the synchronization lock from Zookeeper in the coordination service system and thereby become the new Active Server. It then obtains the right to use the resources of the computing cluster Spark Cluster and reads from Zookeeper the application information in the recording node [/Running/app], that is, the information about the real-time processing application that was still running when the original main server failed. The new Active Server then resubmits the real-time processing application to Spark Cluster for execution, so as to complete the previously unfinished real-time processing task and recover from the fault automatically.
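The recovery step performed by the new main server can be sketched as follows. The dictionary `record_nodes` is a hypothetical in-memory stand-in for the recording nodes left in the coordination service, `submit_to_cluster` stands in for resubmission to the compute cluster, and the JSON payload format is an assumption; none of these names come from the source.

```python
import json
from typing import Dict, List

# Recording nodes left behind by the failed main server (in-memory stand-in).
record_nodes: Dict[str, bytes] = {
    "/Running/app": json.dumps({"name": "app"}).encode("utf-8"),
}

# Stand-in for the list of applications submitted to the compute cluster.
submitted: List[str] = []


def submit_to_cluster(app_info: dict) -> None:
    """Hypothetical resubmission hook; a real engine would submit an execution plan."""
    submitted.append(app_info["name"])


def recover_running_apps(nodes: Dict[str, bytes]) -> List[dict]:
    """Read every recording node under /Running and resubmit its application."""
    recovered = []
    for path in sorted(nodes):
        if path.startswith("/Running/"):
            app_info = json.loads(nodes[path].decode("utf-8"))
            submit_to_cluster(app_info)
            recovered.append(app_info)
    return recovered


recovered_apps = recover_running_apps(record_nodes)
```

The recording nodes are left in place after recovery, since the resubmitted application is running again and its node should only disappear when the application stops.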
Further, in the real-time processing application used for real-time data processing in the embodiments of the present application, the processing operations are defined by SQL statements. Specifically, the processing operations related to the real-time processing application may include an operation of creating the application, an operation of performing real-time processing on data, and the like. By contrast, prior-art real-time processing engines such as Apache Flink and Spark Streaming require application code written in languages such as Java or Scala to define these operations. To define the processing operations, a user has to build a programming environment from scratch, obtain the dependent SDK, package the program, and deploy it to a cluster for testing and use, and must be familiar with various APIs and with the logic of a distributed system, which is complex and inefficient. Defining the processing operations with SQL statements requires no programming environment and no SDK dependency, simplifies configuration and modification, and makes management convenient.
In one embodiment of the present application, the related processing operations of the real-time processing application may be defined by SQL statements as follows. For example, the operation of creating the real-time processing application may be [create application apps [properties ("2")]], and the operations of processing data in real time may include: defining a real-time data source on Kafka: [create stream sources ("source"="kafka", "kafka.zookeeper"="breaker:2181", "topic"="source")]; defining a filtering operation on the real-time data: [create stream errors as select * from source where message like "%ERROR"]; and writing the filtered data into a result table: [insert into result select * from errors]; and the like.
Because the processing operations of the real-time processing application are defined by SQL statements, in the server provided in the embodiment of the present application the real-time processing application is created as follows: when the server obtains a creation request for the real-time processing application, it persists the real-time processing application to a database through a metadata store.
The creation request may come from a client device; the user creates a real-time processing application with a specific function through the client device in order to complete a corresponding real-time processing task. When receiving the creation request, the server therefore stores the relevant information of the real-time processing application in a database through a metadata store (MetaStore); the database may be an SQL database such as MySQL. Fig. 5 illustrates the process flow for creating a real-time processing application in one embodiment of the present application. When receiving the creation request, the server sends a creation request (create request) to the MetaStore, and the MetaStore then sends a corresponding write request (write request) to MySQL, thereby realizing persistent storage.
Specifically, the relevant information of the real-time processing application may include the following fields: identification information (ID), name (Name), creation time (CreateTime), latest modification time (LastModifyTime), and the corresponding SQL statement for executing tasks (Command). The specific table structure of the real-time processing application in MySQL is shown in Table 1:
In addition, when creating the real-time processing application, the user specifies some configuration that applies when the application executes. Table 2 shows the table structure of the real-time processing application configuration in MySQL, which may include the following fields: identification information of the real-time processing application (ID), parameter key (PARAM_KEY), parameter value (PARAM_VALUE), and the like.
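The two table structures and the create-request path can be sketched as below. This is an illustration under explicit assumptions: Python's built-in `sqlite3` module stands in for MySQL so that the sketch runs without a server, the table names `application` and `application_config` and all column types are hypothetical, and only the column names (ID, Name, CreateTime, LastModifyTime, Command, PARAM_KEY, PARAM_VALUE) are taken from the text.

```python
import sqlite3
import time

# In-memory SQLite database standing in for the MySQL instance behind the MetaStore.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE application (
        ID             INTEGER PRIMARY KEY,
        Name           TEXT NOT NULL,
        CreateTime     INTEGER NOT NULL,
        LastModifyTime INTEGER NOT NULL,
        Command        TEXT NOT NULL
    );
    CREATE TABLE application_config (
        ID          INTEGER NOT NULL,
        PARAM_KEY   TEXT NOT NULL,
        PARAM_VALUE TEXT NOT NULL
    );
    """
)


def create_application(name, command, config):
    """Persist one application and its configuration; returns the new ID."""
    now = int(time.time())
    cur = conn.execute(
        "INSERT INTO application (Name, CreateTime, LastModifyTime, Command) "
        "VALUES (?, ?, ?, ?)",
        (name, now, now, command),
    )
    app_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO application_config (ID, PARAM_KEY, PARAM_VALUE) VALUES (?, ?, ?)",
        [(app_id, key, value) for key, value in config.items()],
    )
    conn.commit()
    return app_id


def load_command(name):
    """Fetch the stored SQL statement for an application, or None if unknown."""
    row = conn.execute(
        "SELECT Command FROM application WHERE Name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical application, statement and configuration key used only for illustration.
app_id = create_application(
    "app", "insert into result select * from errors", {"example.key": "2"}
)
```

Keeping the statement text in the `Command` column is what lets a main server later retrieve it through the metadata store and execute the application without any compiled user code.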
In actual processing, the user may execute a real-time processing application from the client device using an SQL statement, such as [start application app]. After receiving the command for executing the real-time processing application, the real-time processing device 720 of the main server executes the real-time processing application through the following processing steps:
First, the real-time processing device 720 of the main server obtains the SQL statement of the real-time processing application. In the scenario described in the foregoing example, the relevant information of the real-time processing application is persisted in MySQL through the MetaStore. To acquire the SQL statement, the real-time processing device 720 obtains the real-time processing application from the database through the metadata store: after receiving the command for executing the real-time processing application, it sends a request to the MetaStore, the MetaStore retrieves the relevant information about the application from the MySQL data table, and the information is then returned to the main server. Because the relevant information contains the SQL statement, the main server thereby acquires the SQL statement of the operation for processing data in real time in the real-time processing application. For example, the main server may finally obtain the SQL statement of the real-time processing application for writing the filtered data into the result table: [insert into result select * from errors].
Then, the real-time processing device 720 of the main server obtains, according to the SQL statement, the operators of the operation for processing data in real time. In an embodiment of the present application, the main server may parse the SQL statements of a real-time processing application with an SQL Compiler to generate an Execution Plan, which includes several operators: ROp is the operator that reads data from Kafka, FOp is the intermediate operator that filters the data, and SOp is the operator that outputs the final result.
Finally, the real-time processing device 720 of the main server submits the operators to a computing cluster, and the computing cluster executes them to perform the real-time processing operation on the data. For example, in the scenario of this embodiment, the main server submits the execution plan containing the operators to Spark Cluster, where it is executed by the Executors; the specific execution flow is shown in Fig. 6.
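The three steps above end with an execution plan made of operators. The sketch below shows one way such a plan could be represented and run. It is not an SQL compiler: the plan is built by hand for the ERROR-filter example, `execute` is a local stand-in for the compute cluster, and apart from the operator names ROp, FOp and SOp taken from the text, every name and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ROp:
    """Read operator: produces records from a source."""
    source: Iterable[str]


@dataclass
class FOp:
    """Filter operator: keeps records matching a predicate."""
    predicate: Callable[[str], bool]


@dataclass
class SOp:
    """Sink operator: writes records to a target list."""
    target: List[str]


def build_plan(source: Iterable[str], target: List[str]) -> list:
    """Hand-built plan for the ERROR filter; a real compiler would derive it from SQL."""
    return [ROp(source), FOp(lambda m: "ERROR" in m), SOp(target)]


def execute(plan: list) -> None:
    """Local stand-in for the compute cluster: runs the operators in order."""
    records: List[str] = []
    for op in plan:
        if isinstance(op, ROp):
            records = list(op.source)
        elif isinstance(op, FOp):
            records = [r for r in records if op.predicate(r)]
        elif isinstance(op, SOp):
            op.target.extend(records)


result: List[str] = []
execute(build_plan(["INFO ok", "ERROR boom"], result))
```

Representing the plan as plain data keeps the main server's role limited to compiling and submitting, while the executing side only needs to understand the operator types.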
To sum up, in the solution provided by the present application, any server that acquires the synchronization lock at startup becomes the main server and provides services to the outside; if a fault occurs while the main server is providing services, the main server releases the synchronization lock to trigger the standby server to apply for it, so that the standby server can acquire the lock and become the new main server.
In addition, the scheme of the application defines the processing operation related to the real-time processing application by using the SQL statement, does not need to build a programming environment and does not depend on the SDK, simplifies the processes of configuration and modification, and has the advantage of convenient management.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, a portion of the present application may be implemented as a computer program product, such as computer program instructions which, when executed by a computer, may invoke or provide methods and/or technical solutions in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium, and/or transmitted via a data stream on a broadcast or other signal-bearing medium, and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a technical solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware.

Claims (16)

1. A method of fault recovery for a real-time processing engine, wherein the method comprises:
becoming a main server when a synchronization lock is acquired;
when executing a real-time processing application, creating a recording node in a coordination service system and writing application information about the currently executed real-time processing application into the recording node, so that a standby server, upon becoming the main server, reads the application information about the currently executed real-time processing application from the recording node of the coordination service system and continues to execute the corresponding real-time processing application;
and when a fault occurs, releasing the synchronization lock to trigger the standby server to apply for the synchronization lock.
2. The method of claim 1, wherein the method further comprises:
becoming a standby server when the synchronization lock is not acquired;
applying for the synchronization lock when the main server releases the synchronization lock;
becoming a new main server when the synchronization lock released by the main server is acquired through the application;
and acquiring the application information, and continuing to execute the corresponding real-time processing application according to the application information.
3. The method according to claim 1 or 2, wherein the processing operations of the real-time processing application are defined by SQL statements.
4. The method of claim 3, wherein the processing operations for the real-time processing application comprise:
an operation of creating the real-time processing application;
and an operation of performing real-time processing on data.
5. The method of claim 3, wherein the method further comprises:
when a creation request for the real-time processing application is acquired, persistently storing the real-time processing application in a database through a metadata store.
6. The method of claim 3, wherein executing a real-time processing application comprises:
acquiring the SQL statements of the real-time processing application;
acquiring, according to the SQL statements, operators of the operation for processing data in real time;
and submitting the operators to a computing cluster, the computing cluster executing the operators to perform the real-time processing operation on the data.
7. The method of claim 6, wherein obtaining the SQL statement for the real-time processing application comprises:
obtaining the real-time processing application from a database through a metadata store;
and acquiring the SQL statements of the operation for processing data in real time in the real-time processing application.
8. The method of claim 1, wherein the method further comprises:
when the real-time processing application stops executing, deleting the application information and the recording node corresponding to the real-time processing application in the coordination service system.
9. A fault recovery server for a real-time processing engine, wherein the server comprises:
a switching device, configured to make the server a main server when a synchronization lock is acquired, and to release the synchronization lock when a fault occurs so as to trigger a standby server to apply for the synchronization lock;
and a real-time processing device, configured to create a recording node in a coordination service system when a real-time processing application is executed, and to write application information about the currently executed real-time processing application into the recording node, so that the standby server, upon becoming the main server, reads the application information about the currently executed real-time processing application from the recording node of the coordination service system and continues to execute the corresponding real-time processing application.
10. The server according to claim 9, wherein the switching device is further configured to: make the server a standby server when the synchronization lock is not acquired; apply for the synchronization lock when the main server releases the synchronization lock; and make the server a new main server when the synchronization lock released by the main server is acquired through the application;
and the real-time processing device is further configured to acquire the application information and to continue executing the corresponding real-time processing application according to the application information.
11. The server according to claim 9 or 10, wherein the processing operations of the real-time processing application are defined by SQL statements.
12. The server of claim 11, wherein the processing operations for the real-time processing application comprise:
an operation of creating the real-time processing application;
and an operation of performing real-time processing on data.
13. The server of claim 11, wherein the real-time processing device is further configured to persist the real-time processing application to a database via a metadata store upon obtaining a creation request for the real-time processing application.
14. The server according to claim 11, wherein the real-time processing device is configured to: acquire the SQL statements of the real-time processing application; acquire, according to the SQL statements, operators of the operation for processing data in real time; and submit the operators to a computing cluster, the computing cluster executing the operators to perform the real-time processing operation on the data.
15. The server of claim 14, wherein the real-time processing device is configured to: obtain the real-time processing application from a database through a metadata store; and acquire the SQL statements of the operation for processing data in real time in the real-time processing application.
16. The server according to claim 9, wherein the real-time processing device is further configured to delete the application information and the recording node corresponding to the real-time processing application in the coordination service system when the real-time processing application stops executing.
CN201710002127.0A 2017-01-03 2017-01-03 Fault recovery method of real-time processing engine and corresponding server Active CN106649000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710002127.0A CN106649000B (en) 2017-01-03 2017-01-03 Fault recovery method of real-time processing engine and corresponding server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710002127.0A CN106649000B (en) 2017-01-03 2017-01-03 Fault recovery method of real-time processing engine and corresponding server

Publications (2)

Publication Number Publication Date
CN106649000A CN106649000A (en) 2017-05-10
CN106649000B true CN106649000B (en) 2020-02-18

Family

ID=58838284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710002127.0A Active CN106649000B (en) 2017-01-03 2017-01-03 Fault recovery method of real-time processing engine and corresponding server

Country Status (1)

Country Link
CN (1) CN106649000B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee after: Star link information technology (Shanghai) Co.,Ltd.

Address before: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee before: TRANSWARP TECHNOLOGY (SHANGHAI) Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Fault recovery methods and corresponding servers for real-time processing engines

Effective date of registration: 20230616

Granted publication date: 20200218

Pledgee: Bank of China Limited by Share Ltd. Shanghai Xuhui branch

Pledgor: Star link information technology (Shanghai) Co.,Ltd.

Registration number: Y2023310000252
