CN106649000A - Fault recovery method for real-time processing engine, and corresponding server - Google Patents

Fault recovery method for real-time processing engine, and corresponding server

Info

Publication number
CN106649000A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710002127.0A
Other languages
Chinese (zh)
Other versions
CN106649000B (en
Inventor
季钱飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Star Link Information Technology (shanghai) Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16: Error detection or correction of the data by redundancy in hardware
    • G06F11/20: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2082: Data synchronisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a fault recovery scheme for a real-time processing engine. In this scheme, a server that obtains the synchronization lock at startup becomes the master server and provides services externally. If a fault occurs while the master server is providing services, the master server releases the synchronization lock, which triggers a standby server to apply for the lock; the standby server that obtains it becomes the new master server. In addition, while executing a real-time processing application, the master server records application information about the currently executing application, so that a standby server that becomes the master server can obtain this information and continue executing the corresponding application, thereby recovering from the fault automatically.

Description

Fault recovery method for real-time processing engine, and corresponding server
Technical field
The present application relates to the field of information technology, and in particular to an automatic fault recovery technique for a real-time processing engine.
Background
With the development of big data technology, enterprises can store and process larger volumes of data than ever before, reaching the TB or even PB level. At present, enterprises mainly run offline analysis workloads on this mass data; such workloads generally need a T+1 or even longer cycle from the generation of the data to the generation of the result. For many industries with strict real-time requirements, this cannot meet business needs. How to process data faster and return results closer to real time is a problem that the big data field urgently needs to solve.
The emergence of real-time processing engines makes it possible for enterprises to process big data in real time. With a real-time processing engine, an enterprise can carry out ETL, real-time report analysis, and even real-time machine learning. The mainstream distributed real-time processing engines currently on the market include Apache Flink and Spark Streaming; users implement their real-time workloads through the API interfaces that these engines provide.
Real-time processing differs greatly from traditional batch processing. The most important difference is that some real-time workloads process a continuous stream of data, that is, unbounded data, and customers usually require such workloads to run uninterrupted, 7 × 24. However, a distributed system can stop serving for a variety of reasons, such as network or hardware faults. In that case the fault must be detected promptly, no data may be lost, and recovery must complete in the shortest possible time. At present, mainstream real-time processing engines such as Apache Flink and Spark Streaming provide mechanisms to ensure data reliability, but none of them provides a complete automatic fault recovery service.
Summary of the application
One object of the present application is to provide an automatic fault recovery technique for a real-time processing engine.
To achieve the above object, the present application provides a fault recovery method for a real-time processing engine, the method comprising:
Becoming the master server when the synchronization lock is obtained;
When executing a real-time processing application, recording application information about the currently executing real-time processing application, so that a standby server, on becoming the master server, continues executing the corresponding real-time processing application by obtaining the application information;
When a fault occurs, releasing the synchronization lock, to trigger the standby server to apply for the synchronization lock.
Further, the method also comprises:
Becoming a standby server when the synchronization lock is not obtained;
Applying for the synchronization lock when the master server releases it;
Becoming the new master server when the application succeeds in obtaining the synchronization lock released by the master server;
Obtaining the application information, and continuing to execute the corresponding real-time processing application according to the application information.
Further, in the real-time processing application, the processing operations of the application are defined by SQL statements.
Further, the processing operations of the real-time processing application include:
An operation of creating the real-time processing application;
An operation of processing data in real time.
Further, the method also comprises:
When a request to create a real-time processing application is obtained, persisting the real-time processing application to a database through the metadata store.
Further, executing a real-time processing application comprises:
Obtaining the SQL statement of the real-time processing application;
Obtaining, from the SQL statement, the operators of the operation that processes data in real time;
Submitting the operators to the computing cluster, which executes them to carry out the real-time processing of the data.
Further, obtaining the SQL statement of the real-time processing application comprises:
Obtaining the real-time processing application from the database through the metadata store;
Obtaining, from the real-time processing application, the SQL statement of the operation that processes data in real time.
Further, recording the application information about the currently executing real-time processing application comprises:
Creating a record node in the coordination service system, and writing into the record node the application information about the currently executing real-time processing application.
Further, the method also comprises:
When execution of the real-time processing application stops, deleting the application information and the record node corresponding to the real-time processing application in the coordination service system.
Further, obtaining the application information comprises:
Reading, from the record node of the coordination service system, the application information about the currently executing real-time processing application.
According to another aspect of the present application, a fault recovery server for a real-time processing engine is also provided, the server comprising:
A switching device, configured to make this server the master server when the synchronization lock is obtained, and to release the synchronization lock when a fault occurs, so as to trigger a standby server to apply for the synchronization lock;
A real-time processing device, configured to record, when executing a real-time processing application, application information about the currently executing real-time processing application, so that a standby server, on becoming the master server, continues executing the corresponding real-time processing application by obtaining the application information.
Further, the switching device is also configured to make this server a standby server when the synchronization lock is not obtained; to apply for the synchronization lock when the master server releases it; and to make this server the new master server when the application succeeds in obtaining the synchronization lock released by the master server;
The real-time processing device is also configured to obtain the application information and to continue executing the corresponding real-time processing application according to the application information.
Further, in the real-time processing application, the processing operations of the application are defined by SQL statements.
Further, the processing operations of the real-time processing application include:
An operation of creating the real-time processing application;
An operation of processing data in real time.
Further, the real-time processing device is also configured to persist the real-time processing application to a database through the metadata store when a request to create a real-time processing application is obtained.
Further, the real-time processing device is configured to obtain the SQL statement of the real-time processing application; to obtain, from the SQL statement, the operators of the operation that processes data in real time; and to submit the operators to the computing cluster, which executes them to carry out the real-time processing of the data.
Further, the real-time processing device is configured to obtain the real-time processing application from the database through the metadata store, and to obtain, from the real-time processing application, the SQL statement of the operation that processes data in real time.
Further, the real-time processing device is configured to create a record node in the coordination service system and to write into the record node the application information about the currently executing real-time processing application.
Further, the real-time processing device is also configured to delete, when execution of the real-time processing application stops, the application information and the record node corresponding to the real-time processing application in the coordination service system.
Further, the real-time processing device is configured to read, from the record node of the coordination service system, the application information about the currently executing real-time processing application.
Compared with the prior art, in the scheme provided by the present application, any server that obtains the synchronization lock at startup becomes the master server and provides services externally. If a fault occurs while the master server is providing services, the master server releases the synchronization lock, which triggers a standby server to apply for it, so that the standby server can obtain the lock and become the new master server. In addition, while executing a real-time processing application, the master server records application information about the currently executing real-time processing application, so that a standby server, on becoming the master server, continues executing the corresponding real-time processing application by obtaining the application information, thereby achieving automatic fault recovery.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of starting in active-standby mode based on Zookeeper in an embodiment of the present application;
Fig. 2 is a data flow diagram of a real-time processing task in an embodiment of the present application;
Fig. 3 is a schematic diagram of tracking the state of a real-time processing application using Zookeeper in an embodiment of the present application;
Fig. 4 is a schematic diagram of fault recovery based on Zookeeper in an embodiment of the present application;
Fig. 5 is a flow chart of a server creating a real-time processing application in an embodiment of the present application;
Fig. 6 is a flow chart of the master server executing a real-time processing application in an embodiment of the present application;
Fig. 7 is a schematic diagram of a fault recovery server for a real-time processing engine provided in an embodiment of the present application;
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, the terminal and the devices of the service network each include one or more processors (CPU), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, in the form of random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
An embodiment of the present application provides a fault recovery method for a real-time processing engine. The real-time processing engine to which the method applies contains multiple servers; when the engine starts, the servers start at the same time and each applies for the synchronization lock (Lock).
Any server (Server) that obtains the synchronization lock becomes the master server (Active Server). The master server obtains the right to use the resources of the computing cluster and provides services externally; that is, the data processing function of the real-time processing engine is carried out by the master server calling on the resources of the computing cluster. Correspondingly, a server that does not obtain the synchronization lock becomes a standby server (Standby Server); a standby server does not obtain the right to use the resources of the computing cluster and does not provide services externally.
In a practical scenario, the server may apply for the synchronization lock from the coordination service system Zookeeper, and the computing cluster may be a Spark Cluster. Once a server obtains the right to use the resources of the computing cluster, it can complete the corresponding real-time processing tasks through the executors (Executor) provided by the Spark Cluster, as shown in Fig. 1. When the real-time processing engine first starts, two servers start simultaneously and each applies to Zookeeper for a synchronization lock. The server that obtains the lock becomes the Active Server, starts providing service, and obtains the right to use the resources of the Spark Cluster; the other server, which did not obtain the lock, becomes the Standby Server. Those skilled in the art will understand that the number of each kind of element shown in the figure may be smaller than in a practical scenario (for example, there may be more than two servers, and every server that does not obtain the synchronization lock becomes a Standby Server); this simplification does not affect a clear and sufficient disclosure of the invention.
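The startup race described above can be illustrated with a minimal Python sketch. It does not talk to Zookeeper: `InMemoryCoordinator`, `Server`, and `start_engine` are hypothetical names introduced here, and a plain in-process mutex stands in for the coordination service, under the assumption that the service grants the lock to exactly one applicant.

```python
import threading


class InMemoryCoordinator:
    """Hypothetical stand-in for the coordination service (Zookeeper in the text)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._holder = None  # name of the server holding the synchronization lock

    def try_acquire(self, server_name):
        """Grant the lock to the first applicant; refuse every later one."""
        with self._mutex:
            if self._holder is None:
                self._holder = server_name
                return True
            return False

    def holder(self):
        with self._mutex:
            return self._holder


class Server:
    def __init__(self, name, coordinator):
        self.name = name
        self.coordinator = coordinator
        self.role = None

    def start(self):
        # The server that wins the lock becomes Active; all others become Standby.
        if self.coordinator.try_acquire(self.name):
            self.role = "active"
        else:
            self.role = "standby"


def start_engine(names):
    """Start all servers concurrently, as the engine does at startup."""
    coordinator = InMemoryCoordinator()
    servers = [Server(name, coordinator) for name in names]
    threads = [threading.Thread(target=server.start) for server in servers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return coordinator, servers
```

Whichever thread reaches the coordinator first becomes the Active Server, so the winner is not deterministic, but exactly one server ends up active regardless of how many are started.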
While providing service externally, the master server uses the resources of the computing cluster to execute a real-time processing application (application) and thereby complete real-time data processing tasks. A basic data processing task at least includes: defining a real-time data source, analyzing and processing the data from that source, and writing the results to a specified store. Fig. 2 is the data flow diagram of the real-time processing task in one embodiment of the present application. The distributed publish-subscribe messaging system Kafka serves as the data source, and real-time data is read from a partition (Partition) of that source. The analysis consists mainly of writing the error log entries (log entries containing ERROR) to a result table in a database. Specifically, the master server starts a receiver (Receiver) to read data from Kafka, uses a filter (Filter) to discard log entries that do not contain ERROR, and writes the filtered data to the result table of the database (Database) through a Sink operation.
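The Receiver, Filter, and Sink stages of this data flow can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a list of strings stands in for a Kafka partition, another list stands in for the result table, and the function names are hypothetical rather than part of any engine API.

```python
from typing import Iterable, Iterator, List


def receiver(partition: Iterable[str]) -> Iterator[str]:
    """Stand-in for the Receiver: yields log lines from one source partition."""
    for line in partition:
        yield line


def error_filter(lines: Iterable[str]) -> Iterator[str]:
    """Filter stage: drop every log line that does not contain ERROR."""
    return (line for line in lines if "ERROR" in line)


def sink(lines: Iterable[str], result_table: List[str]) -> int:
    """Sink stage: append surviving lines to the result table, return the count."""
    written = 0
    for line in lines:
        result_table.append(line)
        written += 1
    return written


def run_pipeline(partition: Iterable[str], result_table: List[str]) -> int:
    """Chain the three stages in the order shown in the data flow."""
    return sink(error_filter(receiver(partition)), result_table)
```

Because each stage is a generator, lines flow through one at a time, which mirrors how an unbounded stream would be handled rather than a finite batch.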
In a practical scenario, when the master server fails, it releases the synchronization lock, which triggers the standby server to apply for it. The standby server applies for the synchronization lock when the master server releases it, and becomes the new master server when its application obtains the released lock. This achieves automatic switchover between the master and standby servers.
To ensure that real-time processing tasks continue smoothly across a master-standby switchover, in the scheme provided by the embodiments of the present application the master server, while executing a real-time processing application, records application information about the currently executing real-time processing application, so that a standby server, on becoming the master server, can continue executing the corresponding application by obtaining this information. That is, when a standby server becomes the new master server, it obtains the application information and continues executing the corresponding real-time processing application according to it.
In one embodiment of the present application, the master server may use a coordination service system (such as Zookeeper) to record the application information about the currently executing real-time processing application. The procedure is: create a record node in the coordination service system, then write the application information about the currently executing real-time processing application into that node. Fig. 3 is a schematic diagram of tracking the state of a real-time processing application using Zookeeper. After a real-time processing application starts, the master server tracks its state, records the application information of the currently running application in memory, and then creates the record node [/Running/app] in Zookeeper to hold the application information.
Further, when execution of the real-time processing application stops, the master server deletes the application information and the record node corresponding to that application in the coordination service system. In the example of Fig. 3, when the real-time processing application stops, the master server deletes the application information about it from memory and deletes the corresponding record node [/Running/app] on Zookeeper.
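The create-on-start and delete-on-stop bookkeeping can be sketched as follows. `RecordStore` is a hypothetical in-memory stand-in for the node tree of the coordination service, and the content of the application information (a plain string here) is an assumption, since the text does not specify its format; only the `/Running/<app>` path layout comes from the example above.

```python
class RecordStore:
    """Hypothetical in-memory stand-in for the coordination service's node tree."""

    def __init__(self):
        self._nodes = {}

    def create(self, path, data):
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = data

    def delete(self, path):
        self._nodes.pop(path, None)

    def get(self, path):
        return self._nodes[path]

    def children(self, parent):
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self._nodes if p.startswith(prefix))


class AppTracker:
    """Keeps the in-memory view and the record nodes in step with each other."""

    RUNNING_ROOT = "/Running"

    def __init__(self, store):
        self.store = store
        self.running = {}  # in-memory copy of the application information

    def on_start(self, app_name, info):
        # Record in memory first, then create the record node for the application.
        self.running[app_name] = info
        self.store.create(f"{self.RUNNING_ROOT}/{app_name}", info)

    def on_stop(self, app_name):
        # Remove both the in-memory entry and the record node.
        self.running.pop(app_name, None)
        self.store.delete(f"{self.RUNNING_ROOT}/{app_name}")
```

Deleting the node on a normal stop matters for recovery: a node that still exists after a failure then reliably means the application was running when the master server went down.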
Correspondingly, when a standby server becomes the new master server, it reads the application information about the currently executing real-time processing application from the record node of the coordination service system. Having the master server record the application information of the currently executing real-time processing application, and having the standby server read that information when it becomes the new master server and continue executing the application, yields a complete automatic fault recovery mechanism. Fig. 4 shows the principle by which the scheme of this embodiment achieves fault recovery. When the master server (Active Server) fails, the standby server (Standby Server) obtains the synchronization lock from the coordination service system Zookeeper and becomes the new Active Server. It then obtains the right to use the resources of the computing cluster (Spark Cluster) and reads from Zookeeper the application information in the record node [/Running/app], that is, the information about the real-time processing application that was still running when the original master server failed. The new Active Server then resubmits that real-time processing application to the Spark Cluster for execution, completing the real-time processing task that was left unfinished and achieving automatic fault recovery.
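The failover sequence can be walked through with a small single-process Python sketch. All classes here are hypothetical stand-ins: `Coordinator` combines the lock and the record nodes, `Cluster` only remembers what was submitted, and the standby's lock application is triggered by calling `start()` again rather than by a real notification from the coordination service.

```python
class Coordinator:
    """Hypothetical stand-in for the coordination service: one lock plus record nodes."""

    def __init__(self):
        self.lock_holder = None
        self.records = {}  # record node path -> application information

    def try_acquire(self, name):
        if self.lock_holder is None:
            self.lock_holder = name
            return True
        return False

    def release(self, name):
        if self.lock_holder == name:
            self.lock_holder = None


class Cluster:
    """Stand-in for the computing cluster; it only remembers submissions."""

    def __init__(self):
        self.submissions = []

    def submit(self, owner, app):
        self.submissions.append((owner, app))


class Server:
    def __init__(self, name, coordinator, cluster):
        self.name = name
        self.coordinator = coordinator
        self.cluster = cluster
        self.active = False

    def start(self):
        # Apply for the lock; on success become master and recover recorded apps.
        self.active = self.coordinator.try_acquire(self.name)
        if self.active:
            self.recover()

    def run_app(self, app, info):
        # Record the application information before submitting the application.
        self.coordinator.records[f"/Running/{app}"] = info
        self.cluster.submit(self.name, app)

    def fail(self):
        # A failing master releases the lock so that a standby can apply for it.
        self.active = False
        self.coordinator.release(self.name)

    def recover(self):
        # Resubmit every application that still has a record node.
        for path in sorted(self.coordinator.records):
            self.cluster.submit(self.name, path.rsplit("/", 1)[1])
```

In this sketch the record nodes outlive the failed master, which is the property the recovery relies on: the new master learns what to resubmit purely from the coordination service, without contacting the failed server.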
Further, in the real-time processing application that carries out the data processing in this embodiment, the processing operations of the application are defined by SQL statements. Specifically, these operations may include the operation of creating the real-time processing application, operations that process data in real time, and so on. In the prior art, by contrast, real-time processing engines such as Apache Flink and Spark Streaming require these operations to be defined by writing application code in Java, Scala, or similar languages. To define the processing operations of a real-time processing application, the user has to build a programming environment from scratch, obtain the SDK dependencies, package the code, and deploy it to a cluster for testing and use; the user also has to be familiar with various APIs and with the logic of distributed systems, which is complex and inefficient. Defining the processing operations with SQL statements requires no programming environment and no SDK dependency, simplifies configuration and modification, and makes management convenient.
In one embodiment of the present application, the processing operations of a real-time processing application can be defined by the following SQL statements. The operation of creating the real-time processing application can be [create application app [properties ("parallelism"="2")]]. The operations that process data in real time can include: the operation of defining the real-time data source on Kafka, [create stream source (id int, name string, message string) streamproperties ("source"="kafka", "kafka.zookeeper"="broker:2181", "topic"="source")]; the operation of filtering the real-time data, [create stream errorLogs as select * from source where message like "%ERROR%"]; and the operation of writing the filtered data to the result table, [insert into result select * from errorlogs].
Because the processing operations of a real-time processing application are defined by SQL statements, in the method provided by the embodiments of the present application a real-time processing application is created as follows: when the server obtains a request to create a real-time processing application, it persists the application to a database through the metadata store.
The creation request may come from a client device: the user creates a real-time processing application with a specific function through the client device in order to complete the corresponding real-time processing task. When the server (Server) receives the creation request, it saves the information about the real-time processing application to the database through the metadata store (MetaStore). The database may be one that uses the SQL language, such as MySQL. Fig. 5 shows the flow of creating a real-time processing application in one embodiment of the present application. On receiving the creation request, the server sends a create request to the MetaStore, and the MetaStore in turn sends the corresponding write request to MySQL, thereby persisting the application.
Specifically, the information about a real-time processing application may include the following fields: the identifier of the application (ID), its name (Name), its creation time (CreateTime), its last modification time (LastModifyTime), the SQL statement of the corresponding task (Command), and so on. Its table structure in MySQL is shown in Table 1:
Field            Type            Primary key
ID               Bigint(20)      Yes
Name             Varchar(128)
CreateTime       Int(11)
LastModifyTime   Int(11)
Command          mediumtext
Table 1
In addition, when creating a real-time processing application, the user can specify configuration settings that apply when the application executes. Table 2 shows one table structure for storing the configuration of a real-time processing application in MySQL; it may include the following fields: the identifier of the real-time processing application (APP_ID), the parameter key (PARAM_KEY), the parameter value (PARAM_VALUE), and so on.
Field         Type             Primary key
APP_ID        Bigint(20)       Yes
PARAM_KEY     Varchar(128)     Yes
PARAM_VALUE   Varchar(4000)
Table 2
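The two table structures can be tried out with a short Python sketch that uses the standard-library `sqlite3` module as a stand-in for MySQL. The table names `application` and `application_config` and the helper functions are assumptions introduced for illustration; only the column names and key columns come from Table 1 and Table 2, and SQLite's type handling differs from MySQL's.

```python
import sqlite3
import time


def open_metastore():
    """Create an in-memory database with the columns of Table 1 and Table 2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE application (          -- hypothetical table name (Table 1)
            ID INTEGER PRIMARY KEY,         -- Bigint(20) in the MySQL layout
            Name VARCHAR(128),
            CreateTime INT,
            LastModifyTime INT,
            Command TEXT                    -- mediumtext in the MySQL layout
        );
        CREATE TABLE application_config (   -- hypothetical table name (Table 2)
            APP_ID BIGINT,
            PARAM_KEY VARCHAR(128),
            PARAM_VALUE VARCHAR(4000),
            PRIMARY KEY (APP_ID, PARAM_KEY)
        );
        """
    )
    return conn


def create_application(conn, name, command, config=None):
    """Persist one application and its optional configuration; return its ID."""
    now = int(time.time())
    cursor = conn.execute(
        "INSERT INTO application (Name, CreateTime, LastModifyTime, Command) "
        "VALUES (?, ?, ?, ?)",
        (name, now, now, command),
    )
    app_id = cursor.lastrowid
    for key, value in (config or {}).items():
        conn.execute(
            "INSERT INTO application_config VALUES (?, ?, ?)", (app_id, key, value)
        )
    conn.commit()
    return app_id


def load_command(conn, name):
    """Return the stored SQL statement for an application, or None if absent."""
    row = conn.execute(
        "SELECT Command FROM application WHERE Name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

The composite primary key on (APP_ID, PARAM_KEY) follows Table 2, where both columns are marked as key columns, so each application can hold at most one value per parameter key.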
In actual operation, the user can use a SQL statement on the client device (Client) to execute a real-time processing application, for example [start application app]. After receiving the command to execute the real-time processing application, the master server executes it through the following steps:
First, the master server obtains the SQL statement of the real-time processing application. In the scenario described above, the information about the real-time processing application has been persisted in MySQL through the MetaStore. To obtain the SQL statement, the master server therefore first obtains the real-time processing application from the database through the metadata store: after receiving the command to execute the application, it sends a request to the MetaStore, the MetaStore retrieves the information about the application from the MySQL table, and the information is returned to the master server. Because this information contains the SQL statement, the master server thereby obtains the SQL statement of the operation that processes data in real time. For example, the master server may obtain the SQL statement of the operation that writes the filtered data to the result table: [insert into result select * from errorlogs].
Next, the master server obtains, from the SQL statement, the operators of the operation that processes data in real time. In one embodiment of the present application, the master server parses the SQL statement of the real-time processing application with a SQL compiler (Compiler) to generate an execution plan (Execution Plan). The execution plan contains several operators: ROp is the operator that reads data from Kafka, FOp is the operator that filters the data in the middle, and SOp is the operator that outputs the final result.
Finally, the master server submits the operators to the computing cluster, which executes them to carry out the real-time processing of the data. In the scenario of this embodiment, for example, the master server submits the execution plan containing the operators to the Spark Cluster, where it is executed by the executors (Executor); the execution flow is shown in Fig. 6.
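The shape of such an execution plan can be sketched in Python. This is not a SQL compiler: the plan is built by hand, the operator classes only borrow the names ROp, FOp, and SOp from the text, and `execute_plan` runs the operators locally as a stand-in for submission to the computing cluster.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class ROp:
    """Read operator: produces rows from a source (a list stands in for Kafka)."""
    source: Iterable[str]

    def run(self, _rows):
        return iter(self.source)


@dataclass
class FOp:
    """Filter operator: keeps only the rows that satisfy the predicate."""
    predicate: Callable[[str], bool]

    def run(self, rows):
        return (row for row in rows if self.predicate(row))


@dataclass
class SOp:
    """Sink operator: writes rows to the result table and returns the table."""
    table: List[str] = field(default_factory=list)

    def run(self, rows):
        self.table.extend(rows)
        return self.table


def execute_plan(plan):
    """Local stand-in for the cluster: feed each operator's output to the next."""
    data = None
    for operator in plan:
        data = operator.run(data)
    return data
```

A plan equivalent to the filtering example would be `[ROp(lines), FOp(lambda m: "ERROR" in m), SOp()]`; keeping operators as plain data makes the plan easy to hand to whatever component actually runs it.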
Based on same inventive concept, real-time processing engine failure is additionally provided in the embodiment of the present application and recovers server, The corresponding method of fault recovery server is the real-time processing engine failure restoration methods in previous embodiment, and it is solved Certainly the principle of problem is similar to methods described.
Fig. 7 shows that a kind of real-time processing engine failure that the embodiment of the present application is provided recovers server, including switching Device 710 and real-time processing device 720.Multiple above-mentioned servers are included in real-time processing engine, during real-time processing engine start, Multiple servers can simultaneously start application synchrolock.
For any one server (Server), its switching device 710 are used for when synchrolock is got, this service is made Device becomes master server (Active Server).The master server is obtained in that the resources use right of computing cluster, externally carries For service, the i.e. data processing function of real-time processing engine, realized by calling the resource of computing cluster by master server; And correspondingly, switching device 710 is used for when the synchrolock is not got, book server is set to become standby server (Standby Server), the standby server does not obtain the resources use right of computing cluster, does not externally provide service.
In a practical scenario, the servers may apply for the synchronization lock from the coordination service system Zookeeper, and the computing cluster may be a Spark Cluster. Once a server has obtained the right to use the resources of the computing cluster, it can complete the corresponding real-time processing tasks through the executors (Executor) provided by the Spark Cluster, as shown in Fig. 1. When the real-time processing engine has just started, two servers start simultaneously and each applies to Zookeeper for the synchronization lock. The server that obtains the lock becomes the Active Server, starts providing service, and obtains the right to use the resources of the Spark Cluster; the other server, which did not obtain the lock, becomes the Standby Server. Those skilled in the art will appreciate that the number of each kind of element shown in the figure may be smaller than in a real deployment (for example, there may be more than two servers, in which case every server that does not obtain the lock becomes a Standby Server); this omission is made on the premise that it does not affect a clear and sufficient disclosure of the invention.
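As a non-limiting illustration, the role assignment at startup can be sketched in Python as follows. The classes LockService and Server and the server names are hypothetical; an in-memory object stands in for the lock that the embodiment obtains from Zookeeper, and no real Zookeeper client API is used.

```python
import threading


class LockService:
    """In-memory stand-in for the coordination service's synchronization lock."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owner = None

    def try_acquire(self, server_name):
        """Grant the lock to the caller only if nobody else holds it."""
        with self._guard:
            if self._owner is None:
                self._owner = server_name
            return self._owner == server_name

    def release(self, server_name):
        """Release the lock if the caller is the current holder."""
        with self._guard:
            if self._owner == server_name:
                self._owner = None


class Server:
    def __init__(self, name, lock_service):
        self.name = name
        self.lock_service = lock_service
        self.role = None

    def start(self):
        # The lock holder becomes Active; every other server becomes Standby.
        got_lock = self.lock_service.try_acquire(self.name)
        self.role = "Active" if got_lock else "Standby"
        return self.role


lock_service = LockService()
servers = [Server(f"server-{i}", lock_service) for i in (1, 2, 3)]
roles = [server.start() for server in servers]
print(roles)  # ['Active', 'Standby', 'Standby']
```

Exactly one server holds the lock at any time, so exactly one server is Active regardless of how many servers are started.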
While providing service externally, the master server executes real-time processing applications using the resources of the computing cluster to complete real-time data processing tasks. A basic data processing task includes at least: defining a real-time data source, analyzing and processing the data from that source, and outputting the result to a specified store. Fig. 2 is the data flow diagram of the real-time processing task in one embodiment of the application. The distributed publish-subscribe messaging system Kafka serves as the data source, and real-time data is read from a Kafka message queue; the processing consists mainly of writing the error log entries (those containing ERROR) to a result table in a database. Specifically, the master server starts a receiver (Receiver) to read data from Kafka, uses a filter (Filter) to discard the log entries that do not contain ERROR, and writes the filtered data to the result table of the database (Database) through a Sink operation.
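As a non-limiting illustration, the Receiver, Filter and Sink stages of this data flow can be sketched in Python as follows. The function names and sample records are hypothetical; a deque stands in for the Kafka message queue and an in-memory sqlite3 database stands in for the result database, so the sketch shows only the shape of the flow.

```python
import sqlite3
from collections import deque


def receiver(queue):
    """Receiver: drain messages from the (simulated) Kafka message queue."""
    while queue:
        yield queue.popleft()


def error_filter(records):
    """Filter: keep only log records whose message contains ERROR."""
    return (record for record in records if "ERROR" in record[2])


def sink(records, conn):
    """Sink: write the filtered records into the result table."""
    conn.executemany(
        "INSERT INTO result (id, name, message) VALUES (?, ?, ?)", records
    )
    conn.commit()


queue = deque([
    (1, "web", "INFO service started"),
    (2, "web", "ERROR connection refused"),
    (3, "db", "ERROR disk full"),
])
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (id INTEGER, name TEXT, message TEXT)")

sink(error_filter(receiver(queue)), conn)
rows = conn.execute("SELECT id, message FROM result ORDER BY id").fetchall()
print(rows)
```

The record columns (id, name, message) follow the stream definition used elsewhere in the embodiment; only records 2 and 3 are written to the result table.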
In a practical scenario, when the master server fails, its switching device 710 releases the synchronization lock, which triggers the standby server to apply for it. The switching device 710 of the standby server applies for the synchronization lock when the master server releases it and, on obtaining the released lock, makes its server the new master server, thereby achieving automatic switchover between the master and standby servers.
To ensure that real-time processing tasks continue smoothly across a master/standby switchover, in the scheme provided by the embodiments of the application the real-time processing device 720 of the master server records, while a real-time processing application is being executed, application information about the currently executing real-time processing application, so that the standby server, on becoming the master server, can obtain that information and continue executing the corresponding application. That is, when a server becomes the new master server, its real-time processing device 720 obtains the application information and continues executing the corresponding real-time processing application according to it.
In one embodiment of the application, the master server may use a coordination service system (such as Zookeeper) to record the application information about the currently executing real-time processing application. The processing performed by its real-time processing device 720 includes: creating a record node in the coordination service system, and then writing the application information about the currently executing real-time processing application into that record node. Fig. 3 is a schematic diagram of tracking the state of a real-time processing application with Zookeeper: after the application starts, the master server tracks its state, records the application information of the currently running application in memory, and then creates (create) a record node [/Running/app] in Zookeeper to record the application information.
Further, when execution of the real-time processing application stops, the real-time processing device 720 of the master server deletes the application information and the record node corresponding to that application in the coordination service system. In the example of Fig. 3, when the real-time processing application stops, the master server deletes the application information about it from memory and removes (Remove) the corresponding record node [/Running/app] from Zookeeper.
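As a non-limiting illustration, the creation and removal of record nodes can be sketched in Python as follows. The classes CoordinationStore and ApplicationTracker and the contents of the application information are hypothetical; a dictionary stands in for the node tree kept by Zookeeper, and only the path [/Running/app] is taken from the embodiment.

```python
class CoordinationStore:
    """In-memory stand-in for the record nodes kept in the coordination service."""

    def __init__(self):
        self._nodes = {}

    def create(self, path, data):
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = data

    def delete(self, path):
        self._nodes.pop(path, None)

    def get(self, path):
        return self._nodes[path]

    def children(self, prefix):
        return sorted(p for p in self._nodes if p.startswith(prefix + "/"))


class ApplicationTracker:
    """Mirrors the running applications in memory and in record nodes."""

    ROOT = "/Running"

    def __init__(self, store):
        self.store = store
        self.running = {}  # in-memory copy kept by the master server

    def on_start(self, app_name, app_info):
        self.running[app_name] = app_info
        self.store.create(f"{self.ROOT}/{app_name}", app_info)

    def on_stop(self, app_name):
        self.running.pop(app_name, None)
        self.store.delete(f"{self.ROOT}/{app_name}")


store = CoordinationStore()
tracker = ApplicationTracker(store)
tracker.on_start("app", {"name": "app", "command": "start application app"})
after_start = store.children("/Running")
tracker.on_stop("app")
after_stop = store.children("/Running")
print(after_start, after_stop)  # ['/Running/app'] []
```

Keeping the record node in step with the application's lifetime means that the set of nodes under /Running always describes exactly the applications a successor would need to resume.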
Correspondingly, when the standby server becomes the new master server, its real-time processing device 720 reads the application information about the currently executing real-time processing application from the record node of the coordination service system. Because the master server records the application information of the currently executing application, and the standby server reads it on becoming the new master server and continues executing the application, a complete automatic fault recovery mechanism is achieved. Fig. 4 shows the principle by which the scheme of the embodiments achieves fault recovery. When the master server (Active Server) fails, the standby server (Standby Server) obtains the synchronization lock from the coordination service system Zookeeper and becomes the new Active Server. It then obtains the right to use the resources of the computing cluster (Spark Cluster) and reads from Zookeeper the application information in the record node [/Running/app], that is, the information about the real-time processing application that was still running when the original master server failed. The new Active Server then resubmits the application to the Spark Cluster for execution, completing the previously unfinished real-time processing task and achieving automatic recovery from the fault.
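As a non-limiting illustration, the recovery sequence of Fig. 4 can be sketched in Python as follows. The classes Coordinator and Server, the method names, and the list used as a submission log are all hypothetical; the Coordinator is an in-memory stand-in for Zookeeper (lock plus record nodes), and the list stands in for the Spark Cluster.

```python
class Coordinator:
    """In-memory stand-in for the coordination service (lock plus record nodes)."""

    def __init__(self):
        self.lock_owner = None
        self.records = {}  # record node path -> application information

    def try_acquire(self, server):
        if self.lock_owner is None:
            self.lock_owner = server
        return self.lock_owner == server

    def release(self, server):
        if self.lock_owner == server:
            self.lock_owner = None


class Server:
    def __init__(self, name, coordinator, cluster):
        self.name = name
        self.coordinator = coordinator
        self.cluster = cluster
        self.active = False

    def campaign(self):
        """Apply for the lock; on success, resubmit every recorded application."""
        self.active = self.coordinator.try_acquire(self.name)
        if self.active:
            for path in sorted(self.coordinator.records):
                info = self.coordinator.records[path]
                self.cluster.append((self.name, info["name"]))
        return self.active

    def start_application(self, app_name):
        self.coordinator.records[f"/Running/{app_name}"] = {"name": app_name}
        self.cluster.append((self.name, app_name))

    def fail(self):
        """Simulate a fault: the lock is released, the record node remains."""
        self.active = False
        self.coordinator.release(self.name)


coordinator = Coordinator()
cluster = []  # log of (submitting server, application) pairs
a = Server("A", coordinator, cluster)
b = Server("B", coordinator, cluster)
a.campaign()                 # A obtains the lock and becomes Active
b.campaign()                 # B does not obtain the lock and stays Standby
a.start_application("app")
a.fail()
b.campaign()                 # B takes over and resubmits "app"
print(b.active, cluster)
```

The key point of the sketch is that the record node outlives the failed master server, which is what allows the new master server to discover and resubmit the application.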
Further, in the real-time processing applications that the embodiments use to implement data processing, the processing operations of the application are defined by SQL statements. Specifically, these operations may include the operation of creating the real-time processing application and the operations that process data in real time. In the prior art, real-time processing engines such as Apache Flink and Spark Streaming require these operations to be defined by writing application code in Java, Scala, or similar languages. To define the processing operations, the user must build a programming environment from scratch, obtain the dependent SDKs, package the code, deploy it to a cluster for testing and use, and be familiar with various APIs and the logic of distributed systems, which is complex and inefficient. Defining the processing operations with SQL statements requires no programming environment and no SDK dependencies, simplifies configuration and modification, and makes management easier.
In one embodiment of the application, the processing operations of a real-time processing application may be defined by the following SQL statements. For example, the operation that creates the real-time processing application may be [create application app [properties ("parallelism"="2")]], and the operations that process data in real time may include: an operation defining the real-time data source on Kafka, [create stream source (id int, name string, message string) streamproperties ("source"="kafka", "kafka.zookeeper"="broker:2181", "topic"="source")]; an operation defining the filtering of the real-time data, [create stream errorLogs as select * from source where message like "%ERROR%"]; and an operation writing the filtered data to the result table, [insert into result select * from errorlogs].
Because the processing operations of a real-time processing application are defined by SQL statements, a server provided by the embodiments of the application creates a real-time processing application as follows: on obtaining a creation request for a real-time processing application, the server persists the application to a database through the metadata storage.
The creation request may come from a client device, through which a user creates a real-time processing application with a specific function to complete the corresponding real-time processing task. On receiving the creation request, the server (Server) saves the information about the real-time processing application to the database through the metadata storage (MetaStore); the database may be one that uses the SQL language, such as MySQL. Fig. 5 shows the flow for creating a real-time processing application in one embodiment of the application. On receiving a creation request, the server sends the creation request (create request) to the MetaStore, which in turn sends the corresponding write request (write request) to MySQL, thereby persisting the application.
Specifically, the information about a real-time processing application may include the following fields: the identifier of the application (ID), its name (Name), creation time (CreateTime), last modification time (LastModifyTime), and the SQL statements of the tasks to be executed (Command). The corresponding table structure in MySQL is shown in Table 1:
In addition, when creating a real-time processing application, the user may specify some configuration that applies when the application is executed. Table 2 shows the MySQL table structure for the configuration of a real-time processing application, which may include the following fields: the identifier of the application (ID), the parameter key (PARAM_KEY), and the parameter value (PARAM_VALUE).
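As a non-limiting illustration, persisting and reloading an application definition with the fields listed above can be sketched in Python as follows. The class MetaStore, its method names, and the table names application and application_config are hypothetical; an in-memory sqlite3 database stands in for MySQL, and only the column names are taken from the embodiment.

```python
import sqlite3
from datetime import datetime, timezone


class MetaStore:
    """Persists application definitions; sqlite3 stands in for MySQL here."""

    def __init__(self, conn):
        self.conn = conn
        # Columns follow the fields described for Table 1 and Table 2.
        conn.execute(
            "CREATE TABLE application (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE, "
            "CreateTime TEXT, LastModifyTime TEXT, Command TEXT)"
        )
        conn.execute(
            "CREATE TABLE application_config "
            "(ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT)"
        )

    def create_application(self, name, command, config=None):
        """Handle a creation request by writing the definition to the database."""
        now = datetime.now(timezone.utc).isoformat()
        cursor = self.conn.execute(
            "INSERT INTO application (Name, CreateTime, LastModifyTime, Command) "
            "VALUES (?, ?, ?, ?)",
            (name, now, now, command),
        )
        app_id = cursor.lastrowid
        self.conn.executemany(
            "INSERT INTO application_config VALUES (?, ?, ?)",
            [(app_id, key, value) for key, value in (config or {}).items()],
        )
        self.conn.commit()
        return app_id

    def load_command(self, name):
        """Return the stored SQL statement of an application, or None."""
        row = self.conn.execute(
            "SELECT Command FROM application WHERE Name = ?", (name,)
        ).fetchone()
        return row[0] if row else None


meta_store = MetaStore(sqlite3.connect(":memory:"))
app_id = meta_store.create_application(
    "app",
    "insert into result select * from errorlogs",
    config={"parallelism": "2"},
)
command = meta_store.load_command("app")
print(app_id, command)
```

The load_command lookup corresponds to the later step in which the master server retrieves an application's SQL statements through the MetaStore before executing it.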
In actual processing, the user can execute a real-time processing application from the client device (Client) with an SQL statement such as [start application app]. After the master server receives the command to execute the application, its real-time processing device 720 executes it through the following steps:
First, the real-time processing device 720 of the master server obtains the SQL statements of the real-time processing application. In the scenario described above, the information about the application has been persisted in MySQL through the MetaStore. To obtain the SQL statements, the real-time processing device 720 of the master server therefore first retrieves the application from the database through the metadata storage: after receiving the command to execute the application, it sends a request to the MetaStore, which reads the information about the application from the MySQL data table and returns it to the master server. Because this information contains the SQL statements, the master server thereby obtains the SQL statements that define the application's real-time data processing operations. For example, the master server may ultimately obtain the application's SQL statement for writing the filtered data to the result table: [insert into result select * from errorlogs].
Then, the real-time processing device 720 of the master server derives from the SQL statements the operators that carry out the real-time processing of the data. In one embodiment of the application, the master server may use an SQL compiler (Compiler) to parse the SQL statements of the real-time processing application into an execution plan (Execution Plan) containing several operators: ROp, the operator that reads data from Kafka; FOp, the operator that filters the intermediate data; and SOp, the operator that outputs the final result.
Finally, the real-time processing device 720 of the master server submits the operators to the computing cluster, which executes them to carry out the real-time processing of the data. For example, in the scenario of this embodiment, the master server submits the execution plan containing the operators to the Spark Cluster, where it is run by the executors (Executor); the specific execution flow is shown in Fig. 6.
In summary, in the scheme provided by the application, any server that obtains the synchronization lock at startup becomes the master server and provides service externally. If a fault occurs while the master server is providing service, the master server releases the synchronization lock, which triggers a standby server to apply for it, so that the standby server can obtain the lock and become the new master server. In addition, while executing a real-time processing application, the master server records application information about the currently executing application, so that the standby server, on becoming the master server, can obtain that information and continue executing the corresponding application, thereby achieving automatic recovery from the fault.
Furthermore, the scheme of the application defines the processing operations of a real-time processing application with SQL statements, which requires no programming environment and no SDK dependencies, simplifies configuration and modification, and makes management easier.
It should be noted that the application may be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the application may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the application (including related data structures) may be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the application may be implemented in hardware, for example as circuitry that cooperates with a processor to perform the individual steps or functions.
In addition, part of the application may be embodied as a computer program product, for example computer program instructions which, when executed by a computer, invoke or provide the method and/or technical solution of the application through the operation of that computer. The program instructions that invoke the method of the application may be stored in a fixed or removable recording medium, and/or transmitted as a data stream over broadcast or another signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. One embodiment of the application accordingly includes a device comprising a memory for storing computer program instructions and a processor for executing them, wherein the computer program instructions, when executed by the processor, trigger the device to run the methods and/or technical solutions of the foregoing embodiments of the application.
It will be evident to those skilled in the art that the application is not limited to the details of the exemplary embodiments above, and that the application can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and not restrictive, the scope of the application being defined by the appended claims rather than by the description above, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced in the application. No reference sign in a claim should be construed as limiting the claim concerned. Moreover, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware.

Claims (20)

1. A fault recovery method for a real-time processing engine, wherein the method comprises:
when a synchronization lock is acquired, becoming a master server;
when executing a real-time processing application, recording application information about the currently executing real-time processing application, so that a standby server, on becoming the master server, continues executing the corresponding real-time processing application by obtaining the application information;
when a fault occurs, releasing the synchronization lock to trigger the standby server to apply for the synchronization lock.
2. The method according to claim 1, wherein the method further comprises:
when the synchronization lock is not acquired, becoming a standby server;
when the master server releases the synchronization lock, applying for the synchronization lock;
when the synchronization lock released by the master server is acquired through the application, becoming the new master server;
obtaining the application information, and continuing to execute the corresponding real-time processing application according to the application information.
3. The method according to claim 1 or 2, wherein, in the real-time processing application, the processing operations of the real-time processing application are defined by SQL statements.
4. The method according to claim 3, wherein the processing operations of the real-time processing application comprise:
an operation of creating the real-time processing application;
an operation of processing data in real time.
5. The method according to claim 3, wherein the method further comprises:
when a creation request for a real-time processing application is obtained, persisting the real-time processing application to a database through metadata storage.
6. The method according to claim 3, wherein executing the real-time processing application comprises:
obtaining the SQL statements of the real-time processing application;
obtaining, according to the SQL statements, operators for the operations of processing data in real time;
submitting the operators to a computing cluster, the operators being executed by the computing cluster to carry out the real-time processing of the data.
7. The method according to claim 6, wherein obtaining the SQL statements of the real-time processing application comprises:
obtaining the real-time processing application from a database through metadata storage;
obtaining the SQL statements in the real-time processing application that are used for the operations of processing data in real time.
8. The method according to claim 3, wherein recording the application information about the currently executing real-time processing application comprises:
creating a record node in a coordination service system, and writing the application information about the currently executing real-time processing application into the record node.
9. The method according to claim 8, wherein the method further comprises:
when execution of the real-time processing application stops, deleting the application information and the record node corresponding to the real-time processing application in the coordination service system.
10. The method according to claim 3, wherein obtaining the application information comprises:
reading the application information about the currently executing real-time processing application from a record node of a coordination service system.
11. A fault recovery server for a real-time processing engine, wherein the server comprises:
a switching device, configured to make this server the master server when a synchronization lock is acquired, and to release the synchronization lock when a fault occurs, so as to trigger a standby server to apply for the synchronization lock;
a real-time processing device, configured to record, when executing a real-time processing application, application information about the currently executing real-time processing application, so that the standby server, on becoming the master server, continues executing the corresponding real-time processing application by obtaining the application information.
12. The server according to claim 11, wherein the switching device is further configured to make this server a standby server when the synchronization lock is not acquired; to apply for the synchronization lock when the master server releases it; and to make this server the new master server when the synchronization lock released by the master server is acquired through the application;
the real-time processing device is further configured to obtain the application information and to continue executing the corresponding real-time processing application according to the application information.
13. The server according to claim 11 or 12, wherein, in the real-time processing application, the processing operations of the real-time processing application are defined by SQL statements.
14. The server according to claim 13, wherein the processing operations of the real-time processing application comprise:
an operation of creating the real-time processing application;
an operation of processing data in real time.
15. The server according to claim 13, wherein the real-time processing device is further configured to persist the real-time processing application to a database through metadata storage when a creation request for a real-time processing application is obtained.
16. The server according to claim 13, wherein the real-time processing device is configured to obtain the SQL statements of the real-time processing application; to obtain, according to the SQL statements, operators for the operations of processing data in real time; and to submit the operators to a computing cluster, the operators being executed by the computing cluster to carry out the real-time processing of the data.
17. The server according to claim 16, wherein the real-time processing device is configured to obtain the real-time processing application from a database through metadata storage, and to obtain the SQL statements in the real-time processing application that are used for the operations of processing data in real time.
18. The server according to claim 13, wherein the real-time processing device is configured to create a record node in a coordination service system and to write the application information about the currently executing real-time processing application into the record node.
19. The server according to claim 18, wherein the real-time processing device is further configured to delete, when execution of the real-time processing application stops, the application information and the record node corresponding to the real-time processing application in the coordination service system.
20. The server according to claim 13, wherein the real-time processing device is configured to read the application information about the currently executing real-time processing application from a record node of a coordination service system.
CN201710002127.0A 2017-01-03 2017-01-03 Fault recovery method of real-time processing engine and corresponding server Active CN106649000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710002127.0A CN106649000B (en) 2017-01-03 2017-01-03 Fault recovery method of real-time processing engine and corresponding server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710002127.0A CN106649000B (en) 2017-01-03 2017-01-03 Fault recovery method of real-time processing engine and corresponding server

Publications (2)

Publication Number Publication Date
CN106649000A true CN106649000A (en) 2017-05-10
CN106649000B CN106649000B (en) 2020-02-18

Family

ID=58838284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710002127.0A Active CN106649000B (en) 2017-01-03 2017-01-03 Fault recovery method of real-time processing engine and corresponding server

Country Status (1)

Country Link
CN (1) CN106649000B (en)

Also Published As

Publication number Publication date
CN106649000B (en) 2020-02-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee after: Star link information technology (Shanghai) Co.,Ltd.

Address before: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee before: TRANSWARP TECHNOLOGY (SHANGHAI) Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Fault recovery methods and corresponding servers for real-time processing engines

Effective date of registration: 20230616

Granted publication date: 20200218

Pledgee: Bank of China Limited by Share Ltd. Shanghai Xuhui branch

Pledgor: Star link information technology (Shanghai) Co.,Ltd.

Registration number: Y2023310000252

PC01 Cancellation of the registration of the contract for pledge of patent right

Granted publication date: 20200218

Pledgee: Bank of China Limited by Share Ltd. Shanghai Xuhui branch

Pledgor: Star link information technology (Shanghai) Co.,Ltd.

Registration number: Y2023310000252