CN111124630A - System and method for running Spark Streaming program - Google Patents

Publication number
CN111124630A
Authority
CN
China
Prior art keywords
node
queue
fault
spark streaming
streaming program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911197734.2A
Other languages
Chinese (zh)
Other versions
CN111124630B (en)
Inventor
周朝卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongying Youchuang Information Technology Co Ltd
Original Assignee
Zhongying Youchuang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongying Youchuang Information Technology Co Ltd
Priority to CN201911197734.2A
Publication of CN111124630A
Application granted
Publication of CN111124630B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0715 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a system implementing multitasking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a system and a method for running a Spark Streaming program. The system comprises a plurality of nodes located in a candidate node queue, each node having a Spark Streaming program installed, with a coordinating node determined among the plurality of nodes. The coordinating node selects a first node from the candidate node queue to run the Spark Streaming program; receives state information sent by each node in the candidate node queue and determines fault nodes; deletes the fault nodes from the candidate node queue to obtain a target candidate node queue; and, when the node running the Spark Streaming program fails, selects a second node from the target candidate node queue to run the Spark Streaming program.

Description

System and method for running Spark Streaming program
Technical Field
The invention relates to the field of computers, in particular to a system and a method for operating a Spark Streaming program.
Background
Spark is a big data parallel computing framework based on in-memory computing that can greatly improve the real-time performance of data processing in a big data environment. Spark Streaming is an extension of the Spark core API that processes real-time streaming data with high throughput and a fault-tolerance mechanism.
Spark Streaming needs to process real-time data streams continuously, so the Spark Streaming program must run continuously and stably as a resident process on its host. However, the program exits when the host goes down, when memory overflows because of GC (garbage collection), when memory runs short during a transient peak of the data source, when the Driver process of the Spark Streaming program becomes abnormal, or when the upstream data source suffers a transient fault, so stable running of the Spark Streaming program is difficult to guarantee.
Disclosure of Invention
An embodiment of the present invention provides a system for running a Spark Streaming program, which is used to ensure stable running of the Spark Streaming program. The system includes:
a plurality of nodes, wherein the plurality of nodes are located in a candidate node queue, each node has a Spark Streaming program installed, and a coordinating node is determined among the plurality of nodes;
wherein the coordinating node is configured to: selecting a first node from the candidate node queue to run a Spark Streaming program, and deleting the first node from the candidate node queue;
receiving state information sent by each node in the candidate node queue, and determining a fault node according to the state information; deleting the fault node from the candidate node queue to obtain a target candidate node queue;
when the node running the Spark Streaming program fails, selecting a second node from the target candidate node queue to run the Spark Streaming program, and deleting the second node from the target candidate node queue.
The embodiment of the present invention further provides an operation method of a Spark Streaming program, which is used to ensure stable operation of the Spark Streaming program, and the method includes:
respectively installing Spark Streaming programs on a plurality of nodes, wherein the nodes are in a candidate node queue; determining a coordinating node among the plurality of nodes;
selecting a first node from the candidate node queue to run a Spark Streaming program, and deleting the first node from the candidate node queue;
receiving state information sent by each node in the candidate node queue, and determining a fault node according to the state information; deleting the fault node from the candidate node queue to obtain a target candidate node queue;
when the node running the Spark Streaming program fails, selecting a second node from the target candidate node queue to run the Spark Streaming program, and deleting the second node from the target candidate node queue.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the processor implements the above running method of the Spark Streaming program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the operation method of the Spark Streaming program is stored in the computer-readable storage medium.
The embodiment of the invention comprises the following steps: respectively installing Spark Streaming programs on a plurality of nodes, wherein the nodes are in a candidate node queue; determining a coordinating node among the plurality of nodes; selecting a first node from the candidate node queue to run a Spark Streaming program, and deleting the first node from the candidate node queue; receiving state information sent by each node in the candidate node queue, and determining a fault node according to the state information; deleting the fault node from the candidate node queue to obtain a target candidate node queue, and monitoring the state of each node in the candidate node queue to ensure that the nodes in the target candidate node queue are normal nodes; when the node running the Spark Streaming program fails, selecting a second node from the target candidate node queue to run the Spark Streaming program, deleting the second node from the target candidate node queue, and selecting a new node to run the Spark Streaming program in time when the node running the Spark Streaming program fails, so that stable running of the Spark Streaming program is realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
Fig. 1 is a schematic structural diagram of a system for running a Spark Streaming program according to an embodiment of the present invention;
Fig. 2 is a schematic flow diagram of a method for running a Spark Streaming program according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
In order to ensure stable running of a Spark Streaming program, an embodiment of the present invention provides a system for running the Spark Streaming program. Fig. 1 is a schematic structural diagram of this system in the embodiment of the present invention. As shown in fig. 1, the system includes:
a plurality of nodes 01, wherein the plurality of nodes 01 are located in a candidate node queue 03, each node has a Spark Streaming program installed, and a coordinating node 02 is determined among the plurality of nodes 01;
wherein, the coordinating node 02 is configured to: selecting a first node from the candidate node queue 03 to run a Spark Streaming program, and deleting the first node from the candidate node queue;
receiving state information sent by each node in the candidate node queue 03, and determining a fault node according to the state information; deleting the fault node from the candidate node queue 03 to obtain a target candidate node queue 04;
when the node running the Spark Streaming program fails, selecting a second node from the target candidate node queue 04 to run the Spark Streaming program, and deleting the second node from the target candidate node queue 04.
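The queue handling described above can be sketched as follows. This is a minimal in-memory model, not the patented implementation: the class name CandidateQueue, its method names, and the host names are hypothetical and chosen only for illustration.

```python
from collections import deque


class CandidateQueue:
    """Hypothetical in-memory model of the candidate node queue."""

    def __init__(self, nodes):
        self._queue = deque(nodes)

    def take_first(self):
        """Select the first node to run the program and delete it from the queue."""
        if not self._queue:
            raise LookupError("no candidate node available")
        return self._queue.popleft()

    def remove_faulty(self, faulty):
        """Delete fault nodes; what remains is the target candidate node queue."""
        self._queue = deque(n for n in self._queue if n not in faulty)

    def nodes(self):
        return list(self._queue)


queue = CandidateQueue(
    ["host_spark_001", "host_spark_002", "host_spark_003", "host_spark_004"]
)
first_node = queue.take_first()          # this node runs the Spark Streaming program
queue.remove_faulty({"host_spark_003"})  # fault determined from state information
second_node = queue.take_first()         # chosen when first_node fails
```

Deleting the selected node from the queue keeps a node that is already running the program from being selected again.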
As shown in fig. 1, an embodiment of the present invention is implemented by: respectively installing Spark Streaming programs on a plurality of nodes, wherein the nodes are in a candidate node queue; determining a coordinating node among the plurality of nodes; selecting a first node from the candidate node queue to run a Spark Streaming program, and deleting the first node from the candidate node queue; receiving state information sent by each node in the candidate node queue, and determining a fault node according to the state information; deleting the fault node from the candidate node queue to obtain a target candidate node queue, and monitoring the state of each node in the candidate node queue to ensure that the nodes in the target candidate node queue are normal nodes; when the node running the Spark Streaming program fails, selecting a second node from the target candidate node queue to run the Spark Streaming program, deleting the second node from the target candidate node queue, and selecting a new node to run the Spark Streaming program in time when the node running the Spark Streaming program fails, so that stable running of the Spark Streaming program is realized.
In a specific implementation, the plurality of nodes 01 may be a plurality of hosts, a Spark Streaming program with a high-availability function is deployed on each host, the plurality of nodes 01 are in the candidate node queue 03, and a coordinating node 02 is generated among the plurality of nodes 01.
In one embodiment, upon failure of the coordinating node, the other nodes than the coordinating node are further configured to:
obtaining a fault coordination node version value in a database table; the database table comprises a global table, and the global table is used for storing the version value of the running coordination node;
writing a new coordination node version value into a database table according to the fault coordination node version value; the database table comprises a coordination node table, and the coordination node table is used for storing a new coordination node version value;
and taking the node which is successfully written as a new coordination node, and updating a coordination node version value in the global table.
In specific implementation, as shown in Table 1, a coordination node table may be added to the database to store information such as the version value, host name, and host IP of the coordinating node, and to store the new version value whenever a new coordinating node is generated. On first use, the coordination node table is initialized so that it contains one record whose version value is 0.
For example:
insert into coordinator(node,version)values("host_spark_001",0);
The above statement initializes one record with hostname host_spark_001 and version 0.
TABLE 1 Coordination node table
Field value          app_kafka_id
Coordinating node    Identifier of the coordinating node, e.g. hostname, host IP, etc.
Version value        0 (incremented by 1 each time a new coordinating node is generated)
As shown in Table 2, a global table may be added to the database table to hold the running coordination node version values.
TABLE 2 Global table
Field value          app_kafka_id
Version value        0 (incremented by 1 each time a new coordinating node is generated)
On first use, the global table is also initialized, so that the version values in the coordination node table and the global table are consistent.
For example:
insert into global_version(id,flag)values("app_kafka_id",0);
Each node in the candidate node queue 03 may periodically communicate with the coordinating node 02. A communication failure between the coordinating node 02 and the nodes indicates that the coordinating node has failed; the nodes other than the coordinating node may then determine a new coordinating node according to the following steps:
firstly, obtaining a version value of a fault coordination node in a global table;
then, according to the version value of the fault coordination node in the global table, writing a new version value of the coordination node into the coordination node table, setting the value of the node field as the host name of the current node, and increasing the version value by 1;
For example:
the version value of the fault coordination node obtained from the global table is 5;
update coordinator set node='host_spark_002',version=version+1 where version=5;
When one node successfully updates the version value, the version value becomes 6, so the same statement issued by the other nodes, which is conditioned on the version being 5, no longer updates any data in the table. Whether the update succeeded can be judged from the return value of the execution result: when the return value is 1, the update succeeded and the node becomes the coordinating node.
Then, the coordination node version value in the global table is updated, ensuring that the version values of the global table and the coordination node table are consistent.
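The election steps above amount to a compare-and-set on the version value. The following is a minimal sketch using Python's sqlite3 module with an in-memory database; the choice of SQLite and the function names are assumptions made for illustration, while the table and column names (coordinator, global_version, node, version, id, flag) follow the example statements in the text.

```python
import sqlite3


def init_tables(conn):
    """Create and initialize both tables with version 0 (first use)."""
    conn.execute("CREATE TABLE coordinator (node TEXT, version INTEGER)")
    conn.execute("CREATE TABLE global_version (id TEXT, flag INTEGER)")
    conn.execute("INSERT INTO coordinator (node, version) VALUES ('host_spark_001', 0)")
    conn.execute("INSERT INTO global_version (id, flag) VALUES ('app_kafka_id', 0)")
    conn.commit()


def read_failed_version(conn):
    """Step 1: read the version value of the failed coordinator from the global table."""
    row = conn.execute(
        "SELECT flag FROM global_version WHERE id = 'app_kafka_id'"
    ).fetchone()
    return row[0]


def claim_coordinator(conn, hostname, failed_version):
    """Steps 2 and 3: conditional update, then sync the global table on success."""
    cur = conn.execute(
        "UPDATE coordinator SET node = ?, version = version + 1 WHERE version = ?",
        (hostname, failed_version),
    )
    if cur.rowcount != 1:  # another node already bumped the version
        conn.rollback()
        return False
    conn.execute(
        "UPDATE global_version SET flag = ? WHERE id = 'app_kafka_id'",
        (failed_version + 1,),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
init_tables(conn)
# Both surviving nodes observe the same failed version before either one writes.
seen_by_002 = read_failed_version(conn)
seen_by_003 = read_failed_version(conn)
won_002 = claim_coordinator(conn, "host_spark_002", seen_by_002)
won_003 = claim_coordinator(conn, "host_spark_003", seen_by_003)
```

Because the update is conditioned on the old version value, at most one of the competing nodes changes a row, so the affected-row count serves as the election result.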
In a specific implementation, the coordinating node 02 may select the node ranked first in the candidate node queue 03 to run the Spark Streaming program, and delete the node from the candidate node queue 03.
In one embodiment, the coordinating node 02 is specifically configured to:
and comparing the state information of each node with preset state information, and determining a fault node according to a comparison result.
In one embodiment, the coordinating node 02 is specifically configured to:
and when the node which does not receive the state information within the preset time length exists, determining the node as a fault node.
In one embodiment, the coordinating node 02 is specifically configured to:
receiving state information sent by a fault node, and comparing the state information of the fault node with preset state information;
and when it is determined according to the comparison result that the fault of the fault node has been cleared, adding the node whose fault has been cleared to the target candidate node queue 04.
In specific implementation, the state information of a node may include the disk state, the network card state, and the like. Each node on which the highly available Spark Streaming program is deployed writes its latest information into the database table in real time; this information may include the node name, state information, and timestamp, as shown in Table 3:
TABLE 3 Node database table
Node name         Status    Timestamp
spark_host_001    Normal    2019-10-23 19:47:22
spark_host_002    Fault     2019-10-23 19:56:47
In specific implementation, the coordinating node 02 may periodically traverse the database table, compare the state information of each node in the candidate node queue 03 with preset state information, determine failed nodes according to the comparison result, and delete them from the candidate node queue 03 to obtain the target candidate node queue 04. The coordinating node 02 may also set a preset time length: when the time elapsed since a node's timestamp in Table 3 exceeds the preset time length and no new state information has been received from the node, the node is considered a failed node that cannot run the Spark Streaming program, and it is deleted from the candidate node queue to obtain the target candidate node queue 04. The coordinating node 02 may further receive state information sent by a failed node and compare it with the preset state information; when the fault has been cleared, the recovered node is added again to the tail of the target candidate node queue 04.
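The three checks in this paragraph (status comparison, timeout, and re-adding recovered nodes at the tail) can be sketched together. This is an illustrative sketch under stated assumptions: the function names, the "normal"/"fault" status strings, the 10-minute default timeout, and the third host are hypothetical; the record layout mirrors Table 3.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_healthy(record, now, preset_status, timeout):
    """A node is healthy if it reported the preset status recently enough."""
    if record is None:  # the node never reported any state information
        return False
    status, timestamp = record
    reported_at = datetime.strptime(timestamp, TIME_FORMAT)
    return status == preset_status and now - reported_at <= timeout


def refresh_queue(queue, faulty, records, now,
                  preset_status="normal", timeout=timedelta(minutes=10)):
    """Return (target candidate queue, fault nodes) after one traversal."""
    def healthy(name):
        return is_healthy(records.get(name), now, preset_status, timeout)

    kept = [n for n in queue if healthy(n)]
    newly_faulty = [n for n in queue if not healthy(n)]
    recovered = [n for n in faulty if healthy(n)]  # re-added at the tail
    still_faulty = [n for n in faulty if not healthy(n)]
    return kept + recovered, still_faulty + newly_faulty


records = {
    "spark_host_001": ("normal", "2019-10-23 19:47:22"),
    "spark_host_002": ("fault", "2019-10-23 19:56:47"),
    "spark_host_003": ("normal", "2019-10-23 19:30:00"),  # stale report
}
target, faulty = refresh_queue(
    ["spark_host_001", "spark_host_002", "spark_host_003"], [],
    records, datetime(2019, 10, 23, 19, 57, 0),
)

# Later traversal: spark_host_002 reports a cleared fault.
records["spark_host_001"] = ("normal", "2019-10-23 20:00:00")
records["spark_host_002"] = ("normal", "2019-10-23 20:01:00")
target2, faulty2 = refresh_queue(
    target, faulty, records, datetime(2019, 10, 23, 20, 2, 0)
)
```

In the first traversal spark_host_002 is removed by status comparison and spark_host_003 by timeout; in the second, spark_host_002 rejoins at the tail of the target candidate node queue.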
In a specific implementation, the coordinating node 02 may use the API of YARN to periodically poll the status of the Spark Streaming program. When the Spark Streaming program is abnormal, the coordinating node 02 may select the node ranked first in the target candidate node queue 04 to run the Spark Streaming program, and delete the node from the target candidate node queue 04.
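One polling round can be sketched as follows. The text does not say which YARN API is used; this sketch assumes the ResourceManager REST endpoint for application state, and the set of states treated as abnormal, the function names, and the host names are likewise assumptions. The state source is passed in as a callable so that the failover logic can be exercised without a cluster.

```python
import json
from urllib.request import urlopen

# Assumption: a resident streaming job that has ended in any way is abnormal.
ABNORMAL_STATES = {"FAILED", "KILLED", "FINISHED"}


def fetch_yarn_state(rm_address, app_id):
    """Read the application state from the ResourceManager (assumed REST endpoint)."""
    url = f"http://{rm_address}/ws/v1/cluster/apps/{app_id}/state"
    with urlopen(url) as resp:
        return json.load(resp)["state"]


def check_and_failover(get_state, target_queue, start_program):
    """Poll once; on an abnormal state, start the program on the first healthy node."""
    if get_state() not in ABNORMAL_STATES:
        return None
    if not target_queue:
        raise RuntimeError("no healthy candidate node left")
    node = target_queue.pop(0)  # select the first node and delete it from the queue
    start_program(node)
    return node


# Simulated polling rounds: the program is running, then fails.
states = iter(["RUNNING", "FAILED"])
target_queue = ["host_spark_002", "host_spark_003"]
started = []
first_round = check_and_failover(lambda: next(states), target_queue, started.append)
second_round = check_and_failover(lambda: next(states), target_queue, started.append)
```

In a deployment, get_state could be something like `lambda: fetch_yarn_state(rm_address, app_id)` called on a timer, with both arguments supplied by the operator.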
In one embodiment, before the second node runs the Spark Streaming program, the second node is configured to: the Spark Streaming program run by the first node is killed.
In specific implementation, in order to avoid split brain (two nodes running the Spark Streaming program at the same time), before the new node runs the Spark Streaming program, the new node may kill the Spark Streaming program run by the original node over ssh using the kill -9 command.
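The fencing step can be sketched as building and running an ssh command. The text does not specify how the new node learns the process ID on the original node, so the pid argument, the function names, and the host name below are hypothetical; the command runner is injectable so the sketch can be tried without a remote host.

```python
import subprocess


def build_kill_command(host, pid):
    """Build the argument list for: ssh <host> kill -9 <pid>."""
    return ["ssh", host, "kill", "-9", str(int(pid))]


def fence_old_driver(host, pid, run=subprocess.run):
    """Kill the old process; return True when the remote command exits with 0."""
    result = run(build_kill_command(host, pid), check=False)
    return result.returncode == 0


# Dry run with a fake runner that records the command instead of executing it.
class _FakeResult:
    returncode = 0


calls = []


def fake_run(cmd, check=False):
    calls.append(cmd)
    return _FakeResult()


ok = fence_old_driver("host_spark_001", 4242, run=fake_run)
```

Passing the command as an argument list rather than a shell string avoids local shell quoting issues, and converting pid through int() rejects non-numeric input before anything is sent to the remote host.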
In the following a specific example is given to facilitate an understanding of how the invention may be carried into effect.
The first step: a Spark Streaming program with a high-availability function is deployed on a plurality of nodes 01, and the plurality of nodes form a candidate node queue 03; the plurality of nodes 01 write version values into the coordination node table, and the node whose write succeeds first becomes the coordinating node 02.
The second step: the coordinating node 02 selects the node ranked first in the candidate node queue 03 to run the Spark Streaming program, and deletes the node from the candidate node queue 03.
The third step: the coordinating node 02 periodically traverses the state information of each node in the candidate node queue 03 in the database table, compares the state information of each node in the candidate node queue 03 with preset state information, determines fault nodes according to the comparison result, and deletes the fault nodes from the candidate node queue 03 to obtain a target candidate node queue 04.
The fourth step: the coordinating node 02 periodically polls the state of the Spark Streaming program using the API of YARN, and when the Spark Streaming program is abnormal, selects the node ranked first in the target candidate node queue 04 to run the Spark Streaming program, and deletes the node from the target candidate node queue 04.
The fifth step: before the new node runs the Spark Streaming program, the new node kills the Spark Streaming program run by the original node over ssh using the kill -9 command.
Based on the same inventive concept, the embodiment of the present invention further provides an operation method of a Spark Streaming program, as in the following embodiments. Because the principle of solving the problem of the operation method of the Spark Streaming program is similar to that of the operation device of the Spark Streaming program, the implementation of the method can be referred to the implementation of the device, and repeated parts are not described again.
An embodiment of the present invention provides a method for running a Spark Streaming program, and fig. 2 is a schematic diagram of a flow of the method for running the Spark Streaming program in the embodiment of the present invention, as shown in fig. 2, the method includes:
step 101: respectively installing Spark Streaming programs on a plurality of nodes, wherein the nodes are in a candidate node queue; determining a coordinating node among the plurality of nodes;
step 102: selecting a first node from the candidate node queue to run a Spark Streaming program, and deleting the first node from the candidate node queue;
step 103: receiving state information sent by each node in the candidate node queue, and determining a fault node according to the state information; deleting the fault node from the candidate node queue to obtain a target candidate node queue;
step 104: when the node running the Spark Streaming program fails, selecting a second node from the target candidate node queue to run the Spark Streaming program, and deleting the second node from the target candidate node queue.
In one embodiment, step 103 may comprise:
and comparing the state information of each node with preset state information, and determining a fault node according to a comparison result.
In one embodiment, step 103 may comprise:
and when the node which does not receive the state information within the preset time length exists, determining the node as a fault node.
In one embodiment, step 103 may comprise:
receiving state information sent by a fault node, and comparing the state information of the fault node with preset state information;
and when the fault elimination of the fault node is determined according to the comparison result, adding the fault eliminated node into the target candidate node queue.
In one embodiment, when the coordinating node fails, the method may further include:
the other nodes than the coordinating node perform the following steps:
obtaining a fault coordination node version value in a database table; the database table comprises a global table, and the global table is used for storing the version value of the running coordination node;
writing a new coordination node version value into a database table according to the fault coordination node version value; the database table comprises a coordination node table, and the coordination node table is used for storing a new coordination node version value;
and taking the node which is successfully written as a new coordination node, and updating a coordination node version value in the global table.
In one embodiment, before the second node runs the Spark Streaming program in step 104, the method further comprises:
the Spark Streaming program run by the first node is killed.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the processor implements the running method of the Spark Streaming program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the operation method of the Spark Streaming program is stored in the computer-readable storage medium.
In summary, the embodiment of the present invention provides: respectively installing Spark Streaming programs on a plurality of nodes, wherein the nodes are in a candidate node queue; determining a coordinating node among the plurality of nodes; selecting a first node from the candidate node queue to run a Spark Streaming program, and deleting the first node from the candidate node queue; receiving state information sent by each node in the candidate node queue, and determining a fault node according to the state information; deleting the fault node from the candidate node queue to obtain a target candidate node queue, and monitoring the state of each node in the candidate node queue to ensure that the nodes in the target candidate node queue are normal nodes; when the node running the Spark Streaming program fails, selecting a second node from the target candidate node queue to run the Spark Streaming program, deleting the second node from the target candidate node queue, and selecting a new node to run the Spark Streaming program in time when the node running the Spark Streaming program fails, so that stable running of the Spark Streaming program is realized.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and variations of the embodiment of the present invention may occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A system for running a Spark Streaming program, comprising: a plurality of nodes, wherein the plurality of nodes are located in a candidate node queue, each node has a Spark Streaming program installed, and a coordinating node is determined among the plurality of nodes;
wherein the coordinating node is configured to: selecting a first node from the candidate node queue to run a Spark Streaming program, and deleting the first node from the candidate node queue;
receiving state information sent by each node in the candidate node queue, and determining a fault node according to the state information; deleting the fault node from the candidate node queue to obtain a target candidate node queue;
when the node running the Spark Streaming program fails, selecting a second node from the target candidate node queue to run the Spark Streaming program, and deleting the second node from the target candidate node queue.
2. The system of claim 1, wherein determining a fault node according to the state information comprises:
comparing the state information of each node with preset state information, and determining a fault node according to the comparison result.
3. The system of claim 1, wherein determining a fault node according to the state information comprises:
when no state information is received from a node within a preset time length, determining that node to be a fault node.
4. The system of claim 1, wherein the coordinating node is further configured to:
receive state information sent by the fault node, and compare the state information of the fault node with preset state information; and
when it is determined from the comparison result that the fault of the fault node has been eliminated, add the recovered node to the target candidate node queue.
5. The system of claim 1, wherein, upon failure of the coordinating node, the nodes other than the coordinating node are further configured to:
obtain the version value of the failed coordinating node from a database table, wherein the database table comprises a global table used to store the version value of the running coordinating node;
write a new coordinating node version value into a database table according to the version value of the failed coordinating node, wherein the database table comprises a coordinating node table used to store the new coordinating node version value; and
take the node whose write succeeds as the new coordinating node, and update the coordinating node version value in the global table.
6. The system of claim 1, wherein the second node is configured to, before running the Spark Streaming program, kill the Spark Streaming program run by the first node.
7. A method for running a Spark Streaming program, comprising:
installing a Spark Streaming program on each of a plurality of nodes, wherein the nodes are located in a candidate node queue; determining a coordinating node among the plurality of nodes;
selecting a first node from the candidate node queue to run the Spark Streaming program, and deleting the first node from the candidate node queue;
receiving state information sent by each node in the candidate node queue, and determining a fault node according to the state information; deleting the fault node from the candidate node queue to obtain a target candidate node queue; and
when the node running the Spark Streaming program fails, selecting a second node from the target candidate node queue to run the Spark Streaming program, and deleting the second node from the target candidate node queue.
8. The method of claim 7, wherein determining a fault node according to the state information comprises:
comparing the state information of each node with preset state information, and determining a fault node according to the comparison result.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 7 to 8 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 7 to 8.
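By way of illustration only, the queue handling recited in the claims above (selecting a node, removing fault nodes, re-adding recovered nodes, and failing over) can be sketched in Python. The class name `Coordinator`, its method names, the string-valued state information, and the `"healthy"` preset state are assumptions made for this sketch and do not come from the claims. Timestamps are passed in explicitly so that the behaviour is deterministic.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set


class Coordinator:
    """Toy coordinating node; all names here are illustrative assumptions."""

    def __init__(self, nodes: List[str], timeout: float,
                 expected_state: str = "healthy") -> None:
        self.candidates: Deque[str] = deque(nodes)   # candidate node queue
        self.timeout = timeout                       # preset time length (seconds)
        self.expected_state = expected_state         # preset state information
        self.last_seen: Dict[str, float] = {n: 0.0 for n in nodes}
        self.faulty: Set[str] = set()                # current fault nodes
        self.running: Optional[str] = None           # node running the program

    def start(self) -> Optional[str]:
        """Select the next candidate to run the program and delete it from the queue."""
        self.running = self.candidates.popleft() if self.candidates else None
        return self.running

    def report(self, node: str, state: str, now: float) -> None:
        """Handle state information sent by a node at time `now`."""
        self.last_seen[node] = now
        healthy = state == self.expected_state
        if node in self.candidates and not healthy:
            # State differs from the preset state: treat as a fault node.
            self.candidates.remove(node)
            self.faulty.add(node)
        elif node in self.faulty and healthy:
            # Fault eliminated: the node rejoins the candidate queue.
            self.faulty.remove(node)
            self.candidates.append(node)

    def sweep(self, now: float) -> None:
        """Treat candidates that have been silent longer than the timeout as fault nodes."""
        for node in list(self.candidates):
            if now - self.last_seen[node] > self.timeout:
                self.candidates.remove(node)
                self.faulty.add(node)

    def failover(self) -> Optional[str]:
        """Call when the running node has failed; selects the next remaining candidate."""
        return self.start()
```

In this sketch, `report` covers the comparison with preset state information and the re-admission of a recovered node, while `sweep` covers the case in which no state information arrives within the preset time length. Killing the program on the previously running node before the second node starts is left out.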
CN201911197734.2A 2019-11-29 2019-11-29 System and method for operating Spark Streaming program Active CN111124630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911197734.2A CN111124630B (en) 2019-11-29 2019-11-29 System and method for operating Spark Streaming program

Publications (2)

Publication Number Publication Date
CN111124630A (en) 2020-05-08
CN111124630B (en) 2024-03-12

Family

ID=70496946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911197734.2A Active CN111124630B (en) 2019-11-29 2019-11-29 System and method for operating Spark Streaming program

Country Status (1)

Country Link
CN (1) CN111124630B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2440277A1 (en) * 2001-03-07 2002-09-19 Oracle International Corporation Managing checkpoint queues in a multiple node system
CN106549796A (en) * 2016-09-27 2017-03-29 努比亚技术有限公司 Resource control method and master node for firmware over-the-air download
CN106778033A (en) * 2017-01-10 2017-05-31 南京邮电大学 A Spark Streaming abnormal temperature data alarm method based on the Spark platform
US20190244146A1 (en) * 2018-01-18 2019-08-08 D&B Business Information Solutions Elastic distribution queuing of mass data for the use in director driven company assessment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
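Wei Xiaohui; Liu Zhiliang; Zhuang Yuan; Li Hongliang; Li Xiang: "An adaptive checkpoint mechanism supporting online processing of large-scale stream data" *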

Also Published As

Publication number Publication date
CN111124630B (en) 2024-03-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant