CN111177001A

CN111177001A - Automatic testing method, system and equipment for high availability of Flink component

Info

Publication number: CN111177001A
Application number: CN201911364067.2A
Authority: CN
Inventors: 周俊青
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2019-12-26
Filing date: 2019-12-26
Publication date: 2020-05-19

Abstract

The invention provides an automatic testing method, system, and device for the high availability of a Flink component, which implement automated testing of switching between the active and standby Flink JobManagers.

Description

Automatic testing method, system and equipment for high availability of Flink component

Technical Field

The invention relates to the technical field of big data, and in particular to a method, a system, and a device for automatically testing the high availability of a Flink component.

Background

Flink is a distributed system for stateful, parallel data stream processing; that is, Flink runs distributed across multiple machines. Four different components in Flink cooperate to run a streaming program: a JobManager, a ResourceManager, a TaskManager, and a Dispatcher. Because Flink is implemented in Java and Scala, all of these components run in the JVM.

The JobManager is the master process that manages the execution of a single application, and each application is managed by a different JobManager. The JobManager receives an application and executes it. An application consists of a JobGraph, which is a logical dataflow graph, and a JAR file containing all required classes, libraries, and other resources. The JobManager converts the JobGraph into a physical dataflow graph called the ExecutionGraph, which consists of several tasks that can be executed in parallel. The JobManager requests the computing resources needed to execute the tasks (called TaskManager slots) from the ResourceManager. Once it has received enough TaskManager slots, it distributes the tasks of the ExecutionGraph to the TaskManagers, which execute them. During execution, the JobManager is responsible for all operations that require central coordination, such as the coordination of checkpoints.

At present, the high availability (HA) function of the Flink component is an important function of a big data cluster. It underpins distributed columnar storage management and scheduling on the big data platform and guarantees the high stability and reliability of the data tables of the cluster database, so testing the HA function of the Flink component is a very important step. However, current high-availability tests of the Flink component do not record log results during execution and can only be performed manually, which is complex, time-consuming, and labor-intensive. Moreover, risks to data stability and reliability remain even while the Flink component is running, so high availability cannot be fully guaranteed.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide a method, a system, and a device for automatically testing the high availability of a Flink component, which can implement automated testing of switching between the active and standby Flink JobManagers.

To achieve this object, the invention adopts the following technical solution: a method for automatically testing the high availability of a Flink component, comprising the following steps:

S1: checking the current state of the cluster JobManagers;

S2: forcibly switching the active and standby JobManagers;

S3: after the forced switch, performing cluster function verification;

S4: simulating a Flink JobManager node fault;

S5: performing cluster function verification again;

S6: generating a test log.
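The six steps form a fixed sequence, and a failed step should stop the run. The following is a minimal Bash sketch of a driver, assuming each step is wrapped in its own shell function that returns 0 on success. The driver name `run_ha_steps` and all step names are hypothetical and are not part of the invention's script.

```shell
#!/usr/bin/env bash
# Minimal sketch: run the HA test steps in order, stopping at the first failure.
# Each argument names a shell function (hypothetical step wrappers) that
# returns 0 when its step succeeds.
run_ha_steps() {
  local step rc
  for step in "$@"; do
    echo "[$(date '+%F %T')] START $step"
    if "$step"; then
      echo "[$(date '+%F %T')] PASS  $step"
    else
      rc=$?   # exit status of the failed step
      echo "[$(date '+%F %T')] FAIL  $step (exit $rc)"
      return "$rc"
    fi
  done
  echo "ALL STEPS PASSED"
}
```

For example, `run_ha_steps check_state force_switch verify_after_switch simulate_fault verify_after_failover > log` would run S1 to S5 in order and write the progress lines to a file, which covers S6. All five step names are placeholders.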

Further, step S1 specifically includes:

as the Flink user, obtaining the node where the JobManager in the Active state is located through a Flink command, and recording it.

Further, step S1 includes:

calling Flink's built-in script /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb to obtain the node where the Active JobManager is located;

there are two JobManager components: JobManager1 and JobManager2;

the state of JobManager1 is obtained as Active, and the state of JobManager2 as Backup.
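Once the script reports the Active node, the state of each JobManager follows from comparing that node with the two JobManager hosts. The sketch below performs this comparison, assuming the host names of JobManager1 and JobManager2 are known in advance. It also flags the case where neither host is Active. The function name `jm_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the state of each JobManager from the node that
# the S1 state check reported as Active.
#   $1 = node reported as Active
#   $2 = host of JobManager1, $3 = host of JobManager2
jm_status() {
  local active="$1" jm1="$2" jm2="$3"
  if [ "$active" = "$jm1" ]; then
    echo "JobManager1=ACTIVE JobManager2=BACKUP"
  elif [ "$active" = "$jm2" ]; then
    echo "JobManager1=BACKUP JobManager2=ACTIVE"
  else
    # Neither configured host is Active: the cluster is not in the expected
    # state, so the caller should stop the test here.
    echo "UNKNOWN"
    return 1
  fi
}
```

Usage would be `jm_status "$activenode" node1 node2`, where `node1` and `node2` stand for the (hypothetical) JobManager host names.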

Further, step S2 includes:

obtaining the active/standby states of JobManager1 and JobManager2 from the check of the current cluster JobManager state;

calling the transitionToActive() method to execute the Flink command that forcibly switches the JobManager, switching JobManager2 to Active, and checking whether the switch of the JobManager2 node succeeded;

calling the transitionToActive() method to execute the Flink command that forcibly switches the JobManager, switching JobManager1 to Backup, and checking whether the change of the JobManager1 node succeeded.
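To check whether a forced switch succeeded, the Active node is re-read until it matches the expected one, because a switch may take a few seconds to complete. The following Bash sketch polls a state-check command. The probe function it calls and the retry defaults are assumptions, not values from the invention.

```shell
#!/usr/bin/env bash
# Minimal sketch: after a forced switch, poll until the expected node reports
# as Active, or give up after a number of attempts.
#   $1 = function/command that prints the currently Active node
#        (hypothetical wrapper around the S1 state check)
#   $2 = node expected to be Active
#   $3 = maximum attempts (default 10), $4 = seconds between attempts (default 3)
wait_for_active() {
  local probe="$1" expected="$2" tries="${3:-10}" delay="${4:-3}" i current
  for ((i = 1; i <= tries; i++)); do
    current=$("$probe")
    if [ "$current" = "$expected" ]; then
      echo "switch OK: $expected is Active after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "switch FAILED: Active node is '$current', expected '$expected'" >&2
  return 1
}
```

Polling with a bounded number of attempts keeps a stuck switch from hanging the whole test run.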

Further, steps S3 and S5 each include:

running an MR (MapReduce) job by executing the commands of the TestDFSIO tool, to verify that the cluster functions normally.
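Steps S3 and S5 both reduce to running a benchmark job and judging its outcome, so one wrapper can serve both. The sketch below is hypothetical (`verify_cluster` is not from the invention). It assumes, as is usual for `hadoop jar`, that a failed job yields a non-zero exit status.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for steps S3 and S5: run one verification command and
# report PASS or FAIL based on its exit status.
#   $1 = label for the log, remaining arguments = command to run
verify_cluster() {
  local label="$1"; shift
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "VERIFY $label: PASS"
  else
    echo "VERIFY $label: FAIL (exit $rc)"
  fi
  return "$rc"
}
```

For example, `verify_cluster S3 hadoop jar "$Path" TestDFSIO -write -nrFiles 10 -size 128KB`, where `$Path` points to the MapReduce job-client test JAR.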

Further, step S4 includes:

obtaining the process number of the JobManager1 node and killing the process with the kill -9 command; then checking whether the status of Flink's JobManager2 switches automatically.
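Step S4 first needs the process number of the JobManager1 process. On a JVM host, the JDK `jps` tool prints one `<pid> <main class>` pair per line, so the PID can be picked out by class name. This sketch assumes the JobManager's main class is `StandaloneSessionClusterEntrypoint` (Flink's standalone session mode). Other deployment modes and versions may use other class names. The helper `jm_pid_from_jps` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step S4: read `jps` output (one "<pid> <main class>"
# pair per line) on stdin and print the PID of the first JVM whose main class
# equals $1. awk reads all input, so the upstream writer is never cut off.
jm_pid_from_jps() {
  awk -v c="$1" '$2 == c && !found { print $1; found = 1 }'
}

# Assumed usage on the JobManager1 node (class name is an assumption):
#   pid=$(jps | jm_pid_from_jps StandaloneSessionClusterEntrypoint)
#   [ -n "$pid" ] && kill -9 "$pid"
```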

Further, step S6 includes:

recording the automated test execution process to the log file log by executing the script: ./automatictest_FlinkHA.sh > log.
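Plain redirection with `> log` keeps only standard output and records no timing information. As an alternative, a wrapper can append both output streams to the log with a timestamp on every line while preserving the exit status of the wrapped command. The sketch below illustrates this; `log_run` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Minimal sketch for step S6: run a command, append its stdout and stderr to a
# log file with a timestamp on every line, and return the command's own status.
#   $1 = log file, remaining arguments = command to run
log_run() {
  local log="$1"; shift
  "$@" 2>&1 | while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%F %T')" "$line"
  done >> "$log"
  # PIPESTATUS[0] is the exit status of "$@", not of the logging loop.
  return "${PIPESTATUS[0]}"
}
```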

Correspondingly, the invention also discloses an automatic testing system for high availability of the Flink component, comprising: a state acquisition unit, configured to check the current state of the cluster JobManagers;

an active/standby switching unit, configured to forcibly switch the active and standby JobManagers;

a verification unit, configured to verify the cluster function;

a fault simulation unit, configured to simulate a Flink JobManager node fault;

and a recording unit, configured to generate a test log.

Correspondingly, the invention also discloses an automatic testing device for high availability of the Flink component, comprising: a memory for storing a computer program;

and a processor for implementing the steps of the automatic testing method for high availability of the Flink component as described in any of the above when executing the computer program.

Compared with the prior art, the beneficial effects of the invention are as follows: the invention provides an automatic testing method, system, and device for the high availability of a Flink component, which implement automated testing of switching between the active and standby Flink JobManagers.

In the invention, the automation script is written using Shell as the development language. Running this script verifies the high availability of the Flink component of the big data platform and retains the execution log, which improves test efficiency, saves human resources, and provides a guarantee and a basis for high-availability testing of the Flink component in a cluster.

Therefore, compared with the prior art, the invention has prominent substantive features and represents remarkable progress, and the beneficial effects of its implementation are also obvious.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a system block diagram of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to the accompanying drawings.

Embodiment 1:

The automatic testing method for high availability of the Flink component shown in FIG. 1 comprises the following steps:

S1: Check the current state of the cluster JobManagers. As the Flink user, obtain the node where the JobManager in the Active state is located through a Flink command, and record it. Specifically, call Flink's built-in script /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb to obtain the node where the Active JobManager is located. There are two JobManager components, JobManager1 and JobManager2; the state of JobManager1 is obtained as Active, and the state of JobManager2 as Backup.

S2: Forcibly switch the active and standby JobManagers. This comprises the following steps:

Obtain the active/standby states of JobManager1 and JobManager2 from the check of the current cluster JobManager state. Call the transitionToActive() method to execute the Flink command that forcibly switches the JobManager, switching JobManager2 to Active, and check whether the switch of the JobManager2 node succeeded. Call the transitionToActive() method to execute the Flink command that switches JobManager1 to Backup, and check whether the switch of the JobManager1 node succeeded.

S3: After the forced switch, verify the cluster function. An MR job is run by executing the commands of the TestDFSIO tool to verify that the cluster functions normally.

S4: Simulate a Flink JobManager node fault. Obtain the process number of the JobManager1 node and kill the process with the kill -9 command; then check whether the status of Flink's JobManager2 switches automatically.

S5: Verify the cluster function again. After the JobManager states have switched automatically, an MR job is run by executing the commands of the TestDFSIO tool to verify that the cluster functions normally.

S6: Generate a test log. The automated test execution process is recorded to the log file log by executing the script: ./automatictest_FlinkHA.sh > log.

Embodiment 2:

This embodiment provides an automatic testing method for high availability of a Flink component, comprising the following steps:

1. Check the current state of the cluster JobManagers:

As the Flink user, obtain the node where the Active JobManager is located through a Flink command, and record it (the states of JobManager1 and JobManager2 are Active and Backup, respectively).

Invoke Flink's built-in script /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb to obtain the node where the Active JobManager is located:

activenode=$(Flink org.jruby.Main /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb)
# Derive the JobManager1 and JobManager2 states from the active node
# (managernode and masternode are assumed to hold the hosts of JobManager1 and JobManager2)
if [ "$activenode" = "$managernode" ]; then
    JobManager1_Status=ACTIVE
    JobManager2_Status=BACKUP
elif [ "$activenode" = "$masternode" ]; then
    JobManager1_Status=BACKUP
    JobManager2_Status=ACTIVE
fi

2. Forcibly switch the active and standby JobManagers:

2.1 Obtain the active/standby states of JobManager1 and JobManager2 from the check of the current cluster JobManager state in step 1.

2.2 Call the transitionToActive() method to execute the Flink command that forcibly switches the JobManager, switching JobManager2 to Active, and check whether the switch of the JobManager2 node succeeded.

2.3 Call the transitionToActive() method to execute the Flink command that forcibly switches the JobManager, switching JobManager1 to Active, and check whether the switch of the JobManager1 node succeeded.

3. Cluster function verification:

After the JobManager states have been forcibly switched, run an MR job by executing the commands of the TestDFSIO tool to verify that the cluster functions normally.

# Locate the test JAR package and store its path in Path
Path=$(find / -name hadoop-mapreduce-client-jobclient-2.7.3.2.6.4.0-91-tests.jar 2>/dev/null | head -n 1)
# Run the TestDFSIO write benchmark: 10 files of 128 KB each
hadoop jar "$Path" TestDFSIO -D mapreduce.job.queuename="default" -write -nrFiles 10 -size 128KB

Verify that the component functions normally by running the Flink benchmark test commands:

Flink org.apache.hadoop.Flink.PerformanceEvaluation
Flink org.apache.hadoop.Flink.PerformanceEvaluation --nomapred --rows=100000 --presplit=100 sequentialWrite 100
Flink org.apache.hadoop.Flink.PerformanceEvaluation --nomapred --rows=1000 --presplit=100 sequentialWrite 10
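The `find / -name ...` command above hard-codes the HDP build string in the JAR name and scans the whole file system. A more tolerant lookup can narrow the search root and match any version. The sketch below assumes the test JAR keeps the `hadoop-mapreduce-client-jobclient-<version>-tests.jar` naming pattern; `find_tests_jar` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate the MapReduce job-client test JAR (which holds
# TestDFSIO) under a search root without hard-coding the version string.
#   $1 = directory to search (defaults to "/", which is slow)
find_tests_jar() {
  local root="${1:-/}"
  find "$root" -type f -name 'hadoop-mapreduce-client-jobclient-*-tests.jar' \
    2>/dev/null | head -n 1
}
```

Usage would be `Path=$(find_tests_jar /usr/hdp)` before the `hadoop jar "$Path" TestDFSIO ...` call. An empty result means no matching JAR was found, and the test should stop.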

4. Simulate a Flink JobManager node fault:

Obtain the process number of the JobManager1 node and kill the process with the kill -9 command. Then check whether the status of Flink's JobManager2 switches automatically.

5. Cluster function verification:

After the JobManager states have switched automatically, run an MR job by executing the commands of the TestDFSIO tool to verify that the cluster functions normally.

# Run the benchmark test again
Flink org.apache.hadoop.Flink.PerformanceEvaluation

6. Generate a log:

Record the whole automated test execution process to a log file.

# Execute the script and redirect its output to the file log
./automatictest_FlinkHA.sh > log

Correspondingly, as shown in FIG. 2, the invention also discloses an automatic testing system for high availability of a Flink component, comprising:

a state acquisition unit, configured to check the current state of the cluster JobManagers;

an active/standby switching unit, configured to forcibly switch the active and standby JobManagers;

a verification unit, configured to verify the cluster function;

a fault simulation unit, configured to simulate a Flink JobManager node fault;

and a recording unit, configured to generate a test log.

Those skilled in the art will readily appreciate that the techniques in the embodiments of the present invention may be implemented by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product. The computer software product is stored in a storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. It includes instructions that cause a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, or the like) to perform all or some of the steps of the methods in the embodiments of the present invention. Identical and similar parts of the embodiments in this specification may be referred to each other. In particular, the terminal embodiment is basically similar to the method embodiment, so it is described relatively briefly; for relevant points, refer to the description of the method embodiment.

In the embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the system embodiments described above are merely illustrative. The division into units is only a logical functional division, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, systems, or units, and may be electrical, mechanical, or in other forms.

Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit.

Similarly, the processing units in the embodiments of the present invention may be integrated into one functional module, each processing unit may exist physically on its own, or two or more processing units may be integrated into one functional module.

The invention has been further described above with reference to the accompanying drawings and specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Furthermore, after reading the teaching of the present invention, those skilled in the art may make various changes or modifications to it, and such equivalents also fall within the scope of the present application.

Claims

1. A method for automatically testing high availability of a Flink component, characterized by comprising the following steps:

S1: checking the current state of the cluster JobManagers;

S2: forcibly switching the active and standby JobManagers;

S3: after the forced switch, performing cluster function verification;

S4: simulating a Flink JobManager node fault;

S5: performing cluster function verification again;

S6: generating a test log.

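2. The method for automatically testing high availability of the Flink component according to claim 1, wherein step S1 is specifically:

as the Flink user, obtaining the node where the JobManager in the Active state is located through a Flink command, and recording it.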

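3. The method for automatically testing high availability of the Flink component according to claim 2, wherein step S1 comprises:

calling Flink's built-in script /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb to obtain the node where the Active JobManager is located;

there being two JobManager components, JobManager1 and JobManager2;

the state of JobManager1 being obtained as Active, and the state of JobManager2 as Backup.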

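4. The method for automatically testing high availability of the Flink component according to claim 3, wherein step S2 comprises:

obtaining the active/standby states of JobManager1 and JobManager2 from the check of the current cluster JobManager state;

calling the transitionToActive() method to execute the Flink command that forcibly switches the JobManager, switching JobManager2 to Active, and checking whether the switch of the JobManager2 node succeeded;

calling the transitionToActive() method to execute the Flink command that forcibly switches the JobManager, switching JobManager1 to Backup, and checking whether the change of the JobManager1 node succeeded.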

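5. The method for automatically testing high availability of the Flink component according to claim 1, wherein steps S3 and S5 each comprise:

running an MR job by executing the commands of the TestDFSIO tool to verify that the cluster functions normally.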

6. The method for automatically testing high availability of the Flink component according to claim 3, wherein step S4 comprises:

obtaining the process number of the JobManager1 node and killing the process with the kill -9 command;

and checking whether the status of Flink's JobManager2 switches automatically.

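7. The method for automatically testing high availability of the Flink component according to claim 1, wherein step S6 comprises:

recording the automated test execution process to the log file log by executing the script ./automatictest_FlinkHA.sh > log.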

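8. An automatic testing system for high availability of a Flink component, comprising:

a state acquisition unit, configured to check the current state of the cluster JobManagers;

an active/standby switching unit, configured to forcibly switch the active and standby JobManagers;

a verification unit, configured to verify the cluster function;

a fault simulation unit, configured to simulate a Flink JobManager node fault;

and a recording unit, configured to generate a test log.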

9. An automatic testing device for high availability of a Flink component, comprising:

a memory for storing a computer program;

and a processor for implementing the steps of the method for automatically testing high availability of the Flink component according to any one of claims 1 to 7 when executing the computer program.