CN111177001A - Automatic testing method, system and equipment for high availability of Flink component - Google Patents

Info

Publication number
CN111177001A
Authority
CN
China
Prior art keywords
flink
jobmanager
high availability
node
switching
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911364067.2A
Other languages
Chinese (zh)
Inventor
周俊青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-26
Filing date
2019-12-26
Publication date
2020-05-19
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201911364067.2A
Publication of CN111177001A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G06F11/3684 Test management for test design, e.g. generating new test cases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention provides an automatic testing method, system and device for high availability of a Flink component, which automate the testing of switchover between the active and standby Flink JobManagers.

Description

Automatic testing method, system and equipment for high availability of Flink component
Technical Field
The invention relates to the technical field of big data, in particular to a method, a system and equipment for automatically testing high availability of a Flink component.
Background
Flink is a distributed system for stateful parallel data stream processing; that is, Flink runs distributed across multiple machines. Four different components in Flink cooperate to run a streaming program: a JobManager, a ResourceManager, a TaskManager, and a Dispatcher. Flink is implemented in Java and Scala, so these components all run in the JVM.
The JobManager is the master process that manages the execution of a single application. Each application is managed by a different JobManager. The JobManager receives the application and executes it. An application comprises a JobGraph, a logical dataflow graph, and a JAR file (containing all required classes, libraries, and other resources). The JobManager converts the JobGraph into a physical dataflow graph called an ExecutionGraph. The ExecutionGraph consists of several tasks that can be executed in parallel. The JobManager requests the necessary computing resources (called TaskManager slots) from the ResourceManager in order to execute the tasks. Once the JobManager has received enough TaskManager slots, it distributes the tasks in the ExecutionGraph to the TaskManagers, which then execute them. During execution, the JobManager is responsible for any operation that requires central coordination, such as the coordination of checkpoints.
The high availability (HA) function of the Flink component is an important function of a big data cluster: it underpins the management and scheduling of distributed columnar storage on the big data platform and guarantees the stability and reliability of the data tables of the cluster database, so testing the HA function of the Flink component is a very important step. At present, when the HA function of the Flink component is tested, no log of the results is recorded during execution and the test can only be performed manually. The operation is complex, time-consuming and labor-intensive, risks to data stability and reliability remain even while the Flink component is running, and high availability cannot be guaranteed to be entirely free of risk.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a method, a system, and a device for automatically testing high availability of a Flink component, which automate the testing of switchover between the active and standby Flink JobManagers.
To achieve this object, the invention is realized by the following technical solution: a method for automatically testing high availability of a Flink component, comprising the following steps:
S1: checking the current state of the cluster JobManagers;
S2: forcibly switching the active and standby JobManagers;
S3: after the forced switch, performing cluster function verification;
S4: simulating a fault of a Flink JobManager node;
S5: performing cluster function verification again;
S6: generating a test log.
Further, the step S1 specifically includes:
under the Flink user, acquiring, through a Flink command, the node on which the JobManager in the Active state is located, and recording it.
Further, the step S1 includes:
calling Flink's own script /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb to acquire the node on which the JobManager in the Active state is located;
there are two JobManager components: JobManager1 and JobManager2;
the state of JobManager1 is obtained as Active, and the state of JobManager2 is obtained as Backup.
Further, the step S2 includes:
acquiring the active and standby states of JobManager1 and JobManager2 from the check of the cluster's current JobManager state;
calling the transitionToActive() method to execute Flink's command for forcibly switching the JobManager, switching JobManager2 to Active, and checking whether the switch of the JobManager2 node succeeded;
calling the transitionToActive() method to execute Flink's command for forcibly switching the JobManager, switching JobManager1 to Backup, and checking whether the switch of the JobManager1 node succeeded.
Further, the steps S3 and S5 each include:
running a MapReduce (MR) job by executing the command of the TestDFSIO tool, to verify that the cluster functions normally.
Further, the step S4 includes:
acquiring the process number of the JobManager1 node, and killing the process with the kill -9 command; checking whether the state of Flink's JobManager2 is switched automatically.
Further, the step S6 includes:
recording the automatic test execution process to the log file log by executing the script with its output redirected (automatic test _ FlinkHA.sh > log).
Correspondingly, the invention also discloses an automatic testing system for high availability of the Flink component, which comprises:
a state acquisition unit, configured to check the current state of the cluster JobManagers;
an active/standby switching unit, configured to forcibly switch the active and standby JobManagers;
a verification unit, configured to verify the cluster function;
a fault simulation unit, configured to simulate a fault of a Flink JobManager node;
a recording unit, configured to generate a test log.
Correspondingly, the invention also discloses automatic test equipment for high availability of the Flink component, which comprises:
a memory for storing a computer program;
a processor for implementing, when executing the computer program, the steps of the automatic testing method for high availability of the Flink component as described in any of the above.
Compared with the prior art, the invention has the following beneficial effects: the invention provides an automatic testing method, system and equipment for high availability of a Flink component, which automate the testing of switchover between the active and standby Flink JobManagers.
According to the invention, an automation script can be written with Shell as the development language. By applying the automation script, the high availability of the Flink component of the big data platform is verified and the execution log is retained, which improves test efficiency, saves human resources, and provides a guarantee and a basis for the high availability test of the cluster's Flink component.
Therefore, compared with the prior art, the invention has prominent substantive features and represents notable progress, and the beneficial effects of its implementation are also evident.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are merely embodiments of the present invention; those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a system block diagram of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings.
Embodiment one:
The automatic testing method for high availability of the Flink component shown in FIG. 1 comprises the following steps:
S1: Check the current state of the cluster JobManagers. Under the Flink user, the node on which the JobManager in the Active state is located is acquired through a Flink command and recorded. Specifically: Flink's own script /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb is called to acquire the node on which the JobManager in the Active state is located; there are two JobManager components, JobManager1 and JobManager2; the state of JobManager1 is obtained as Active, and the state of JobManager2 is obtained as Backup.
S2: Forcibly switch the active and standby JobManagers. This comprises:
acquiring the active and standby states of JobManager1 and JobManager2 from the check of the cluster's current JobManager state; calling the transitionToActive() method to execute Flink's command for forcibly switching the JobManager, switching JobManager2 to Active, and checking whether the switch of the JobManager2 node succeeded; calling the transitionToActive() method to execute Flink's command for forcibly switching the JobManager, switching JobManager1 to Backup, and checking whether the switch of the JobManager1 node succeeded.
S3: After the forced switch, perform cluster function verification. A MapReduce (MR) job is run by executing the command of the TestDFSIO tool to verify that the cluster functions normally.
S4: Simulate a fault of a Flink JobManager node. The process number of the JobManager1 node is acquired and the process is killed with the kill -9 command; it is then checked whether the state of Flink's JobManager2 is switched automatically.
S5: Perform cluster function verification again. After the JobManager states have switched automatically, an MR job is run by executing the command of the TestDFSIO tool to verify that the cluster functions normally.
S6: Generate a test log. The automatic test execution process is recorded to the log file log by executing the script with its output redirected (automatic test _ FlinkHA.sh > log).
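The six steps S1 to S6 can be chained in a single Shell driver. The following is a minimal sketch under stated assumptions, not the script used by the invention: the function names (step_check_state, run_ha_test and so on) are hypothetical, and each step body is a placeholder that only prints a line where the real check would go.

```shell
#!/bin/sh
# Hypothetical placeholders for the real step logic; each should return
# non-zero when its check fails.
step_check_state()    { echo "S1: check current JobManager state"; }
step_forced_switch()  { echo "S2: forcibly switch active and standby JobManagers"; }
step_verify_cluster() { echo "S3/S5: verify cluster function"; }
step_simulate_fault() { echo "S4: simulate JobManager node fault"; }

# Run the steps in the order S1, S2, S3, S4, S5 and stop at the first failure.
run_ha_test() {
    for step in step_check_state step_forced_switch step_verify_cluster \
                step_simulate_fault step_verify_cluster; do
        if ! "$step"; then
            echo "FAILED: $step"
            return 1
        fi
    done
    echo "PASSED"
}
```

Redirecting the driver's output, for example run_ha_test > log 2>&1, would then produce the test log of step S6.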
Embodiment two:
This embodiment provides an automatic testing method for high availability of a Flink component, which comprises the following steps:
1. Check the current state of the cluster JobManagers:
Under the Flink user, the node on which the JobManager in the Active state is located is obtained through a Flink command and recorded (the states of JobManager1 and JobManager2 are Active and Backup respectively).
Flink's own script /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb is called to acquire the node on which the JobManager in the Active state is located:
# query the node that hosts the Active JobManager
activenode=$(Flink org.jruby.Main /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb)
# derive the states of JobManager1 and JobManager2 from the active node
if [ "$activenode" = "managernode" ]; then
    JobManager1_Status=ACTIVE
    JobManager2_Status=BACKUP
elif [ "$activenode" = "masternode" ]; then
    JobManager1_Status=BACKUP
    JobManager2_Status=ACTIVE
fi
2. Forcibly switch the active and standby JobManagers:
2.1 Obtain the active and standby states of JobManager1 and JobManager2 from the check of the cluster's current JobManager state.
2.2 Call the transitionToActive() method to execute Flink's command for forcibly switching the JobManager, switch JobManager2 to Active, and check whether the switch of the JobManager2 node succeeded.
2.3 Call the transitionToActive() method to execute Flink's command for forcibly switching the JobManager, switch JobManager1 to Active, and check whether the switch of the JobManager1 node succeeded.
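The source does not say how "check whether the switch succeeded" is carried out. One plausible approach, sketched here as an assumption, is to poll the active-node query from step 1 until the expected node is reported. The names get_active_node and wait_for_active, and the default retry count and interval, are hypothetical.

```shell
# Hypothetical wrapper around the query from step 1; replace or override it
# to suit the cluster under test.
get_active_node() {
    Flink org.jruby.Main /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb
}

# Poll until the expected node is reported as Active, or give up.
# $1 = expected node name
# $2 = maximum number of attempts (default 10, an assumed value)
# $3 = seconds to sleep between attempts (default 3, an assumed value)
wait_for_active() {
    expected=$1
    attempts=${2:-10}
    interval=${3:-3}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if [ "$(get_active_node)" = "$expected" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

A call such as wait_for_active masternode returns 0 once the switch is visible and 1 if it is still not visible after the last attempt, so the test script can branch on the result.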
3. Cluster function verification:
After the JobManager states have been forcibly switched, a MapReduce (MR) job is run by executing the command of the TestDFSIO tool to verify that the cluster functions normally.
# find the location of the test JAR package and store it in Path
Path=$(find / -name hadoop-mapreduce-client-jobclient-2.7.3.2.6.4.0-91-tests.jar)
# run the benchmark test
hadoop jar "$Path" TestDFSIO -D mapreduce.job.queuename="default" -write -nrFiles 10 -size 128KB
The component is verified to function normally by running the Flink benchmark test commands:
Flink org.apache.hadoop.Flink.PerformanceEvaluation
Flink org.apache.hadoop.Flink.PerformanceEvaluation --nomapred --rows=100000 --presplit=100 sequentialWrite 100
Flink org.apache.hadoop.Flink.PerformanceEvaluation --nomapred --rows=1000 --presplit=100 sequentialWrite 10
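For an automated run, each verification command needs to yield a pass or fail result rather than be read by eye. A minimal sketch, assuming that a zero exit status of the benchmark command means the cluster functions normally; the helper name verify is hypothetical.

```shell
# Run a verification command and report PASS or FAIL from its exit status.
# $1 = label for the log line; the remaining arguments are the command to run.
verify() {
    label=$1
    shift
    if "$@"; then
        echo "PASS: $label"
    else
        rc=$?   # exit status of the command that just failed
        echo "FAIL: $label (exit status $rc)"
        return "$rc"
    fi
}
```

The TestDFSIO run above could then be wrapped as verify "TestDFSIO write" hadoop jar "$Path" TestDFSIO -D mapreduce.job.queuename="default" -write -nrFiles 10 -size 128KB, and the same helper reused for the repeated verification in step 5.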
4. Simulate a fault of a Flink JobManager node:
The process number of the JobManager1 node is obtained, and the process is killed with the kill -9 command. It is then checked whether the state of Flink's JobManager2 is switched automatically.
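The fault injection itself is a single kill -9, but a script should guard against an empty or malformed process number before sending the signal. A minimal sketch follows; the function name simulate_jobmanager_fault is hypothetical, and how the process number is obtained is left open because the source does not state it (a tool such as jps or pgrep could supply it).

```shell
# Send SIGKILL to the JobManager process whose PID is given in $1.
# Returns 2 if the argument is not a plain decimal PID, otherwise the exit
# status of kill (non-zero if the signal could not be sent).
simulate_jobmanager_fault() {
    pid=$1
    case $pid in
        ''|*[!0-9]*)
            echo "invalid PID: '$pid'" >&2
            return 2
            ;;
    esac
    kill -9 "$pid"
}
```

After the call returns 0, the script would go on to check whether JobManager2 has become Active, as the step describes.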
5. Cluster function verification:
After the JobManager states have switched automatically, an MR job is run by executing the command of the TestDFSIO tool to verify that the cluster functions normally.
# run the benchmark test
hadoop jar "$Path" TestDFSIO -D mapreduce.job.queuename="default" -write -nrFiles 10 -size 128KB
The component is verified to function normally by running the Flink benchmark test commands:
Flink org.apache.hadoop.Flink.PerformanceEvaluation
Flink org.apache.hadoop.Flink.PerformanceEvaluation --nomapred --rows=100000 --presplit=100 sequentialWrite 100
Flink org.apache.hadoop.Flink.PerformanceEvaluation --nomapred --rows=1000 --presplit=100 sequentialWrite 10
6. Generate a log:
The whole automatic test execution process is recorded to a log file.
# execute the script
Log
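A log is easier to review when every entry carries the time at which the step ran. The sketch below shows one way to do that; the helper name log_msg is hypothetical, and the timestamp format is an arbitrary choice rather than something the source specifies.

```shell
# Print a message prefixed with the current date and time, so the log file
# shows when each step of the test ran.
log_msg() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}
```

Calls such as log_msg "step 4: JobManager1 killed" inside the test script, combined with redirecting the script's output to the log file as described in step 6, would give a timestamped record of the whole run.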
Correspondingly, as shown in FIG. 2, the invention also discloses an automatic testing system for high availability of a Flink component, comprising:
a state acquisition unit, configured to check the current state of the cluster JobManagers;
an active/standby switching unit, configured to forcibly switch the active and standby JobManagers;
a verification unit, configured to verify the cluster function;
a fault simulation unit, configured to simulate a fault of a Flink JobManager node;
a recording unit, configured to generate a test log.
Correspondingly, the invention also discloses automatic test equipment for high availability of the Flink component, which comprises:
a memory for storing a computer program;
a processor for implementing, when executing the computer program, the steps of the automatic testing method for high availability of the Flink component as described in any of the above.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus the required general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product. The computer software product is stored in a storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and includes instructions for enabling a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, and the like) to perform all or part of the steps of the method in the embodiments of the present invention. The same and similar parts of the various embodiments in this specification may be referred to one another. In particular, since the terminal embodiment is basically similar to the method embodiment, its description is relatively brief, and the relevant points can be found in the description of the method embodiment.
In the embodiments provided by the present invention, it should be understood that the disclosed system, equipment and method can be implemented in other ways. The system embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems or units, and may be electrical, mechanical or of another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit.
Similarly, each processing unit in the embodiments of the present invention may be integrated into one functional module, or each processing unit may exist alone physically, or two or more processing units may be integrated into one functional module.
The invention is further described with reference to the accompanying drawings and specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and these equivalents also fall within the scope of the present application.

Claims (9)

1. A method for automatically testing high availability of a Flink component, characterized by comprising the following steps:
S1: checking the current state of the cluster JobManagers;
S2: forcibly switching the active and standby JobManagers;
S3: after the forced switch, performing cluster function verification;
S4: simulating a fault of a Flink JobManager node;
S5: performing cluster function verification again;
S6: generating a test log.
2. The method for automatically testing high availability of the Flink component according to claim 1, wherein the step S1 is specifically as follows:
under the Flink user, acquiring, through a Flink command, the node on which the JobManager in the Active state is located, and recording it.
3. The method for automatically testing high availability of a Flink component according to claim 2, wherein the step S1 comprises:
calling Flink's own script /usr/hdp/3.0.1.0-187/Flink/bin/get-active-master.rb to acquire the node on which the JobManager in the Active state is located;
there are two JobManager components: JobManager1 and JobManager2;
the state of JobManager1 is obtained as Active, and the state of JobManager2 is obtained as Backup.
4. The method for automatically testing high availability of a Flink component according to claim 3, wherein the step S2 comprises:
acquiring the active and standby states of JobManager1 and JobManager2 from the check of the cluster's current JobManager state;
calling the transitionToActive() method to execute Flink's command for forcibly switching the JobManager, switching JobManager2 to Active, and checking whether the switch of the JobManager2 node succeeded;
calling the transitionToActive() method to execute Flink's command for forcibly switching the JobManager, switching JobManager1 to Backup, and checking whether the switch of the JobManager1 node succeeded.
5. The method for automatically testing high availability of a Flink component according to claim 1, wherein the steps S3 and S5 each comprise:
running a MapReduce (MR) job by executing the command of the TestDFSIO tool, to verify that the cluster functions normally.
6. The method for automatically testing high availability of a Flink component according to claim 3, wherein the step S4 comprises:
acquiring the process number of the JobManager1 node, and killing the process with the kill -9 command;
checking whether the state of Flink's JobManager2 is switched automatically.
7. The method for automatically testing high availability of a Flink component according to claim 1, wherein the step S6 comprises:
recording the automatic test execution process to the log file log by executing the script with its output redirected (automatic test _ FlinkHA.sh > log).
8. An automatic testing system for high availability of a Flink component, comprising:
a state acquisition unit, configured to check the current state of the cluster JobManagers;
an active/standby switching unit, configured to forcibly switch the active and standby JobManagers;
a verification unit, configured to verify the cluster function;
a fault simulation unit, configured to simulate a fault of a Flink JobManager node;
a recording unit, configured to generate a test log.
9. Automatic test equipment for high availability of a Flink component, comprising:
a memory for storing a computer program;
a processor for implementing, when executing said computer program, the steps of the method for automatically testing high availability of the Flink component according to any one of claims 1 to 7.
CN201911364067.2A 2019-12-26 2019-12-26 Automatic testing method, system and equipment for high availability of Flink component Withdrawn CN111177001A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911364067.2A CN111177001A (en) 2019-12-26 2019-12-26 Automatic testing method, system and equipment for high availability of Flink component

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911364067.2A CN111177001A (en) 2019-12-26 2019-12-26 Automatic testing method, system and equipment for high availability of Flink component

Publications (1)

Publication Number Publication Date
CN111177001A true CN111177001A (en) 2020-05-19

Family

ID=70648954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911364067.2A Withdrawn CN111177001A (en) 2019-12-26 2019-12-26 Automatic testing method, system and equipment for high availability of Flink component

Country Status (1)

Country Link
CN (1) CN111177001A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532187A (en) * 2019-08-30 2019-12-03 苏州浪潮智能科技有限公司 A kind of HDFS throughput performance test method, system, terminal and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
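Anonymous: "Flink JobManager HA mode deployment (Standalone-based)", 《HTTPS://WWW.CNBLOGS.COM/LIUGH/P/7482571.HTML?IVK_SA=1024320U》 *
Wang Lei et al.: "Auxiliary testing tool for high-availability cluster systems", 《Computer Engineering and Applications》 *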

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20200519