WO2020233252A1 - Method and apparatus for diagnosing spark application - Google Patents


Info

Publication number
WO2020233252A1
WO2020233252A1 (PCT/CN2020/083381)
Authority
WO
WIPO (PCT)
Prior art keywords
diagnostic
index
diagnosis
spark application
spark
Prior art date
Application number
PCT/CN2020/083381
Other languages
French (fr)
Chinese (zh)
Inventor
王和平
尹强
刘有
黄山
杨峙岳
邸帅
卢道和
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2020233252A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/362Software debugging
    • G06F11/366Software debugging using diagnostics

Definitions

  • the embodiments of the present application relate to the field of Fintech, and in particular, to a method and device for diagnosing Spark applications.
  • As data volumes and processing demands in this field grow, Spark technology is no exception: higher requirements are placed on it.
  • Spark technology is a fast, general-purpose computing engine designed for large-scale data processing. Spark uses in-memory computing, analyzing and computing on data in memory before it has been written to disk.
  • Existing Spark application diagnosis collects and analyzes the logs produced during a run only after the Spark application has finished running, determines the problems in the run based on preset rules, and then makes corresponding adjustments.
  • The embodiments of this application provide a method and device for diagnosing Spark applications, which collect the running indicators of a Spark application in real time while it runs, diagnose problems in the run in real time, and provide effective diagnostic measures.
  • an embodiment of the present application provides a method for diagnosing a Spark application.
  • The method can be executed by the runtime diagnostic device to which it is applied in the financial technology field, and includes: obtaining context information of the Spark application; determining the diagnostic indicators of the Spark application and the indicator rules corresponding to those indicators according to the context information; collecting, according to the diagnostic indicators, the running information corresponding to those indicators while the Spark application runs; and diagnosing the running information according to the corresponding indicator rules to determine the diagnosis result of the Spark application.
  • When there are multiple diagnostic indicators, the running information corresponding to each diagnostic indicator is diagnosed according to that indicator's rule to determine a per-indicator diagnosis result; from the diagnosis results corresponding to the multiple indicators, the result that meets the preset rule is selected and determined to be the diagnosis result of the Spark application.
  • In this way, each diagnostic indicator is given a corresponding diagnosis result, and a preset rule selects, from the multiple diagnosis results, the one that meets the rule as the diagnosis result of the Spark application. Diagnosing the Spark application's running indicators from multiple aspects evaluates the application in multiple dimensions and finds running faults in time, and per the preset rule the diagnosis result of the most representative indicator serves as the diagnosis of the current Spark application.
  • After determining the diagnosis result of the Spark application, the method further includes: according to the diagnosis code in the diagnosis result that meets the preset rule, obtaining from a preset database the diagnostic measure corresponding to that diagnosis code and reporting it to the user; the correspondence between diagnosis codes and diagnostic measures is preset in the database.
  • A preset database is provided in which the correspondence between diagnosis codes and diagnostic measures is preset, so that once the diagnosis result of the Spark application is determined, the user can be given a targeted diagnostic measure, that is, a solution, making it convenient for users to resolve the problems in the Spark application's run independently and in time.
  • This technical solution does not require users to search related materials to solve the Spark application's running problems; instead, relevant solutions are set up directly and provided to users, improving problem-solving efficiency and user experience.
  • Optionally, the running information corresponding to a diagnostic indicator is first unified and encapsulated into an operating indicator that can be diagnosed, and the operating indicator is then diagnosed according to the indicator rule corresponding to that diagnostic indicator.
  • Before determining the diagnostic indicators of the Spark application and the corresponding indicator rules according to the context information, the method may also include: obtaining user configuration information, and determining the diagnostic indicators and corresponding indicator rules according to both the user configuration information and the context information.
  • The user is thus supported in selecting the diagnostic indicators and their corresponding indicator rules; that is, the user can choose the metric collectors and diagnostic rulers that perform real-time diagnosis of the Spark job, meeting the needs of different users.
  • the embodiments of the present application provide a device for diagnosing Spark applications.
  • The device may be the runtime diagnostic device in the first aspect above, a device including the aforementioned runtime diagnostic device, or a device with runtime diagnostic functionality.
  • The device includes modules, units, or means corresponding to the foregoing method, which can be implemented by hardware, by software, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules or units corresponding to the above-mentioned functions.
  • The device includes: an acquisition unit for acquiring context information of the Spark application; and a processing unit for determining the diagnostic indicators of the Spark application and the corresponding indicator rules according to the context information, collecting the running information corresponding to the diagnostic indicators while the Spark application runs, diagnosing that running information according to the corresponding indicator rules, and determining the diagnosis result of the Spark application.
  • When there are multiple diagnostic indicators, the processing unit is specifically used to: for any one diagnostic indicator, diagnose the corresponding running information according to that indicator's rule and determine the per-indicator diagnosis result; and from the diagnosis results corresponding to the multiple indicators, determine the result that meets the preset rule as the diagnosis result of the Spark application.
  • The processing unit is further configured to: after determining the diagnosis result of the Spark application, obtain through the acquisition unit, from the preset database and according to the diagnosis code in the result that meets the preset rule, the diagnostic measure corresponding to that code, and report it to the user; the correspondence between diagnosis codes and diagnostic measures is preset in the database.
  • the processing unit is specifically configured to: uniformly process the operating information corresponding to the diagnostic indicators to generate operating indicators corresponding to the diagnostic indicators; and diagnose the operating indicators corresponding to the diagnostic indicators according to the indicator rules corresponding to the diagnostic indicators.
  • the processing unit is further configured to: before determining the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index according to the context information, obtain user configuration information through the obtaining unit; determine the diagnosis of the Spark application according to the user configuration information and context information Indicator rules corresponding to indicators and diagnostic indicators.
  • The present application also provides a computing device, including a processor and a memory; the processor is coupled with the memory, and by calling and executing the computer programs or instructions stored in the memory, the processor causes the computing device to execute the method of the first aspect.
  • The computing device may be the runtime diagnostic device in the first aspect above, a device including that runtime diagnostic device, or a chip with the corresponding functions of the runtime diagnostic device.
  • the present application also provides a computer-readable non-volatile storage medium including computer-readable instructions.
  • When the computer reads and executes the computer-readable instructions, the computer executes the above-mentioned method for diagnosing the Spark application.
  • this application provides a computer program product containing instructions, which when run on a computer, enables the computer to execute the method of the first aspect.
  • For the technical effects brought by any possible implementation of the second to fifth aspects, refer to the technical effects of the corresponding implementations of the first aspect; details are not repeated here.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the application
  • FIG. 2 is a schematic flowchart of a method for diagnosing a Spark application provided by an embodiment of the application
  • FIG. 3 is a schematic structural diagram of a device for diagnosing Spark applications provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of a computing device provided by an embodiment of the application.
  • FIG. 1 exemplarily shows a runtime diagnostic device (Runtime Diagnoser) 100 applicable to the method for diagnosing Spark applications in the financial technology field provided by an embodiment of the present application.
  • The runtime diagnostic device 100 may include a metric collector (Metric Collector) 101, a metric ruler (Metric Ruler) 102, a rule result merger (Rule Result Merger) 103, a diagnostic notifier (Diagnostic Notifier) 104, and a database (Database) 105; the runtime diagnostic device 100 is connected to a monitor (Monitor) 200.
  • The runtime diagnostic device 100 schedules the entire diagnosis process of the Spark application. Specifically, the device 100 obtains the context information of the Spark application and instantiates diagnostic context information (Diagnostic Context) from it; according to the diagnostic context information it registers the metric collector 101 and the metric ruler 102, and triggers the task of diagnosing the Spark application either periodically or on demand. That is, the metric collector 101 is triggered to collect the Spark application's metric information during the run according to the metric rules and sends the collected metric information to the metric ruler 102, which generates rule results for the metrics according to the corresponding indicator rules.
  • The metric ruler 102 sends the rule results to the rule result merger 103; from the multiple rule results received, the merger 103 generates the diagnosis result of the Spark application and sends it to the diagnostic notifier 104.
  • The diagnostic notifier 104 obtains the corresponding diagnostic measure from the database 105 according to the diagnosis result, and sends the diagnosis result and measure to the monitor 200, so that the monitor 200 can display to the user the diagnosis results and measures of the running Spark application.
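  • The flow above (metric collector → metric ruler → rule result merger → diagnostic notifier) can be sketched as follows; all class and method names, and the sample rules and database entries in the usage note, are illustrative assumptions rather than the patent's actual implementation.

```python
# Illustrative sketch of the runtime diagnoser pipeline: Metric Collector
# -> Metric Ruler -> Rule Result Merger -> Diagnostic Notifier.
# Names and signatures are assumptions, not the patent's actual classes.

class MetricCollector:
    """Collects running information for one diagnostic indicator."""
    def __init__(self, name, source):
        self.name = name
        self.source = source  # callable returning the raw running info

    def collect(self):
        return {"indicator": self.name, "values": self.source()}


class MetricRuler:
    """Applies an indicator rule, producing a scored rule result."""
    def __init__(self, rule):
        self.rule = rule  # callable: values -> (score, diagnosis_code)

    def apply(self, collected):
        score, code = self.rule(collected["values"])
        return {"indicator": collected["indicator"], "score": score, "code": code}


class RuleResultMerger:
    """Keeps the rule result with the highest diagnostic score."""
    def merge(self, results):
        return max(results, key=lambda r: r["score"])


class DiagnosticNotifier:
    """Attaches the preset measure for a diagnosis code to the result."""
    def __init__(self, database):
        self.database = database  # diagnosis code -> diagnostic measure

    def notify(self, result):
        return {**result, "measure": self.database.get(result["code"])}
```

  • For example, a ruler whose rule scores a queue-resource shortage at 4 points would win the merge against a data-skew score of 2, and the notifier would then attach whatever measure the database holds for its code.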
  • FIG. 2 exemplarily shows the flow of a method for diagnosing a Spark application provided by an embodiment of the present application.
  • The flow can be executed by a device for diagnosing a Spark application; the device can be located in the above-mentioned runtime diagnostic device or be that runtime diagnostic device itself.
  • the process specifically includes:
  • Step 201 Acquire context information of the Spark application.
  • The context information of the Spark application can include the Spark Context, which plays a leading role in the execution of the Spark application: it is responsible for interaction between the program and the Spark cluster, including applying for cluster resources and creating RDDs (Resilient Distributed Datasets), accumulators, and broadcast variables.
  • Step 202 Determine the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index according to the context information.
  • The basic information of the Spark application is obtained from its context information for metric collection in the runtime diagnosis. Put another way, the diagnostic context information generates a metric collector and a diagnostic ruler through the Spark Context, and passes the interfaces in the Spark Context (Listener and/or Metrics) to the metric collector for metric collection.
  • The user can also be supported in selecting the diagnostic indicators and corresponding indicator rules; that is, the user can choose the metric collectors and diagnostic rulers that perform real-time diagnosis of the Spark job.
  • In this case, the user configuration information is obtained first, and then the diagnostic indicators of the Spark application and the corresponding indicator rules are determined according to both the user configuration information and the context information.
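  • The determination of diagnostic indicators from user configuration plus context information might look like the following sketch; the configuration keys ("indicators", "available_indicators") are assumptions, and the default set follows the Task/Executor/Job indicator categories named in the text.

```python
# Sketch of determining which diagnostic indicators to register from user
# configuration plus context information. Dict keys are assumptions.

DEFAULT_INDICATORS = {"task", "executor", "job"}

def select_indicators(user_config, context_info):
    """Return the sorted indicator names to register for this application."""
    available = set(context_info.get("available_indicators", DEFAULT_INDICATORS))
    if user_config and "indicators" in user_config:
        chosen = set(user_config["indicators"])  # user-selected collectors/rulers
    else:
        chosen = DEFAULT_INDICATORS  # no user choice: register all defaults
    return sorted(chosen & available)
```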
  • Step 203 Collect operating information corresponding to the diagnostic indicators of the Spark application during the running process according to the diagnostic indicators of the Spark application.
  • the diagnostic indicators may include Task-related indicators, Executor-related indicators, and Job-related indicators.
  • The running information corresponding to the diagnostic indicators during the run of the Spark application may include: running information for Task-related indicators, such as task execution time, number of task attempts, task start time, number of task input records, number of task output records, and task status; running information for Executor-related indicators, such as the configured number of Executors, the number of existing Executors, the number of Executor exits, the amount of data read, and the amount of data output; and running information for Job-related indicators, such as job running time, the total number of stages in the job and how many succeeded, and the total number of tasks in the job and how many succeeded.
  • The running information corresponding to the diagnostic indicators can be collected at a set collection frequency; for example, during the run of the Spark application, the running information of a diagnostic indicator is collected every 1 minute.
  • the collection frequency can be set based on experience or according to user needs.
  • the collection frequency of different diagnostic indicators can be the same or different.
  • Step 204 Diagnose the operation information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index, and determine the diagnosis result of the Spark application.
  • the operating information corresponding to the diagnostic indicators may be unified to generate the operating indicators corresponding to the diagnostic indicators, and then the operating indicators corresponding to the diagnostic indicators can be diagnosed according to the indicator rules corresponding to the diagnostic indicators.
  • Unification processing can include unit unification, format conversion, and similar processing; afterwards, the processed running information is encapsulated into the operating indicators corresponding to the diagnostic indicators, so that the diagnostic ruler can diagnose those operating indicators and obtain the diagnosis result of the Spark application.
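  • A minimal sketch of this unification step, assuming duration records arrive in mixed units; the field names of the encapsulated operating indicator are illustrative.

```python
# Sketch of unification processing: durations in mixed units are converted
# to seconds and encapsulated as an operating indicator the diagnostic
# ruler can consume. Field names are illustrative assumptions.

UNIT_TO_SECONDS = {"ms": 0.001, "s": 1, "min": 60}

def unify(raw_records):
    """Unit-unify a list of (value, unit) duration records into seconds."""
    return [value * UNIT_TO_SECONDS[unit] for value, unit in raw_records]

def to_operating_indicator(name, raw_records):
    """Encapsulate unified running information as an operating indicator."""
    return {"indicator": name, "unit": "s", "values": unify(raw_records)}
```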
  • Multiple diagnostic indicators can be acquired, each with a corresponding operating indicator. After the operating indicators are diagnosed, a diagnosis result is generated for each diagnostic indicator, and from the results corresponding to the multiple indicators, the one that meets the preset rule is determined and output as the diagnosis result of the Spark application.
  • This embodiment considers three scenarios: a data skew scenario, a queue resource shortage scenario, and a memory excess scenario. The diagnostic indicators are diagnosed in these three scenarios, and a diagnosis result recording a diagnostic score is generated.
  • Data skew scenario: when the execution time of a task is abnormal due to data skew at a certain stage, the diagnostic ruler takes the median and maximum of the execution times of all tasks obtained from the Task indicator. If the maximum is greater than ten times the median (the multiple is configurable), the ruler obtains the number of input records of the task with the maximum execution time and of the task with the median execution time; if the former exceeds the configured multiple of the latter (also configurable), data skew is determined to exist, and the diagnostic score is determined from the execution-time multiple and the input-record multiple.
  • For example, suppose the execution times of all current tasks are 1 min, 2 min, 4 min, 5 min, and 45 min: the maximum execution time is 45 min and the median is 4 min, and 45 min is greater than ten times 4 min. The input-record counts of the maximum-time task and the median-time task are then obtained, assumed to be 300 and 40 respectively; with the record-count multiple configured at, say, five, 300 is greater than five times 40, so data skew exists at this time, and the diagnostic score is determined from the execution-time multiple and the input-record multiple.
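  • The data skew rule can be sketched as below. The time multiple of ten comes from the text; the record multiple of five and the scoring formula (one point per factor of each threshold exceeded) are assumptions chosen to reproduce the 2-point example.

```python
from statistics import median

def diagnose_data_skew(exec_times, records_of, time_multiple=10, record_multiple=5):
    """Return (skew_detected, score) for one stage's tasks.

    exec_times: execution time per task. records_of: map from execution
    time to that task's input-record count (a simplification; a real
    collector would key by task id). time_multiple follows the text;
    record_multiple and the scoring formula are assumptions.
    """
    longest, typical = max(exec_times), median(exec_times)
    if longest <= time_multiple * typical:
        return False, 0  # maximum not abnormal relative to the median
    if records_of[longest] <= record_multiple * records_of[typical]:
        return False, 0  # slow task is not reading abnormally many records
    # One point per factor of each threshold exceeded (assumed formula).
    score = int(longest / typical / time_multiple) + \
        int(records_of[longest] / records_of[typical] / record_multiple)
    return True, score
```

  • With the worked example above (times 1, 2, 4, 5, 45 min; 300 vs. 40 input records), this returns a detected skew with a score of 2.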
  • Queue resource shortage scenario: the diagnostic ruler obtains the number of existing Executors and the configured number of Executors from the Executor indicators, and determines whether the current number of Executors is less than 2/3 of the number set by the user (the fraction is configurable). If so, a queue resource shortage is determined to exist at this time, and the diagnostic score is determined by the number of lacking Executors; for example, with 4 Executors lacking, the diagnostic score for insufficient queue resources is 4 points.
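  • A sketch of the queue-resource rule, with the 2/3 fraction from the text; scoring by the number of lacking Executors follows the description above.

```python
def diagnose_queue_resources(existing, configured, fraction=2 / 3):
    """Return (shortage_detected, score) for the queue-resource rule.

    A shortage exists when the existing Executor count is below `fraction`
    (2/3 per the text, configurable) of the user-configured count; the
    score is the number of lacking Executors.
    """
    if existing < fraction * configured:
        return True, configured - existing
    return False, 0
```

  • For example (hypothetical numbers), with 10 Executors configured and only 6 existing, 6 < 2/3 × 10, so 4 Executors are lacking and the diagnostic score is 4 points.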
  • Memory excess scenario: the diagnostic ruler obtains the number of existing Executors and the number of failed Executors from the Executor indicators, and judges whether the number of failed Executors exceeds 1/4 of the existing number (the fraction is configurable). If so, a memory excess is determined to exist at this time, and the diagnostic score is determined by the number of failed Executors. For example, with 10 existing Executors and 3 failed Executors, a memory excess exists; the score is the number of failed Executors divided by 3, so the diagnostic score for the memory excess is 1 point.
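  • The memory-excess rule, with the 1/4 fraction and the divide-by-3 scoring from the worked example above:

```python
def diagnose_memory_excess(existing, failed, fraction=1 / 4, per_point=3):
    """Return (excess_detected, score) for the memory-excess rule.

    Memory excess exists when failed Executors exceed `fraction` (1/4 per
    the text, configurable) of the existing ones; the score divides the
    failed count by `per_point`, following the worked example.
    """
    if failed > fraction * existing:
        return True, failed // per_point
    return False, 0
```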
  • The diagnosis result corresponding to the diagnostic indicator with the highest diagnostic score is determined to be the diagnosis result of the Spark application.
  • For example, if the diagnostic score of data skew is 2 points, the score of queue resource shortage is 4 points, and the score of memory excess is 1 point, the diagnosis result of queue resource shortage is determined as the diagnosis result of the Spark application.
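  • The merger's preset rule described above (highest diagnostic score wins) reduces to a one-liner:

```python
def merge_diagnosis(scores):
    """Apply the preset rule: the indicator with the highest diagnostic
    score becomes the diagnosis result of the Spark application."""
    return max(scores, key=scores.get)
```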
  • The diagnosis result corresponding to a diagnostic indicator can include not only the diagnostic score of the indicator but also its diagnosis code and diagnosis information.
  • For example, the diagnosis code for data skew is d10001, and the diagnosis information describes the current tasks: the maximum execution time is 45 min, the median execution time is 4 min, and the corresponding input-record counts are 300 and 40 respectively.
  • A preset database is also provided, in which the correspondence between diagnosis codes and diagnostic measures (solutions) is preset.
  • The diagnostic measure corresponding to the diagnosis code in the diagnosis result can then be obtained from the preset database according to that code and reported to the user.
  • For example, a data skew scenario yields a data skew handling solution, and a queue resource shortage scenario yields a queue resource shortage handling solution.
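  • The preset database can be as simple as a code-to-measure mapping. Only code d10001 (data skew) appears in the text; the other codes and all measure texts are hypothetical placeholders.

```python
# Sketch of the preset database mapping diagnosis codes to diagnostic
# measures (solutions). Code d10001 for data skew comes from the text;
# the other codes and all measure texts are hypothetical placeholders.

PRESET_DATABASE = {
    "d10001": "Data skew: repartition the skewed stage or salt the hot keys.",
    "d10002": "Queue resource shortage: request more Executors or a larger queue.",
    "d10003": "Memory excess: raise Executor memory or cut per-task data volume.",
}

def measure_for(diagnosis_code):
    """Look up the diagnostic measure preset for a diagnosis code."""
    return PRESET_DATABASE.get(diagnosis_code)
```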
  • This technical solution can provide users with targeted diagnostic measures, that is, solutions, once the diagnosis result of the Spark application is determined, so that users can resolve problems autonomously according to the measures and the problems in the Spark application's run are solved in time. Users need not search for relevant information themselves; the relevant solutions are set up directly and provided to them, improving problem-solving efficiency and user experience.
  • the embodiments of this application can be applied to the field of financial technology (Fintech).
  • Financial technology refers to the new, innovative technology brought to the financial field once information technology is integrated into it. Assisting financial operations, transaction execution, and financial system improvement with advanced information technology can improve the processing efficiency and business scale of the financial system while reducing costs and financial risks.
  • In a bank, Spark can be used for user whitelist and blacklist analysis, and ETL (Extract-Transform-Load: data extraction, cleaning, conversion, and loading) operations can be executed based on Spark.
  • FIG. 3 exemplarily shows the structure of a device for diagnosing a Spark application provided by an embodiment of the present application, and the device can execute the flow of the method for diagnosing a Spark application.
  • the device can exist in the form of software or hardware.
  • the device may include: a processing unit 302 and an acquiring unit 301.
  • the acquiring unit 301 may include a receiving unit, and the apparatus may also include a sending unit.
  • the processing unit 302 is used to control and manage the actions of the device.
  • the acquiring unit 301 and the sending unit are used to support communication between the device and other network entities.
  • The processing unit 302 may be a processor or a control device, for example a general-purpose central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and can implement or execute the various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of this application.
  • the processor may also be a combination that implements computing functions, for example, including a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the acquisition unit 301 is an interface circuit of the device for receiving signals from other devices.
  • the acquisition unit 301 is an interface circuit for the chip to receive signals from other chips or devices
  • the sending unit is an interface circuit for the chip to send signals to other chips or devices.
  • The device may be the runtime diagnostic device 100 in the above embodiment, or a chip within the runtime diagnostic device 100.
  • the processing unit 302 may be a processor, for example, and the acquiring unit 301 may be a transceiver, for example.
  • the transceiver may include a radio frequency circuit
  • the storage unit may be, for example, a memory.
  • the processing unit 302 may be a processor, for example, and the acquiring unit 301 may be an input/output interface, a pin, or a circuit, for example.
  • the processing unit 302 can execute computer-executable instructions stored in the storage unit.
  • The storage unit may be a storage unit within the chip, such as a register or a cache, or a storage unit located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • the device is the operating diagnostic device 100 in the above-mentioned embodiment.
  • the obtaining unit 301 is used to obtain the context information of the Spark application;
  • The processing unit 302 is used to: determine the diagnostic indicators of the Spark application and the corresponding indicator rules according to the context information; collect, through the obtaining unit 301 and according to the diagnostic indicators, the running information corresponding to those indicators during the run of the Spark application; and diagnose that running information according to the corresponding indicator rules to determine the diagnosis result of the Spark application.
  • When there are multiple diagnostic indicators, the processing unit 302 is specifically configured to: for any one diagnostic indicator, diagnose the corresponding running information according to that indicator's rule and determine the per-indicator diagnosis result; and from the diagnosis results corresponding to the multiple indicators, determine the result that meets the preset rule as the diagnosis result of the Spark application.
  • The processing unit 302 is further configured to: after determining the diagnosis result of the Spark application, obtain from the preset database, according to the diagnosis code in the result that meets the preset rule, the diagnostic measure corresponding to that code, and report it to the user; the correspondence between diagnosis codes and diagnostic measures is preset in the database.
  • the processing unit 302 is specifically configured to: after uniformly processing the operating information corresponding to the diagnostic indicators, generate the operating indicators corresponding to the diagnostic indicators; and diagnose the operating indicators corresponding to the diagnostic indicators according to the indicator rules corresponding to the diagnostic indicators.
  • the processing unit 302 is further configured to: before determining the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index according to the context information, obtain user configuration information through the obtaining unit 301; determine the Spark application according to the user configuration information and context information The diagnostic index and the index rule corresponding to the diagnostic index.
  • an embodiment of the present application further provides a computing device 400, and the computing device 400 may be the operating diagnostic device in the foregoing embodiment.
  • the computing device 400 includes a processor 402 and a communication interface 403.
  • the computing device 400 may further include a memory 401.
  • the computing device 400 may further include a communication line 404.
  • the communication interface 403, the processor 402, and the memory 401 may be connected to each other through a communication line 404;
  • The communication line 404 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • The communication line 404 can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in FIG. 4 to represent the bus, but this does not mean that there is only one bus or one type of bus.
  • the processor 402 may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the present application.
  • The processor 402 may be used to: determine the diagnostic indicators of the Spark application and the corresponding indicator rules according to the context information; collect, through the communication interface 403 and according to the diagnostic indicators, the running information corresponding to those indicators during the run of the Spark application; and diagnose the running information according to the corresponding indicator rules to determine the diagnosis result of the Spark application.
  • the communication interface 403 uses any transceiver-like device to communicate with other devices or communication networks, such as an Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or a wired access network.
  • the memory 401 may be a ROM or another type of static storage device that can store static information and instructions, a RAM or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, or the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory may exist independently and be connected to the processor through the communication line 404, or may be integrated with the processor.
  • the memory 401 is used to store the computer-executable instructions for executing the solution of the present application, and the processor 402 controls their execution.
  • the processor 402 is configured to execute computer-executable instructions stored in the memory 401, so as to implement the method provided in the foregoing embodiment of the present application.
  • the computer-executable instructions in the embodiments of the present application may also be referred to as application program code, which is not specifically limited in the embodiments of the present application.
  • the embodiments of the present application also provide a computer-readable non-volatile storage medium, including computer-readable instructions. When a computer reads and executes the computer-readable instructions, the computer performs the above method for diagnosing a Spark application.
  • these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are performed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Abstract

The present invention relates to the technical field of finance. Disclosed are a method and an apparatus for diagnosing a Spark application. The method comprises: obtaining context information of a Spark application; determining, according to the context information, a diagnostic indicator for the Spark application and an indicator rule corresponding to the diagnostic indicator; acquiring, according to the diagnostic indicator of the Spark application, running information corresponding to the diagnostic indicator during the running process of the Spark application; and diagnosing, according to the indicator rule corresponding to the diagnostic indicator, the running information corresponding to the diagnostic indicator to determine the diagnosis result of the Spark application. In the present technical solution, the running indicators of the Spark application are acquired in real time while the Spark application runs, problems occurring during the run are diagnosed in real time, and effective diagnostic measures are provided.

Description

Method and device for diagnosing a Spark application
Cross-reference to related applications
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 23, 2019, with application number 201910432603.1 and entitled "Method and device for diagnosing a Spark application", the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of the present application relate to the field of financial technology (Fintech), and in particular to a method and device for diagnosing a Spark application.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting to financial technology. Spark technology is no exception, but the security and real-time requirements of the finance and payment industries also place higher demands on Spark technology.
Spark technology is a fast, general-purpose computing engine designed for large-scale data processing. Spark uses in-memory computing, which allows data to be analyzed and computed in memory before it has been written to disk. Existing Spark application diagnosis collects and analyzes the logs of the running process after the Spark application has finished running, determines the problems that occurred during the run based on preset rules, and makes corresponding adjustments.
In the prior art, because the logs are diagnosed and analyzed only after the Spark application has finished running, problems in the running process cannot be found in time and no effective diagnostic measures can be provided.
Summary
The embodiments of the present application provide a method and device for diagnosing a Spark application, which are used to collect the running indicators of a Spark application in real time while it runs, diagnose problems in the running of the Spark application in real time, and provide effective diagnostic measures.
In a first aspect, an embodiment of the present application provides a method for diagnosing a Spark application. The method may be performed by a runtime diagnoser to which the method for diagnosing Spark applications in the financial technology field applies, and includes: obtaining context information of the Spark application; determining, according to the context information, a diagnostic index of the Spark application and an index rule corresponding to the diagnostic index; collecting, according to the diagnostic index of the Spark application, running information corresponding to the diagnostic index during the running process of the Spark application; and diagnosing the running information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index, to determine a diagnosis result of the Spark application.
In the above technical solution, diagnostic indexes and index rules are generated from the context information of the Spark application, so that the running information corresponding to the diagnostic indexes is obtained in real time while the Spark application runs and is diagnosed to determine the diagnosis result of the Spark application. Running faults that occur during the run can therefore be diagnosed in real time and a diagnosis result obtained. Further, collecting running information while the Spark application runs captures the application's parameters and indicators more comprehensively; compared with the run log available only after the Spark application has finished, the running parameters are more complete and reflect the current running state of the Spark application.
Optionally, there are multiple diagnostic indexes. For each diagnostic index, the running information corresponding to the diagnostic index is diagnosed according to the index rule corresponding to the diagnostic index, and a diagnosis result corresponding to the diagnostic index is determined; a diagnosis result that meets a preset rule is then determined from the diagnosis results corresponding to the multiple diagnostic indexes and is taken as the diagnosis result of the Spark application.
In the above technical solution, each diagnostic index has a corresponding diagnosis result, and a preset rule is used to select, from the multiple diagnosis results, the result that meets the preset rule, which is then taken as the diagnosis result of the Spark application. Diagnosing the running indicators of the Spark application from multiple aspects and evaluating it in multiple dimensions makes it possible to find running faults in time; according to the preset rule, the diagnosis result of the most representative diagnostic index serves as the diagnosis result of the current Spark application, so that the user can intuitively understand the current running problems, improving the user experience.
Optionally, after the diagnosis result of the Spark application is determined, the method further includes: according to the diagnosis code in the diagnosis result that meets the preset rule, obtaining, from a preset database, the diagnostic measure corresponding to that diagnosis code and reporting it to the user; the correspondence between diagnosis codes and diagnostic measures is preconfigured in the preset database.
In the above technical solution, a preset database is provided in which the correspondence between diagnosis codes and diagnostic measures is preconfigured, so that after the diagnosis result of the Spark application is determined, a targeted diagnostic measure, that is, a solution, can be provided to the user, allowing the user to resolve the problem independently and thus fix problems in the running Spark application in time. With this technical solution, the user does not need to look up reference material to solve the running problem of the Spark application; the relevant solution is configured in advance and provided directly, which improves the efficiency of problem solving and the user experience.
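The diagnosis-code-to-measure lookup described above can be sketched as a simple mapping; the diagnosis codes and measure texts below are hypothetical examples, not the actual contents of the preset database.

```python
# Hypothetical preset database: diagnosis code -> suggested diagnostic measure.
PRESET_MEASURES = {
    "DATA_SKEW": "Repartition on the skewed key or pre-aggregate the hot keys.",
    "QUEUE_RESOURCE_SHORTAGE": "Submit to a less loaded queue or raise the queue quota.",
    "MEMORY_EXCEEDED": "Increase executor memory or reduce partition size.",
}

def measure_for(diagnosis_code):
    """Look up the measure for a diagnosis code and format it for reporting."""
    measure = PRESET_MEASURES.get(
        diagnosis_code, "No preset measure; inspect the logs manually.")
    return f"{diagnosis_code}: {measure}"

print(measure_for("DATA_SKEW"))
# DATA_SKEW: Repartition on the skewed key or pre-aggregate the hot keys.
```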
Optionally, the running information corresponding to the diagnostic index is unified to generate a running indicator corresponding to the diagnostic index; the running indicator corresponding to the diagnostic index is then diagnosed according to the index rule corresponding to the diagnostic index.
In the above technical solution, the running information corresponding to the diagnostic index is unified and encapsulated into a running indicator that can be diagnosed, so that the diagnosis is performed on the running indicator.
Optionally, before determining the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index according to the context information, the method further includes: obtaining user configuration information; and determining the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index according to the user configuration information and the context information.
In the above technical solution, the user may select the diagnostic indexes and the index rules corresponding to them; that is, the user can choose the metric collectors and diagnostic rulers used to diagnose the Spark job in real time, meeting the needs of different users.
In a second aspect, an embodiment of the present application provides a device for diagnosing a Spark application. The device may be the runtime diagnoser of the first aspect, a device including the runtime diagnoser, or a chip having the corresponding functions of the runtime diagnoser. The device includes modules, units, or means corresponding to the above method, which may be implemented by hardware, by software, or by hardware executing corresponding software; the hardware or software includes one or more modules or units corresponding to the above functions. The device includes: an obtaining unit, configured to obtain context information of the Spark application; and a processing unit, configured to determine, according to the context information, the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index; collect, according to the diagnostic index of the Spark application, running information corresponding to the diagnostic index during the running process of the Spark application; and diagnose the running information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index, to determine the diagnosis result of the Spark application.
Optionally, there are multiple diagnostic indexes, and the processing unit is specifically configured to: for each diagnostic index, diagnose the running information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index, and determine the diagnosis result corresponding to the diagnostic index; and determine, from the diagnosis results corresponding to the multiple diagnostic indexes, a diagnosis result that meets the preset rule as the diagnosis result of the Spark application.
Optionally, the processing unit is further configured to: after the diagnosis result of the Spark application is determined, obtain, through the obtaining unit and from the preset database, the diagnostic measure corresponding to the diagnosis code in the diagnosis result that meets the preset rule, and report it to the user; the correspondence between diagnosis codes and diagnostic measures is preconfigured in the preset database.
Optionally, the processing unit is specifically configured to: unify the running information corresponding to the diagnostic index to generate a running indicator corresponding to the diagnostic index; and diagnose the running indicator corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index.
Optionally, the processing unit is further configured to: before determining the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index according to the context information, obtain user configuration information through the obtaining unit; and determine the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index according to the user configuration information and the context information.
In a third aspect, the present application further provides a computing device, including a processor and a memory. The processor is coupled to the memory; the memory stores a computer program or instructions, and when the processor invokes and executes the computer program or instructions, the computing device performs the method of the first aspect. The computing device may be the runtime diagnoser of the first aspect, a device including the runtime diagnoser, or a chip having the corresponding functions of the runtime diagnoser.
In a fourth aspect, the present application further provides a computer-readable non-volatile storage medium including computer-readable instructions. When a computer reads and executes the computer-readable instructions, the computer performs the above method for diagnosing a Spark application.
In a fifth aspect, the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of the first aspect.
For the technical effects of any possible implementation of the second to fifth aspects, refer to the technical effects of the corresponding implementations of the first aspect, which are not repeated here.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the application;
FIG. 2 is a schematic flowchart of a method for diagnosing a Spark application provided by an embodiment of the application;
FIG. 3 is a schematic structural diagram of a device for diagnosing a Spark application provided by an embodiment of the application;
FIG. 4 is a schematic structural diagram of a computing device provided by an embodiment of the application.
Detailed description
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the application is further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of this application.
Spark provides convenient APIs (Application Programming Interfaces) such as listeners (Listener) and the metrics system (Metrics) for collecting the running information of a Spark application while it runs. On this basis, FIG. 1 exemplarily shows a Runtime Diagnoser 100 to which the method for diagnosing Spark applications in the financial technology field provided by an embodiment of the present application applies. The Runtime Diagnoser 100 may include a Metric Collector 101, a Metric Ruler 102, a Rule Result Merger 103, a Diagnostic Notifier 104, and a database 105; the Runtime Diagnoser 100 is connected to a Monitor 200.
The Runtime Diagnoser 100 schedules the entire diagnosis process of the Spark application. Specifically, the Runtime Diagnoser 100 obtains the context information of the Spark application and instantiates a Diagnostic Context from it; according to the Diagnostic Context, it registers the Metric Collector 101 and the Metric Ruler 102 and triggers the diagnosis task, either on a timer or actively. That is, the Metric Collector 101 is triggered to collect the indicator information of the running Spark application according to the indicator rules and send the collected indicator information to the Metric Ruler 102; the Metric Ruler 102 generates rule results for the indicators according to the corresponding indicator rules and sends the rule results to the Rule Result Merger 103; the Rule Result Merger 103 generates the diagnosis result of the Spark application from the multiple rule results it receives and sends the diagnosis result to the Diagnostic Notifier 104; the Diagnostic Notifier 104 obtains the corresponding diagnostic measures from the database 105 according to the diagnosis result and sends the diagnosis result and the diagnostic measures to the Monitor 200, so that the Monitor 200 displays the diagnosis result and diagnostic measures of the running Spark application to the user.
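The collect, rule, merge, notify flow above can be sketched functionally; the callables, metric names, and threshold below are illustrative stand-ins, not the components' real interfaces.

```python
def run_diagnosis_once(collect, apply_rule, merge, notify):
    """One diagnosis cycle: the collector gathers running info, the ruler turns
    each metric into a rule result, the merger produces the application-level
    diagnosis, and the notifier reports it to the monitor."""
    metrics = collect()
    rule_results = [apply_rule(m) for m in metrics]
    diagnosis = merge(rule_results)
    notify(diagnosis)
    return diagnosis

# Toy wiring: flag any metric whose value exceeds a fixed threshold.
diagnosis = run_diagnosis_once(
    collect=lambda: [("task.max_duration_s", 2700), ("executor.exit_count", 0)],
    apply_rule=lambda m: (m[0], m[1] > 1000),            # per-metric rule result
    merge=lambda results: [n for n, bad in results if bad],
    notify=print,                                        # stand-in for the monitor
)
# prints ['task.max_duration_s']
```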
Based on the above description, FIG. 2 exemplarily shows the flow of a method for diagnosing a Spark application provided by an embodiment of the present application. The flow may be performed by a device for diagnosing a Spark application, which may be located in, or be, the above runtime diagnoser. As shown in FIG. 2, the flow specifically includes:
Step 201: obtain context information of the Spark application.
Here, when the Spark application runs, context information of the Spark application, which may also be called the diagnostic context information of the Spark application, is generated. The context information of the Spark application may include the SparkContext. The SparkContext plays a leading role in the execution of a Spark application program; it is responsible for interacting with the program and the Spark cluster, including applying for cluster resources and creating RDDs (Resilient Distributed Datasets), accumulators, and broadcast variables.
Step 202: determine, according to the context information, the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index.
Specifically, the basic information of the Spark application is obtained from its context information for indicator collection during runtime diagnosis. In other words, the diagnostic context generates the metric collector and the diagnostic ruler through the SparkContext and passes the interfaces in the SparkContext (Listener and/or Metrics) into the metric collector for metric collection.
In the embodiments of the present application, the user may also be allowed to select the diagnostic indexes and the index rules corresponding to them; that is, the user can choose the metric collectors and diagnostic rulers used to diagnose the Spark job in real time. Specifically, before the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index are determined according to the context information, user configuration information may be obtained first, and the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index are then determined according to the user configuration information and the context information.
Step 203: collect, according to the diagnostic index of the Spark application, the running information corresponding to the diagnostic index during the running process of the Spark application.
The diagnostic indexes may include Task-related indicators, Executor-related indicators, and Job-related indicators. Exemplarily, the running information corresponding to the diagnostic indexes during the run of the Spark application may include: running information corresponding to Task-related indicators, such as task execution time, number of task attempts, task start time, number of task input records, number of task output records, and task status; running information corresponding to Executor-related indicators, such as the configured number of executors, the current number of executors, the number of executor exits, the amount of data read, and the amount of data output; and running information corresponding to Job-related indicators, such as job running time, the total and successful numbers of stages of the job, and the total and successful numbers of tasks of the job.
In the embodiments of the present application, the running information corresponding to the diagnostic indexes may be collected at a certain collection frequency according to the diagnostic indexes of the Spark application; for example, while the Spark application runs, the running information corresponding to the diagnostic indexes may be collected every 1 min. The collection frequency may be set based on experience or according to user needs, and the collection frequencies of different diagnostic indexes may be the same or different.
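Per-indicator collection frequencies, which may differ between indexes as noted above, can be tracked with a small scheduler like the following sketch; the indicator names and intervals are illustrative.

```python
class CollectionScheduler:
    """Remembers when each diagnostic indicator was last collected and returns
    the indicators whose (possibly different) intervals have elapsed."""
    def __init__(self, intervals_s):
        self.intervals_s = intervals_s                      # indicator -> interval (s)
        self.last_run = {name: 0.0 for name in intervals_s}

    def due(self, now_s):
        ready = [name for name, interval in self.intervals_s.items()
                 if now_s - self.last_run[name] >= interval]
        for name in ready:
            self.last_run[name] = now_s
        return ready

scheduler = CollectionScheduler({"task": 60, "executor": 120})
print(scheduler.due(60))   # ['task']
print(scheduler.due(120))  # ['task', 'executor']
```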
Step 204: diagnose the running information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index, and determine the diagnosis result of the Spark application.
In the embodiments of the present application, the running information corresponding to the diagnostic index may be unified to generate the running indicator corresponding to the diagnostic index, and the running indicator corresponding to the diagnostic index is then diagnosed according to the index rule corresponding to the diagnostic index. The unification may include unit unification, format conversion, and the like; after unification, the processed running information is encapsulated into the running indicator corresponding to the diagnostic index, so that the diagnostic ruler can diagnose the running indicator and the diagnosis result of the Spark application is obtained.
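The unification step can be sketched as unit normalization; the metric names and the unit table below are illustrative assumptions.

```python
# Assumed unification convention: durations normalized to seconds, sizes to bytes.
UNIT_FACTORS = {"ms": 0.001, "s": 1.0, "min": 60.0, "KB": 1024.0, "MB": 1024.0 ** 2}

def unify(raw_metrics):
    """Convert raw running information given as (value, unit) pairs into uniform
    units so that index rules can compare values directly."""
    return {name: value * UNIT_FACTORS[unit]
            for name, (value, unit) in raw_metrics.items()}

print(unify({"task.duration": (45, "min"), "task.input_size": (2, "MB")}))
# {'task.duration': 2700.0, 'task.input_size': 2097152.0}
```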
There may be multiple diagnostic indexes, each corresponding to a running indicator. After the running indicator corresponding to each diagnostic index is diagnosed, the diagnosis result corresponding to each diagnostic index can be generated; a diagnosis result that meets the preset rule is then determined from the diagnosis results corresponding to the multiple diagnostic indexes and is taken as the diagnosis result of the Spark application.
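Selecting the diagnosis result that meets the preset rule can be sketched as follows; here the preset rule is assumed to be "most severe score wins", which is one plausible choice rather than the rule actually used, and the codes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DiagnosisResult:
    indicator: str   # which diagnostic index produced this result
    code: str        # diagnosis code, e.g. "DATA_SKEW" (hypothetical)
    score: int       # severity score produced by the index rule

def merge_results(results, min_score=1):
    """Drop results below the preset threshold and keep the most severe one
    as the application-level diagnosis result (None if nothing qualifies)."""
    matched = [r for r in results if r.score >= min_score]
    return max(matched, key=lambda r: r.score) if matched else None

results = [
    DiagnosisResult("task.execution_time", "DATA_SKEW", 2),
    DiagnosisResult("executor.exit_count", "HEALTHY", 0),
]
print(merge_results(results).code)  # DATA_SKEW
```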
To better explain the above implementation of generating the diagnosis result of the Spark application, an embodiment in specific implementation scenarios is provided below. This embodiment covers three scenarios: a data-skew scenario, a queue-resource-shortage scenario, and a memory-exceeded scenario. The diagnostic indexes in the three scenarios are diagnosed, and diagnosis results recording diagnostic scores are generated.
Data skew scenario: for example, when the execution duration of a Task in a certain stage is abnormal due to data skew, the diagnostic rule engine takes the execution durations of all Tasks obtained from the Task index and computes their median and maximum. If the maximum is greater than ten times the median (this parameter is configurable), the engine obtains the input-record count of the Task with the maximum execution duration and that of the Task with the median execution duration. If the former is likewise greater than ten times the latter (this parameter is configurable), data skew is determined to exist, and the diagnosis score is determined from the execution-duration multiple and the input-record multiple. For example, if the execution durations of all current Tasks are 1 min, 2 min, 4 min, 5 min and 45 min, the maximum execution duration is 45 min and the median execution duration is 4 min; since 45 min is greater than ten times 4 min, the input-record counts of the Tasks with the maximum and median execution durations are examined further. Assuming these are 300 and 40 respectively, 300 is determined to exceed ten times 40, i.e. data skew exists at this time; one point is scored via the execution duration and one point via the input-record count, so the diagnosis score for the data skew condition is 1 + 1 = 2 points.
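The data skew rule above can be sketched as follows. This is a hypothetical illustration only: the function name and data layout are chosen for the example, the ten-fold threshold is kept configurable as the text notes, the Task list is assumed to have odd length so the median is an actual sample, and the record counts in the usage line are picked so that both ten-fold conditions clearly hold:

```python
from statistics import median

def diagnose_data_skew(durations, input_records, ratio=10):
    """Score data skew: 1 point if the longest Task runs more than `ratio`
    times the median duration, plus 1 point if its input-record count also
    exceeds `ratio` times that of the median-duration Task."""
    med, mx = median(durations), max(durations)
    if mx <= ratio * med:
        return False, 0
    # Input records of the max- and median-duration Tasks
    # (assumes an odd number of Tasks, so `med` is an actual sample).
    max_records = input_records[durations.index(mx)]
    med_records = input_records[durations.index(med)]
    if max_records <= ratio * med_records:
        return False, 0
    return True, 2  # 1 point for duration + 1 point for record count

# Durations 1, 2, 4, 5, 45 min as in the text; record counts chosen so the
# ten-fold record condition clearly holds.
skewed, score = diagnose_data_skew([1, 2, 4, 5, 45], [10, 20, 40, 50, 500])
# skewed → True, score → 2
```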
Insufficient-queue-resources scenario: for example, when the queue has no resources and the number of executors is far smaller than the value set by the user, the diagnostic rule engine obtains the current number of Executors and the configured number of Executors from the executor index, and judges whether the current number of Executors is less than 2/3 of the user-configured number (this parameter is configurable). If so, insufficient queue resources are determined to exist, and the diagnosis score is determined from the number of missing Executors. For example, if the current number of Executors is 10 and the user-configured number is 30, the current number is less than 2/3 of the configured number, so queue resources are insufficient at this time; the score is the number of missing Executors divided by 5. Since 20 Executors are currently missing, the diagnosis score for the insufficient-queue-resources condition is 4 points.
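A minimal sketch of the insufficient-queue-resources rule, keeping the 2/3 threshold and the divisor of 5 as configurable parameters as the text describes (names are illustrative assumptions, not the embodiment's actual interface):

```python
def diagnose_queue_shortage(current, configured, threshold=2/3, divisor=5):
    """If the current Executor count is below threshold * configured count,
    queue resources are insufficient; score = missing Executors / divisor."""
    if current >= configured * threshold:
        return False, 0
    return True, (configured - current) / divisor

# Example from the text: 10 current vs 30 configured Executors
# → 20 missing → 20 / 5 = 4 points.
shortage, score = diagnose_queue_shortage(10, 30)
# shortage → True, score → 4.0
```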
Out-of-memory scenario: for example, when a user executes a complex query over a large volume of data and multiple Executors exit because memory is exceeded, the diagnostic rule engine obtains the current number of Executors and the number of failed Executors from the executor index, and judges whether the number of failed Executors exceeds 1/4 of the current number (this parameter is configurable). If so, a memory-exceeded condition is determined to exist, and the diagnosis score is determined from the number of failed Executors. For example, if the current number of Executors is 10 and the number of failed Executors is 3, a memory-exceeded condition exists at this time; the score is the number of failed Executors divided by 3, so the diagnosis score for the memory-exceeded condition is 1 point.
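The out-of-memory rule can be sketched in the same style, again with illustrative names and the 1/4 fraction and divisor of 3 kept as configurable parameters:

```python
def diagnose_memory_excess(current, failed, fraction=1/4, divisor=3):
    """If failed Executors exceed fraction * current Executors, a
    memory-exceeded condition exists; score = failed / divisor."""
    if failed <= current * fraction:
        return False, 0
    return True, failed / divisor

# Example from the text: 10 current Executors, 3 failed → 3 / 3 = 1 point.
excess, score = diagnose_memory_excess(10, 3)
# excess → True, score → 1.0
```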
According to the diagnosis scores in the diagnosis results corresponding to the above diagnostic indexes, the diagnosis result corresponding to the diagnostic index with the highest diagnosis score may be determined as the diagnosis result of the Spark application. In the above example, the diagnosis score for the data skew condition is 2 points, that for the insufficient-queue-resources condition is 4 points, and that for the memory-exceeded condition is 1 point, so the diagnosis result for the insufficient-queue-resources condition is determined as the diagnosis result of the Spark application.
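Under the highest-score preset rule described above, selecting the final diagnosis result reduces to a maximum over the per-index results; the dictionary layout below is an assumption for illustration:

```python
def select_diagnosis(results):
    """Apply the preset rule of this example: the diagnosis result with the
    highest diagnosis score becomes the diagnosis result of the application."""
    return max(results, key=lambda r: r["score"])

results = [
    {"condition": "data skew", "score": 2},
    {"condition": "insufficient queue resources", "score": 4},
    {"condition": "memory exceeded", "score": 1},
]
final = select_diagnosis(results)
# final["condition"] → "insufficient queue resources"
```

Other preset rules (lowest score, largest diagnostic-code weight) would only change the key function passed to the selection.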
In addition, the diagnosis result corresponding to a diagnostic index may include not only the diagnosis score of the index but also its diagnostic code, diagnostic information and so on. For example, the diagnostic code for the data skew condition is d10001, and the diagnostic information records that the maximum Task execution duration is 45 min, the median execution duration is 4 min, and the corresponding input-record counts are 300 and 40 respectively.

Of course, in the embodiments of the present application, other preset rules may also be used to determine, from the diagnosis results corresponding to the multiple diagnostic indexes, the diagnosis result that meets the preset rule and take it as the diagnosis result of the Spark application; other preset rules include, for example, taking the diagnosis result corresponding to the lowest diagnosis score, or taking the diagnosis result corresponding to the diagnostic code with the largest weight. The diagnosis result of the Spark application may also be determined from the diagnosis results corresponding to the multiple diagnostic indexes by combining weights or other parameters, which is not limited here.
In the embodiments of the present application, a preset database is further provided, in which correspondences between diagnostic codes and diagnostic measures (solutions) are set in advance. After the diagnosis result of the Spark application is determined, the diagnostic measure corresponding to the diagnostic code in the diagnosis result may be obtained from the preset database according to that diagnostic code and reported to the user. For example, the data skew scenario yields a data-skew handling solution, and the insufficient-queue-resources scenario yields an insufficient-queue-resources handling solution. With this technical solution, after the diagnosis result of the Spark application is determined, targeted diagnostic measures, i.e. solutions, can be provided to the user, so that the user can resolve the problem autonomously according to the diagnostic measures and thus fix problems in the running of the Spark application in time. The user does not need to look up related materials to solve the running problem of the Spark application; instead, the relevant solutions are prepared in advance and provided directly, which improves both the efficiency of problem solving and the user experience.
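The preset database mapping diagnostic codes to diagnostic measures can be as simple as a lookup table. Only code d10001 comes from the text; the second entry and the wording of both measures are made up for this sketch:

```python
# Hypothetical preset database: diagnostic code → diagnostic measure.
PRESET_MEASURES = {
    "d10001": "Data skew: repartition hot keys or pre-aggregate the skewed data.",
    "d10002": "Insufficient queue resources: raise the queue quota or request fewer Executors.",
}

def measure_for(diagnosis_code):
    """Look up the diagnostic measure to report to the user."""
    return PRESET_MEASURES.get(diagnosis_code, "no preset measure for this code")

measure_for("d10001")  # → the data-skew handling solution above
```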
The embodiments of the present application can be applied to the field of financial technology (Fintech), which refers to a new innovative technology brought to the financial field after information technology is integrated into it. Using advanced information technology to assist financial operations, transaction execution and financial-system improvement can raise the processing efficiency and business scale of the financial system while reducing costs and financial risks. Exemplarily, Spark can be used in a bank for whitelist and blacklist analysis of users, and ETL (Extract-Transform-Load: data extraction, cleaning, transformation and loading) jobs can be executed in a bank based on Spark; when executing the various Spark applications, diagnosis during Spark running can be performed in real time so as to monitor the health of the running Spark application in real time.

In the above technical solution, by obtaining the context information of the Spark application and generating diagnostic indexes and index rules, the running information corresponding to the diagnostic indexes is obtained in real time while the Spark application runs, the running information is diagnosed, and the diagnosis result of the Spark application is determined. Running faults occurring during the operation of the Spark application can thus be diagnosed in real time and the diagnosis results obtained. Further, obtaining running information while the Spark application runs allows the parameters and metrics of the running application to be collected more comprehensively; compared with the run log available only after the Spark application has finished, these running parameters are more complete and reflect the current running state of the Spark application.
Based on the same concept, FIG. 3 exemplarily shows the structure of an apparatus for diagnosing a Spark application provided by an embodiment of the present application; the apparatus can execute the flow of the method for diagnosing a Spark application. The apparatus may exist in the form of software or hardware, and may include a processing unit 302 and an acquiring unit 301. As one implementation, the acquiring unit 301 may include a receiving unit, and the apparatus may further include a sending unit. The processing unit 302 is configured to control and manage the actions of the apparatus; the acquiring unit 301 and the sending unit are configured to support communication between the apparatus and other network entities.

The processing unit 302 may be a processor or a control device, for example a general-purpose central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and can implement or execute the various exemplary logical blocks, modules and circuits described in connection with the disclosure of this application. The processor may also be a combination implementing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The acquiring unit 301 is an interface circuit of the apparatus for receiving signals from other apparatuses. For example, when the apparatus is implemented as a chip, the acquiring unit 301 is the interface circuit through which the chip receives signals from other chips or apparatuses, and the sending unit is the interface circuit through which the chip sends signals to other chips or apparatuses.

The apparatus may be the running diagnoser 100 in the above embodiments, or a chip used for the running diagnoser 100. For example, when the apparatus is the running diagnoser 100, the processing unit 302 may be, for example, a processor, and the acquiring unit 301 may be, for example, a transceiver; optionally, the transceiver may include a radio-frequency circuit, and the storage unit may be, for example, a memory. When the apparatus is a chip used for the running diagnoser 100, the processing unit 302 may be, for example, a processor, and the acquiring unit 301 may be, for example, an input/output interface, a pin or a circuit. The processing unit 302 may execute computer-executable instructions stored in a storage unit; optionally, the storage unit is a storage unit inside the chip, such as a register or a cache, and it may also be a storage unit in the first forwarding server located outside the chip, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).

In one embodiment, the apparatus is the running diagnoser 100 in the above embodiments. The acquiring unit 301 is configured to acquire context information of the Spark application. The processing unit 302 is configured to: determine, according to the context information, the diagnostic indexes of the Spark application and the index rules corresponding to the diagnostic indexes; collect, through the acquiring unit 301 and according to the diagnostic indexes of the Spark application, the running information corresponding to the diagnostic indexes while the Spark application runs; and diagnose the running information corresponding to the diagnostic indexes according to the index rules corresponding to the diagnostic indexes, determining the diagnosis result of the Spark application.
Optionally, there are multiple diagnostic indexes, and the processing unit 302 is specifically configured to: for any one diagnostic index, diagnose the running information corresponding to that diagnostic index according to the index rule corresponding to that diagnostic index, and determine the diagnosis result corresponding to that diagnostic index; and determine, from the diagnosis results corresponding to the multiple diagnostic indexes, the diagnosis result that meets the preset rule, taking it as the diagnosis result of the Spark application.

Optionally, the processing unit 302 is further configured to: after the diagnosis result of the Spark application is determined, obtain, from a preset database and according to the diagnostic code in the diagnosis result that meets the preset rule, the diagnostic measure corresponding to that diagnostic code, and report it to the user; the correspondence between diagnostic codes and diagnostic measures is preset in the preset database.

Optionally, the processing unit 302 is specifically configured to: perform unification processing on the running information corresponding to the diagnostic index to generate the running index corresponding to the diagnostic index; and diagnose the running index corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index.

Optionally, the processing unit 302 is further configured to: before the diagnostic indexes of the Spark application and the index rules corresponding to the diagnostic indexes are determined according to the context information, obtain user configuration information through the acquiring unit 301; and determine, according to the user configuration information and the context information, the diagnostic indexes of the Spark application and the index rules corresponding to the diagnostic indexes.
Based on the same concept, as shown in FIG. 4, an embodiment of the present application further provides a computing device 400, which may be the running diagnoser in the above embodiments. The computing device 400 includes a processor 402 and a communication interface 403; optionally, the computing device 400 may further include a memory 401 and a communication line 404. The communication interface 403, the processor 402 and the memory 401 may be connected to one another through the communication line 404; the communication line 404 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus and so on. For ease of presentation, only one thick line is used in FIG. 4, but this does not mean that there is only one bus or one type of bus.

The processor 402 may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the programs of the solution of the present application.

In a possible embodiment, the processor 402 may be configured to: determine, according to the context information, the diagnostic indexes of the Spark application and the index rules corresponding to the diagnostic indexes; collect, through the communication interface 403 and according to the diagnostic indexes of the Spark application, the running information corresponding to the diagnostic indexes while the Spark application runs; and diagnose the running information corresponding to the diagnostic indexes according to the index rules corresponding to the diagnostic indexes, determining the diagnosis result of the Spark application.
The communication interface 403 uses any transceiver-like device for communicating with other devices or communication networks, such as an Ethernet, a radio access network (RAN), a wireless local area network (WLAN) or a wired access network.

The memory 401 may be a ROM or another type of static storage device capable of storing static information and instructions, a RAM or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the communication line 404, or the memory may be integrated with the processor.

The memory 401 is configured to store the computer-executable instructions for executing the solution of the present application, and execution is controlled by the processor 402. The processor 402 is configured to execute the computer-executable instructions stored in the memory 401, thereby implementing the method provided in the above embodiments of the present application.

Optionally, the computer-executable instructions in the embodiments of the present application may also be referred to as application program code, which is not specifically limited in the embodiments of the present application.
Based on the same inventive concept, an embodiment of the present application further provides a computer-readable non-volatile storage medium comprising computer-readable instructions which, when read and executed by a computer, cause the computer to execute the above method for diagnosing a Spark application.

This application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems) and computer program products according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data-processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data-processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data-processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data-processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.

Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass them.

Claims (12)

  1. A method for diagnosing a Spark application, characterized in that it comprises:
    acquiring context information of the Spark application;
    determining, according to the context information, a diagnostic index of the Spark application and an index rule corresponding to the diagnostic index;
    collecting, according to the diagnostic index of the Spark application, running information corresponding to the diagnostic index while the Spark application runs;
    diagnosing the running information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index, and determining a diagnosis result of the Spark application.
  2. The method according to claim 1, characterized in that there are multiple diagnostic indexes;
    the diagnosing the running information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index and determining the diagnosis result of the Spark application comprises:
    for any one diagnostic index, diagnosing the running information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index, and determining a diagnosis result corresponding to the diagnostic index;
    determining, from the diagnosis results corresponding to the multiple diagnostic indexes, a diagnosis result that meets a preset rule, and taking it as the diagnosis result of the Spark application.
  3. The method according to claim 1 or 2, characterized in that, after the determining the diagnosis result of the Spark application, the method further comprises:
    obtaining, from a preset database and according to a diagnostic code in the diagnosis result that meets the preset rule, a diagnostic measure corresponding to the diagnostic code in the diagnosis result that meets the preset rule, and reporting the diagnostic measure to a user; wherein a correspondence between diagnostic codes and diagnostic measures is preset in the preset database.
  4. The method according to any one of claims 1 to 3, characterized in that the diagnosing the running information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index comprises:
    performing unification processing on the running information corresponding to the diagnostic index to generate a running index corresponding to the diagnostic index;
    diagnosing the running index corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index.
  5. The method according to any one of claims 1 to 4, characterized in that, before the determining, according to the context information, the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index, the method further comprises:
    acquiring user configuration information;
    the determining, according to the context information, the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index comprises:
    determining, according to the user configuration information and the context information, the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index.
  6. An apparatus for diagnosing a Spark application, characterized in that it comprises:
    an acquiring unit, configured to acquire context information of the Spark application;
    a processing unit, configured to determine, according to the context information, a diagnostic index of the Spark application and an index rule corresponding to the diagnostic index; collect, through the acquiring unit and according to the diagnostic index of the Spark application, running information corresponding to the diagnostic index while the Spark application runs; and diagnose the running information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index, determining a diagnosis result of the Spark application.
  7. The apparatus according to claim 6, characterized in that there are multiple diagnostic indexes; the processing unit is specifically configured to: for any one diagnostic index, diagnose the running information corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index, and determine a diagnosis result corresponding to the diagnostic index; and determine, from the diagnosis results corresponding to the multiple diagnostic indexes, a diagnosis result that meets a preset rule, taking it as the diagnosis result of the Spark application.
  8. The device according to claim 6 or 7, wherein the processing unit is further configured to: after the diagnosis result of the Spark application is determined, obtain, through the obtaining unit and from a preset database according to a diagnostic code in the diagnosis result that meets the preset rule, a diagnostic measure corresponding to the diagnostic code, and report the diagnostic measure to a user, wherein a correspondence between diagnostic codes and diagnostic measures is preset in the preset database.
  9. The device according to any one of claims 6 to 8, wherein the processing unit is specifically configured to: unify the running information corresponding to the diagnostic index to generate a running index corresponding to the diagnostic index; and diagnose the running index corresponding to the diagnostic index according to the index rule corresponding to the diagnostic index.
  10. The device according to any one of claims 6 to 9, wherein the processing unit is further configured to: before the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index are determined according to the context information, obtain user configuration information through the obtaining unit; and determine the diagnostic index of the Spark application and the index rule corresponding to the diagnostic index according to the user configuration information and the context information.
  11. A computing device, comprising:
    a memory, configured to store program instructions; and
    a processor, configured to call the program instructions stored in the memory and to perform the method according to any one of claims 1 to 5 in accordance with the obtained program.
  12. A computer-readable non-volatile storage medium, comprising computer-readable instructions which, when read and executed by a computer, cause the computer to perform the method according to any one of claims 1 to 5.
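The diagnosis flow recited in claims 6 to 10 — obtain context information, select diagnostic indexes and their index rules (optionally narrowed by user configuration), unify the collected running information into running indexes, apply each index rule, and map fired diagnostic codes to measures from a preset database — can be sketched as follows. This is a minimal illustration, not the patented implementation: all names, indexes, thresholds, and codes (`DIAG_RULES`, `gc_time_ratio`, `D001`, etc.) are hypothetical, and the claim-7 "preset rule" for selecting among diagnosis results is simplified here to "the index rule fired".

```python
# Hypothetical sketch of the claimed diagnosis flow; every identifier,
# threshold, and diagnostic code below is illustrative only.

# Index rules: for each diagnostic index, a predicate over the unified
# running index and the diagnostic code emitted when the rule fires.
DIAG_RULES = {
    "gc_time_ratio":    {"check": lambda v: v > 0.10, "code": "D001"},
    "shuffle_spill_mb": {"check": lambda v: v > 512,  "code": "D002"},
}

# Preset database mapping diagnostic codes to suggested measures (claim 8).
MEASURE_DB = {
    "D001": "Increase executor memory or tune GC settings.",
    "D002": "Raise shuffle partition count or executor memory.",
}

def select_indexes(context, user_config=None):
    """Pick diagnostic indexes and rules from the context information;
    user configuration, when present, narrows the selection (claim 10)."""
    indexes = dict(DIAG_RULES)
    if user_config:
        indexes = {k: v for k, v in indexes.items() if k in user_config}
    return indexes

def unify(raw):
    """Unify raw running information into comparable running indexes (claim 9)."""
    return {
        "gc_time_ratio": raw["gc_ms"] / raw["run_ms"],
        "shuffle_spill_mb": raw["spill_bytes"] / 2**20,
    }

def diagnose_app(context, raw_running_info, user_config=None):
    """Diagnose each running index against its index rule and attach the
    measure looked up by diagnostic code (claims 6-8)."""
    indexes = select_indexes(context, user_config)
    running = unify(raw_running_info)
    results = []
    for name, rule in indexes.items():
        if rule["check"](running[name]):      # rule fired: abnormal index
            results.append({"index": name, "code": rule["code"],
                            "measure": MEASURE_DB[rule["code"]]})
    return results
```

For example, an application that spent 1.5 s of a 10 s run in GC and spilled 600 MB of shuffle data would trigger both hypothetical rules and receive the two corresponding measures.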
PCT/CN2020/083381 2019-05-23 2020-04-03 Method and apparatus for diagnosing spark application WO2020233252A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910432603.1A CN110175124A (en) 2019-05-23 2019-05-23 A kind of method and device of diagnosis Spark application
CN201910432603.1 2019-05-23

Publications (1)

Publication Number Publication Date
WO2020233252A1 (en)

Family

ID=67691926

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/083381 WO2020233252A1 (en) 2019-05-23 2020-04-03 Method and apparatus for diagnosing spark application

Country Status (2)

Country Link
CN (1) CN110175124A (en)
WO (1) WO2020233252A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175124A (en) * 2019-05-23 2019-08-27 深圳前海微众银行股份有限公司 A kind of method and device of diagnosis Spark application
CN113760671A (en) * 2020-10-19 2021-12-07 北京沃东天骏信息技术有限公司 Online task diagnosis method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090132860A1 (en) * 2007-11-21 2009-05-21 Inventec Corporation System and method for rapidly diagnosing bugs of system software
CN103412805A (en) * 2013-07-31 2013-11-27 交通银行股份有限公司 IT (information technology) fault source diagnosis method and IT fault source diagnosis system
CN107992406A (en) * 2017-11-09 2018-05-04 北京东土科技股份有限公司 A kind of method for testing software, related system and computer-readable recording medium
CN110175124A (en) * 2019-05-23 2019-08-27 深圳前海微众银行股份有限公司 A kind of method and device of diagnosis Spark application

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557351B (en) * 2016-11-21 2019-08-09 广东高标电子科技有限公司 The data processing method and device of built-in application program


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MA, ZHIHENG: "Design and Implementation of Log Analysis Tools Based on Spark", MASTER THESIS, 1 May 2017 (2017-05-01), pages 1 - 83, XP009524447 *

Also Published As

Publication number Publication date
CN110175124A (en) 2019-08-27

Similar Documents

Publication Publication Date Title
JP5978401B2 (en) Method and system for monitoring the execution of user requests in a distributed system
WO2019104854A1 (en) Performance test and evaluation method and apparatus, terminal device, and storage medium
US8141053B2 (en) Call stack sampling using a virtual machine
US9934261B2 (en) Progress analyzer for database queries
US10116534B2 (en) Systems and methods for WebSphere MQ performance metrics analysis
US20130080502A1 (en) User interface responsiveness monitor
US20130081001A1 (en) Immediate delay tracker tool
WO2020233252A1 (en) Method and apparatus for diagnosing spark application
CN111563014A (en) Interface service performance test method, device, equipment and storage medium
JP2012503826A (en) Evaluating the effectiveness of memory management techniques that use selective mitigation to reduce errors
US8631280B2 (en) Method of measuring and diagnosing misbehaviors of software components and resources
EP4182796B1 (en) Machine learning-based techniques for providing focus to problematic compute resources represented via a dependency graph
CN110647447B (en) Abnormal instance detection method, device, equipment and medium for distributed system
CN105302714A (en) Method and apparatus for monitoring memory leak in test process
US9600523B2 (en) Efficient data collection mechanism in middleware runtime environment
US8725461B2 (en) Inferring effects of configuration on performance
JP2016100006A (en) Method and device for generating benchmark application for performance test
US20160077832A1 (en) Agile estimation
CN110557291A (en) Network service monitoring system
CN110377519B (en) Performance capacity test method, device and equipment of big data system and storage medium
CN109542341B (en) Read-write IO monitoring method, device, terminal and computer readable storage medium
US20230376397A1 (en) Method and System for Determining Interval Time for Testing of Server, and Device and Medium
CN111176831A (en) Dynamic thread mapping optimization method and device based on multithread shared memory communication
CN109992408B (en) Resource allocation method, device, electronic equipment and storage medium
US9081605B2 (en) Conflicting sub-process identification method, apparatus and computer program

Legal Events

Code  Description
121   Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20810500; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: DE)
122   Ep: pct application non-entry in european phase (Ref document number: 20810500; Country of ref document: EP; Kind code of ref document: A1)
32PN  Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 21/03/2022))
