CN113900938A - Fault processing method and device for big data processing task - Google Patents

Fault processing method and device for big data processing task

Info

Publication number
CN113900938A
Authority
CN
China
Prior art keywords
fault
link
data processing
big data
processing task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111163507.5A
Other languages
Chinese (zh)
Inventor
张永泉
李楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111163507.5A priority Critical patent/CN113900938A/en
Publication of CN113900938A publication Critical patent/CN113900938A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure provides a fault processing method and device for a big data processing task, and relates to the field of data processing, in particular to the technical field of big data. The specific implementation scheme is as follows: when a big data processing task fails, determining a failed link in the big data processing task according to the dependency relationship among all links in the big data processing task; determining a target automation script corresponding to the fault link according to a corresponding relation between a preset link and an automation script, wherein each automation script is used for solving the fault of the corresponding link; and calling the target automation script to solve the fault of the fault link. The labor cost of operation and maintenance of the big data processing task can be reduced.

Description

Fault processing method and device for big data processing task
Technical Field
The present disclosure relates to the field of data processing technology, and in particular, to the field of big data technology.
Background
The whole process of a big data processing task comprises a plurality of links, and a failure in any one link prevents the big data processing task from running normally.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for reducing the labor cost of operating and maintaining big data processing tasks.
According to a first aspect of the present disclosure, a fault handling method for a big data processing task is provided, which includes:
when a big data processing task fails, determining a failed link in the big data processing task according to the dependency relationship among all links in the big data processing task;
determining a target automation script corresponding to the fault link according to a corresponding relation between a preset link and an automation script, wherein each automation script is used for solving the fault of the corresponding link;
and calling the target automation script to solve the fault of the fault link.
According to a second aspect of the present disclosure, there is provided a fault handling apparatus for a big data processing task, comprising:
the fault positioning module is used for determining a fault link which has a fault in the big data processing task according to the dependency relationship among all links in the big data processing task when the big data processing task has the fault;
the script selection module is used for determining a target automation script corresponding to the fault link according to the corresponding relation between the preset link and the automation script, wherein each automation script is used for solving the fault of the corresponding link;
and the script calling module is used for calling the target automation script to solve the fault of the fault link.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to any of the above first aspects.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the first aspects described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow chart diagram of a method for fault handling of big data processing tasks provided in accordance with the present disclosure;
FIG. 2 is another flow diagram of a method of fault handling for a big data processing task provided in accordance with the present disclosure;
FIG. 3 is a schematic diagram of a structure of a decision tree used in a fault handling method for a big data processing task according to the present disclosure;
FIG. 4 is another flow diagram of a method of fault handling for a big data processing task provided in accordance with the present disclosure;
FIG. 5 is a schematic diagram of one configuration of a fault handling device for big data processing tasks provided in accordance with the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing a fault handling method for big data processing tasks of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
To describe the fault handling method for a big data processing task provided by the present disclosure more clearly, a possible application scenario of the method is described below by way of example. It is to be understood that the following is only one possible application scenario; in other possible embodiments, the fault handling method provided by the present disclosure may also be applied in other scenarios, and the following example does not impose any limitation.
A big data processing task comprises many links, such as data acquisition, data distribution, data calculation, and data application. Different links may be distributed to different computing platforms for execution, such as a Spark-based platform, a Kafka platform, an HDFS (Hadoop Distributed File System) platform, and a Doris-based platform.
A failure in any link may cause the entire big data processing task to be unable to be executed normally (i.e., to fail).
In the related art, when a big data processing task fails, the relevant personnel manually troubleshoot and resolve the fault. On the one hand, this incurs a high labor cost; on the other hand, a long time usually elapses between the occurrence of the fault, the notification of the relevant personnel, and the completion of their handling, so timeliness is poor.
Based on the above, the present disclosure provides a fault handling method for a big data processing task, which can be applied to any electronic device capable of handling faults of big data processing tasks. The fault handling method for the big data processing task provided by the present disclosure may be as shown in FIG. 1, and includes:
S101, when the big data processing task fails, determining a failed link in the big data processing task according to the dependency relationship among all links in the big data processing task.
S102, determining a target automation script corresponding to the failed link according to a preset correspondence between links and automation scripts, wherein each automation script is used for solving the fault of the corresponding link.
S103, calling the target automation script to solve the fault of the failed link.
According to this embodiment, the failed link can be located through the dependency relationship among all links in the big data processing task, and the target automation script capable of solving the fault of the failed link is selected, according to the failed link, from the preset automation scripts used for solving different faults, so that the fault of the failed link is solved automatically by the target automation script. The fault no longer needs to be troubleshot and resolved manually, which effectively reduces the labor cost of operating and maintaining the big data processing task.
Moreover, the fault can be automatically solved at the first time after the fault occurs in the big data processing task, so that the timeliness for solving the fault is higher, the problem that the big data processing task cannot be normally executed for a long time due to the fact that the fault cannot be solved in time is avoided, and the running stability of the big data processing task is effectively improved.
The following will explain each step in the foregoing S101 to S103 in detail:
In S101, the big data processing task may be any type of big data processing task, and the links included in the big data processing task may differ according to the application scenario. For example, in one possible embodiment, the big data processing task includes the links of data acquisition, data distribution, data calculation, and intelligent analysis. In another possible embodiment, the big data processing task includes data acquisition, data calculation, and data application.
It will be appreciated that different links in a big data processing task may be handled by different computing platforms, i.e. different links are distributed across different computing platforms. The dependency relationships among all links in the big data processing task include dependency relationships distributed among all links of the same computing platform and dependency relationships distributed among all links of different computing platforms.
It can be understood that if a link has not failed and the data input to the link is normal, the data output from the link is theoretically normal. If a link has failed or the data input to the link is abnormal, the data output by the link is theoretically abnormal. Therefore, if the data input to a link is normal and the data output from the link is abnormal, it can be determined that the link has failed.
The dependency relationship among the links reflects the data interaction among the links, so whether the data input to each link and the data output by each link is abnormal can be determined according to the dependency relationship, and the failed link can thereby be determined.
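The rule above (normal input, abnormal output) can be sketched in a few lines. This is only an illustrative sketch: the link names, the `upstream` mapping, and the `output_ok` flags are hypothetical, and a link with no upstream links is assumed to receive normal input.

```python
from typing import Dict, List, Optional


def find_failed_link(
    upstream: Dict[str, List[str]],
    output_ok: Dict[str, bool],
) -> Optional[str]:
    """Return the first link whose inputs are all normal but whose output is abnormal."""
    for link, parents in upstream.items():
        # Assumption: a link's input is normal when every upstream output is normal.
        inputs_ok = all(output_ok[parent] for parent in parents)
        if inputs_ok and not output_ok[link]:
            return link
    return None


# Hypothetical dependency relationship: each link lists the links it depends on.
upstream = {
    "data acquisition": [],
    "data distribution": ["data acquisition"],
    "data calculation": ["data distribution"],
    "data application": ["data calculation"],
}
# Hypothetical observation of whether each link's output data is normal.
output_ok = {
    "data acquisition": True,
    "data distribution": True,
    "data calculation": False,
    "data application": False,
}

failed_link = find_failed_link(upstream, output_ok)
print(failed_link)
```

Note that the data application link also produces abnormal output here, but it is not reported because its input is already abnormal.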
In S102, the automation scripts are preset, different automation scripts are used to solve different faults, and automation scripts may be added or deleted according to actual needs. For example, if the user finds from practical experience that fault A does not occur in the big data processing task, and the preset automation scripts include one for solving fault A, that script can be deleted to save storage resources. As another example, the user may add an automation script for solving fault B to the execution subject according to actual needs.
In one possible embodiment, the automation scripts include any one or more of the following: a script for re-acquiring upstream data after a preset waiting time; a script for invoking deletion logic to clear expired data; a script for restarting the Spark executor after adjusting the memory parameter of the Spark executor; a script for restarting the server; a script for triggering deletion of the oldest data of the Kafka platform and restarting the Kafka platform; and a script for setting filter conditions and filtering dirty data according to the set filter conditions. Any script can be written in any scripting language or high-level language, and the present disclosure places no limitation on this.
The correspondence between links and automation scripts can be represented in any form, including but not limited to: tables, linked lists, decision trees, etc. For convenience of description, the following description is only exemplary given by taking the case where the corresponding relationship is represented in the form of a decision tree, and the principle is the same for the case where the corresponding relationship is represented in other forms, and therefore, the description is omitted here.
Each link may correspond to at least one automation script, and different links may correspond to different automation scripts, as well as to the same automation script. For example, assuming that a first link and a second link in a big data processing task may both have a failure a, and a first automation script is used to solve the failure a, the first link and the second link in the correspondence may both correspond to the first automation script.
One link can also correspond to a plurality of automation scripts. When the failed link corresponds to a plurality of automation scripts, all the automation scripts corresponding to the failed link are determined as the target automation scripts.
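As a minimal sketch of the table form of the correspondence, a dictionary can map each link to its list of automation scripts. The link and script names below are hypothetical placeholders, not names used by the disclosure.

```python
from typing import Dict, List

# Hypothetical correspondence: two links share the script for fault A,
# and the second link additionally has a script for fault B.
correspondence: Dict[str, List[str]] = {
    "first_link": ["script_for_fault_a"],
    "second_link": ["script_for_fault_a", "script_for_fault_b"],
}


def select_target_scripts(failed_link: str) -> List[str]:
    """Return every automation script that corresponds to the failed link."""
    return list(correspondence.get(failed_link, []))


targets = select_target_scripts("second_link")
print(targets)
```

Returning an empty list for an unknown link is a design choice of this sketch; the disclosure does not specify that case.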
In S103, if the target automation script is a single automation script, that automation script is called to solve the fault occurring in the failed link. If the target automation scripts are a plurality of automation scripts, the target automation scripts are called in sequence until all of them have been called or the fault occurring in the failed link has been successfully solved.
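The sequential invocation can be sketched as a loop that stops at the first script after which the fault is resolved. The script functions and the `is_resolved` check are hypothetical stand-ins; a real check would presumably re-inspect the failed link.

```python
from typing import Callable, Dict, List, Tuple


def invoke_until_resolved(
    scripts: List[Tuple[str, Callable[[], None]]],
    is_resolved: Callable[[], bool],
) -> List[str]:
    """Call scripts in order; stop once the fault is resolved or all were called."""
    called: List[str] = []
    for name, script in scripts:
        script()
        called.append(name)
        if is_resolved():
            break
    return called


# Hypothetical state and scripts, used only to exercise the loop.
state: Dict[str, bool] = {"resolved": False}


def adjust_memory_and_restart() -> None:
    pass  # assumed to have no effect in this example


def retry_parent_task() -> None:
    state["resolved"] = True  # assumed to fix the fault in this example


def filter_dirty_data() -> None:
    raise AssertionError("should not be called once the fault is resolved")


called = invoke_until_resolved(
    [
        ("adjust_memory_and_restart", adjust_memory_and_restart),
        ("retry_parent_task", retry_parent_task),
        ("filter_dirty_data", filter_dirty_data),
    ],
    lambda: state["resolved"],
)
print(called)
```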
It will be appreciated that invoking the automation script does not necessarily succeed in resolving the fault occurring at the failed link. Therefore, in a possible embodiment, after the target automation script is called to solve the fault occurring in the failed link, if the fault of the failed link is not solved, an alarm message is sent to a preset terminal device to remind the relevant personnel that the failed link has failed and that the fault has not been solved.
By adopting the embodiment, the alarm can be given under the condition that the fault cannot be automatically solved, so that related personnel can timely intervene and solve the fault, and the running stability of the big data processing task is further improved.
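A minimal sketch of this alarm behavior follows. The `send_alarm` callback stands in for whatever channel reaches the preset terminal device; its name and the message text are assumptions made for illustration.

```python
from typing import Callable, List


def resolve_or_alarm(
    failed_link: str,
    run_script: Callable[[], None],
    is_resolved: Callable[[], bool],
    send_alarm: Callable[[str], None],
) -> bool:
    """Run the target script and alarm only if the fault is still unresolved."""
    run_script()
    if is_resolved():
        return True
    send_alarm(f"Fault in link '{failed_link}' was not resolved automatically")
    return False


# Hypothetical usage: the script has no effect, so an alarm is recorded.
alarms: List[str] = []
resolved = resolve_or_alarm(
    "HDFS storage",
    run_script=lambda: None,
    is_resolved=lambda: False,
    send_alarm=alarms.append,
)
print(resolved, alarms)
```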
In another possible embodiment, as shown in FIG. 2, the method includes:
S201, when the big data processing task fails, determining a failed link in the big data processing task according to the dependency relationship among all links in the big data processing task.
The step is the same as the foregoing step S101, and reference may be made to the foregoing description related to S101, which is not described herein again.
S202, determining the target automation script corresponding to the failed link according to a preset correspondence between links and automation scripts, wherein each automation script is used for solving the fault of the corresponding link.
This step is the same as S102, and reference may be made to the related description about S102, which is not repeated herein.
S203, calling the target automation script to solve the fault of the failed link.
This step is the same as S103, and reference may be made to the related description about S103, which is not described herein again.
S204, displaying a processing result indicating whether the fault of the failed link has been solved.
The processing result may be presented by a component of the execution main body that has a presentation capability, such as a display screen or a speaker, or by a device that has a communication connection with the execution main body and a presentation capability, such as a display screen independent of the execution main body or a user terminal device with a display screen.
The processing result may be "success" if the failure of the failed link has been resolved, and may be "failure" if the failure of the failed link has not been resolved. According to the processing result, related personnel can determine whether the target automation script has the capacity of solving the fault occurring in the fault link.
If the processing result indicates that the fault of the fault link is solved, the target automation script can be considered to have the capability of solving the fault of the fault link, and if the processing result indicates that the fault of the fault link is not solved, the target automation script can be considered not to have the capability of solving the fault of the fault link.
S205, an adjustment instruction input for the processing result is acquired.
The adjustment instruction is used for indicating the execution main body to adjust the corresponding relation. The adjustment instruction can be input by the user according to actual needs or experience, and can also be input by the user according to the displayed processing result.
For example, a first link in the correspondence corresponds to a first automation script, and although the first automation script has the capability of solving the fault occurring in the first link, in order to more efficiently solve the fault occurring in the first link, the user upgrades the first automation script to obtain a second automation script, and then the user inputs an adjustment instruction to instruct an execution subject to adjust the correspondence, so that the first link corresponds to the second automation script.
For another example, the first link in the correspondence corresponds to the first automation script, and the user knows, according to the displayed processing result, that the first automation script does not have the capability of solving the fault occurring in the first link, and therefore, the third automation script is newly developed. The user inputs an adjustment instruction to instruct the execution subject to adjust the correspondence so that the first link corresponds to the third automation script.
S206, adjusting the correspondence according to the adjustment instruction.
The adjustment instruction may instruct the execution main body to adjust the automation script corresponding to the failed link, or may instruct the execution main body to adjust the automation script corresponding to another link other than the failed link, which is not limited in this disclosure.
By adopting the embodiment, the processing result can be displayed to provide reference for the user to adjust the corresponding relation, so that the user can optimize the corresponding relation in time, the automatic script corresponding to each link can better solve the fault of each link, and the probability of successfully solving the fault is improved.
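Steps S204 to S206 can be sketched as follows, assuming the correspondence is held in a dictionary and the adjustment instruction simply names a link and its new scripts. The instruction format and all names are hypothetical.

```python
from typing import Dict, List


def adjust_correspondence(
    correspondence: Dict[str, List[str]],
    link: str,
    scripts: List[str],
) -> Dict[str, List[str]]:
    """Return a copy of the correspondence in which `link` maps to `scripts`."""
    updated = dict(correspondence)
    updated[link] = list(scripts)
    return updated


# Hypothetical flow: the displayed result is "failure", so the user replaces
# the first script with a newly developed third script.
correspondence = {"first_link": ["first_script"]}
processing_result = "failure"
if processing_result == "failure":
    correspondence = adjust_correspondence(
        correspondence, "first_link", ["third_script"]
    )
print(correspondence)
```

Returning a new dictionary rather than mutating in place is a choice of this sketch, made so that the previous correspondence stays available for comparison.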
The following will exemplarily explain the case of representing the corresponding relationship by the form of the decision tree, and refer to fig. 3, where fig. 3 is a schematic structural diagram of the decision tree provided by the present disclosure. In this embodiment, the big data processing task may send an alarm mail to the execution subject during execution, where the alarm mail is used to indicate the execution status of the big data processing task.
The execution main body parses the alarm mail and determines the execution state of the big data processing task. If the execution state is "success", the execution main body raises no alarm and no fault handling is needed. If the execution state is "timeout" or "failure", it is determined that the big data processing task has failed, and the failed link in the big data processing task is determined according to the dependency relationship among all links in the big data processing task. According to the failed link, the child node of the node representing the failed link is queried in a preset decision tree, and the automation script represented by the child node is determined as the automation script corresponding to the failed link.
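The state check at the root of this flow can be sketched as below. The disclosure does not specify the format of the alarm mail, so the sketch assumes the execution state has already been extracted as a plain string; the state spellings are likewise assumptions.

```python
# Assumed spellings of the states that count as a task fault.
FAILED_STATES = {"timeout", "failure"}


def task_has_failed(execution_state: str) -> bool:
    """Return True when the parsed execution state indicates a task fault."""
    return execution_state.strip().lower() in FAILED_STATES


for state in ("success", "timeout", "failure"):
    print(state, task_has_failed(state))
```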
In the correspondence represented by the decision tree shown in FIG. 3, the Kafka queue link corresponds to a script for triggering deletion of the oldest data of the Kafka platform and restarting the Kafka platform; the Spark computing engine link corresponds to a script for restarting the Spark executor after adjusting the memory parameter of the Spark executor and to a script for throwing out the parent task and retrying recursively; and the HDFS storage link corresponds to a script for invoking deletion logic to clear expired data.
After the script for triggering deletion of the oldest data of the Kafka platform and restarting the Kafka platform is called, fault handling ends if the fault of the Kafka queue link is successfully solved, and an alarm is raised if it is not.
After the script for restarting the Spark executor after adjusting the memory parameter of the Spark executor is called, fault handling ends if the fault of the Spark computing engine link is successfully solved, and an alarm is raised if it is not.
After the script for throwing out the parent task and retrying recursively is called, fault handling ends if the fault of the Spark computing engine link is successfully solved, and an alarm is raised if it is not.
After the script for invoking deletion logic to clear expired data is called, fault handling ends if the fault of the HDFS storage link is successfully solved, and an alarm is raised if it is not.
Assuming the failed link is the Kafka queue link, the decision tree can be queried to find the child node of the node representing the Kafka queue link. The automation script represented by that child node is the script for triggering deletion of the oldest data of the Kafka platform and restarting the Kafka platform, so this script is the target automation script.
By adopting this embodiment, the correspondence between links and automation scripts can be represented by a decision tree, so that the execution main body can quickly determine the target automation script corresponding to the failed link by querying the decision tree, which further improves the efficiency of fault resolution.
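The decision tree lookup can be sketched with a small node type in which link nodes sit under the root and script nodes are children of link nodes. The node labels paraphrase the links and scripts described for FIG. 3; the `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


# Tree layout assumed from the description: root -> link nodes -> script nodes.
tree = Node("task failed", [
    Node("Kafka queue", [
        Node("delete oldest Kafka data and restart Kafka"),
    ]),
    Node("Spark computing engine", [
        Node("adjust executor memory and restart executor"),
        Node("throw out parent task and retry recursively"),
    ]),
    Node("HDFS storage", [
        Node("invoke deletion logic to clear expired data"),
    ]),
])


def scripts_for(root: Node, failed_link: str) -> List[str]:
    """Return the scripts represented by the children of the failed link's node."""
    for link_node in root.children:
        if link_node.label == failed_link:
            return [child.label for child in link_node.children]
    return []


kafka_scripts = scripts_for(tree, "Kafka queue")
print(kafka_scripts)
```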
Referring to fig. 4, fig. 4 is a schematic flow chart illustrating a fault handling method for a big data processing task according to the present disclosure, which may include:
S401, determining a data blood relationship (i.e., data lineage) among the data generated by all links in the big data processing task according to the dependency relationship among all links in the big data processing task.
For all platforms participating in the big data processing task, the data blood relationship among the data of the platforms and the data generated by the links is constructed according to the upstream and downstream dependency relationships among the links executed by each platform.
S402, data backtracking is carried out according to the data blood relationship, and a fault link with a fault in the big data processing task is determined.
It can be understood that, as described in the foregoing S101, since the data blood relationship reflects how data flows among the links, the data can be traced back according to the data blood relationship so as to locate the failed link.
S403, determining a target automation script corresponding to the failed link according to a preset correspondence between links and automation scripts, wherein each automation script is used for solving the fault of the corresponding link.
The step is the same as the step S102, and reference may be made to the related description about S102, which is not repeated herein.
S404, calling the target automation script to solve the fault of the failed link.
This step is the same as S103, and reference may be made to the related description about S103, which is not described herein again.
By adopting the embodiment, the data blood relationship can be established, so that the failed link in the big data processing task can be accurately positioned in the cross-platform data backtracking process, and the reliability of the scheme is further improved.
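Data backtracking over the data blood relationship can be sketched as walking upstream from an abnormal output until a dataset is reached whose inputs are all normal; the link that produced that dataset is then taken as the failed link. The dataset names, the `producer` mapping, and the set of abnormal datasets are hypothetical, and the sketch follows only the first abnormal input at each step.

```python
from typing import Dict, List, Optional, Set


def backtrack_failed_link(
    lineage: Dict[str, List[str]],
    producer: Dict[str, str],
    abnormal: Set[str],
    start: str,
) -> Optional[str]:
    """Walk upstream from `start` and return the link that first produced bad data."""
    if start not in abnormal:
        return None
    current = start
    visited: Set[str] = set()
    while current not in visited:  # guard against cycles in malformed lineage
        visited.add(current)
        bad_inputs = [d for d in lineage.get(current, []) if d in abnormal]
        if not bad_inputs:
            # All inputs are normal but this dataset is abnormal.
            return producer[current]
        current = bad_inputs[0]
    return None


# Hypothetical cross-platform lineage: each dataset lists its input datasets.
lineage = {
    "kafka/raw_events": [],
    "spark/aggregates": ["kafka/raw_events"],
    "hdfs/archive": ["spark/aggregates"],
    "doris/report": ["hdfs/archive"],
}
# Hypothetical mapping from each dataset to the link that generates it.
producer = {
    "kafka/raw_events": "data distribution",
    "spark/aggregates": "data calculation",
    "hdfs/archive": "data storage",
    "doris/report": "data application",
}
abnormal = {"spark/aggregates", "hdfs/archive", "doris/report"}

failed_link = backtrack_failed_link(lineage, producer, abnormal, "doris/report")
print(failed_link)
```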
Referring to fig. 5, fig. 5 is a schematic structural diagram of a fault handling apparatus for a big data processing task according to the present disclosure, which may include:
the fault locating module 501 is configured to, when a big data processing task fails, determine a failed link in the big data processing task according to a dependency relationship among all links in the big data processing task;
a script selection module 502, configured to determine, according to a correspondence between a preset link and an automation script, a target automation script corresponding to the failed link, where each automation script is used to solve a failure occurring in the corresponding link;
and a script calling module 503, configured to call the target automation script to solve the fault occurring in the faulty link.
In a possible embodiment, the script selection module 502 is specifically configured to query, according to the failed link, the child node of the node representing the failed link in a preset decision tree;
and to determine the automation script represented by the child node as the automation script corresponding to the failed link.
In a possible embodiment, further comprising:
an alarm module, configured to send an alarm message to a preset terminal device if the fault of the failed link is not solved after the target automation script is called to solve the fault of the failed link.
In a possible embodiment, further comprising:
the optimization module is used for displaying a processing result for indicating whether the fault of the fault link is solved or not after the automatic script is called to solve the fault of the fault link;
acquiring an adjusting instruction input aiming at the processing result;
and adjusting the corresponding relation according to the adjusting instruction.
In a possible embodiment, the fault location module 501 is specifically configured to determine a data blood relation between data generated by each link in the big data processing task according to a dependency relationship between all links in the big data processing task;
and performing data backtracking according to the data blood relationship, and determining a link with a fault in the big data processing task.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the fault handling method of a big data processing task. For example, in some embodiments, the fault handling method of a big data processing task may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the fault handling method of the big data processing task described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the fault handling method of the big data processing task.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (13)

1. A fault processing method for a big data processing task comprises the following steps:
when a big data processing task fails, determining a failed link in the big data processing task according to the dependency relationship among all links in the big data processing task;
determining a target automation script corresponding to the fault link according to a corresponding relation between a preset link and an automation script, wherein each automation script is used for solving the fault of the corresponding link;
and calling the target automation script to solve the fault of the fault link.
2. The method of claim 1, wherein determining the target automation script corresponding to the fault link according to the preset correspondence between links and automation scripts comprises:
querying, in a preset decision tree, a child node of the node that represents the fault link;
and determining the automation script represented by the child node as the automation script corresponding to the fault link.
3. The method of any of claims 1-2, further comprising:
after the automation script is called to solve the fault of the fault link, if the fault of the fault link is not solved, sending an alarm message to a preset terminal device.
4. The method of any of claims 1-3, further comprising:
after the automation script is called to solve the fault of the fault link, displaying a processing result for indicating whether the fault of the fault link is solved or not;
acquiring an adjusting instruction input aiming at the processing result;
and adjusting the corresponding relation according to the adjusting instruction.
5. The method of claim 1, wherein determining a failed link in the big data processing task according to dependencies between all links in the big data processing task comprises:
determining the data lineage among the data generated by all links in the big data processing task according to the dependency relationship among all links in the big data processing task;
and performing data backtracking according to the data lineage to determine the link in which the fault occurred in the big data processing task.
6. A fault handling apparatus for big data processing tasks, comprising:
the fault positioning module is used for determining a fault link which has a fault in the big data processing task according to the dependency relationship among all links in the big data processing task when the big data processing task has the fault;
the script selection module is used for determining a target automation script corresponding to the fault link according to the corresponding relation between the preset link and the automation script, wherein each automation script is used for solving the fault of the corresponding link;
and the script calling module is used for calling the target automation script to solve the fault of the fault link.
7. The device according to claim 6, wherein the script selection module is specifically configured to query, in a preset decision tree, a child node of the node that represents the fault link;
and to determine the automation script represented by the child node as the automation script corresponding to the fault link.
8. The apparatus of any of claims 6-7, further comprising:
and the alarm module is used for sending an alarm message to a preset terminal device if the fault of the fault link is not solved after the automation script is called to solve the fault of the fault link.
9. The apparatus of any of claims 6-8, further comprising:
the optimization module is used for displaying a processing result for indicating whether the fault of the fault link is solved or not after the automation script is called to solve the fault of the fault link;
acquiring an adjusting instruction input aiming at the processing result;
and adjusting the corresponding relation according to the adjusting instruction.
10. The device according to claim 6, wherein the fault location module is specifically configured to determine the data lineage among the data generated by each link in the big data processing task according to the dependency relationship between all links in the big data processing task;
and to perform data backtracking according to the data lineage to determine the link in which the fault occurred in the big data processing task.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5.
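
The script selection, script invocation, and alarm steps recited in claims 1 to 3 can be illustrated with a minimal sketch. It is not the claimed implementation: the `Node` structure, the function names `find_node`, `select_script`, and `handle_fault`, the example link and script names, and the convention that a script returns `True` when the fault is resolved are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """A decision tree node. Link nodes have no script; their child nodes
    carry the automation script assumed to repair that link."""
    name: str
    script: Optional[Callable[[], bool]] = None
    children: List["Node"] = field(default_factory=list)


def find_node(root: Node, name: str) -> Optional[Node]:
    """Depth-first search for the node with the given name."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.name == name:
            return node
        stack.extend(node.children)
    return None


def select_script(root: Node, fault_link: str) -> Optional[Callable[[], bool]]:
    """Return the script of the first script-bearing child of the fault link's node."""
    link_node = find_node(root, fault_link)
    if link_node is None:
        return None
    for child in link_node.children:
        if child.script is not None:
            return child.script
    return None


def handle_fault(root: Node, fault_link: str, send_alarm: Callable[[str], None]) -> bool:
    """Call the selected script; send an alarm if the fault is not resolved."""
    script = select_script(root, fault_link)
    resolved = script() if script is not None else False
    if not resolved:
        send_alarm(f"fault in link '{fault_link}' was not resolved")
    return resolved


# Hypothetical correspondence between links and automation scripts.
tree = Node("task", children=[
    Node("clean", children=[Node("rerun_clean", script=lambda: True)]),
    Node("join", children=[Node("rebuild_join", script=lambda: False)]),
])
alarms: List[str] = []
clean_resolved = handle_fault(tree, "clean", alarms.append)
join_resolved = handle_fault(tree, "join", alarms.append)
```

Here the alarm callback simply appends to a list; in practice it would stand in for whatever channel delivers the message to the preset terminal device. Adjusting the correspondence, as in claim 4, would amount to editing the children of a link node in this sketch.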
CN202111163507.5A 2021-09-30 2021-09-30 Fault processing method and device for big data processing task Pending CN113900938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111163507.5A CN113900938A (en) 2021-09-30 2021-09-30 Fault processing method and device for big data processing task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111163507.5A CN113900938A (en) 2021-09-30 2021-09-30 Fault processing method and device for big data processing task

Publications (1)

Publication Number Publication Date
CN113900938A true CN113900938A (en) 2022-01-07

Family

ID=79189980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111163507.5A Pending CN113900938A (en) 2021-09-30 2021-09-30 Fault processing method and device for big data processing task

Country Status (1)

Country Link
CN (1) CN113900938A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination