WO2018045756A1 - Fault localization platform, fault localization method and device - Google Patents

Fault localization platform, fault localization method and device

Publication number
WO2018045756A1
Authority
WO
WIPO (PCT)
Prior art keywords
service
log
api
processing
service system
Prior art date
Application number
PCT/CN2017/081072
Other languages
French (fr)
Chinese (zh)
Inventor
陈克云
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2018045756A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0677Localisation of faults
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/069Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40Support for services or applications
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/60Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a fault location platform, a fault location method, and a device.
  • In a cloud service environment, a platform usually contains multiple systems in order to provide multiple services, and each service is completed through interaction among those systems. The services may include file services, object services, and host backup services.
  • the cloud platform 100 includes: a cloud management system 210, a data protection service system 220, a virtualization system 230, a production storage system 240, a cloud backup management system 250, and a backup storage system 260.
  • the cloud platform 100 performs a host backup service as follows: the cloud management system 210 sends a backup request to the data protection service system 220; after receiving the backup request, the data protection service system 220 sends a scheduling backup request to the virtualization system 230; according to the received scheduling backup request, the virtualization system 230 sends an execution backup request to the cloud backup management system 250 and queries the backup status at preset intervals; according to the execution backup request, the cloud backup management system 250 sequentially performs volume snapshot 251, volume snapshot comparison 252, data extraction 253, data storage 254, and backup completion 255.
  • the volume snapshot comparison refers to comparing the data at the current time with the data at the previous time; the cloud backup management system 250 stores the result of the volume snapshot comparison and the extracted difference data in the production storage system 240; the time data is stored in the backup storage system 260.
  • When the host backup fails, it is necessary to start from the uppermost cloud management system 210 and check in sequence whether the data protection service system 220, the virtualization system 230, the production storage system 240, the cloud backup management system 250, and the backup storage system 260 are faulty before the failed system is finally located, which makes locating the failed system inefficient.
  • the embodiments of the present invention provide a fault location platform, a fault location method, and a device.
  • the technical solution is as follows:
  • a fault location platform includes: an identity distribution system, a log system, a first service system, and a second service system;
  • the identifier distribution system is configured to allocate a service request identifier (ID) to a service request, where the service request is sent when the first service system executes a service, and the service is executed by the first service system and the second service system, between which a call relationship exists; the first service system is configured to generate a processing log for each service step corresponding to the service request ID, where the processing log is used to record the execution result of the service step; the service steps include: a service step performed by the first service system, and a service step that the first service system calls the second service system to perform; the log system is configured to receive the processing log corresponding to the service request ID, determine an abnormal service step according to the execution result in the processing log, and locate the service system that performed the abnormal service step as a faulty service system.
  • when the first service system performs a service step corresponding to the service request ID, or calls the second service system to perform one, the first service system sends the corresponding processing log to the log system; the log system determines the abnormal service step according to the execution result in the received processing log and finally locates the faulty service system. Because the first service system generates a processing log for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that the service systems must be checked one by one from top to bottom before the faulty service system is determined, which makes locating the faulty service system inefficient; locating the faulty service system through the processing logs corresponding to the service request ID improves the efficiency of locating it.
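  • The idea in the preceding paragraph can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the `ProcessingLog` fields, the system names, and the function `locate_faulty_system` are hypothetical names chosen for the example, and the execution result is assumed to be a simple success flag.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProcessingLog:
    request_id: str   # service request ID allocated by the identifier distribution system
    system_id: str    # service system that performed the step (hypothetical field name)
    step: str         # name of the service step (hypothetical field name)
    succeeded: bool   # execution result recorded for the step (assumed to be a flag)


def locate_faulty_system(logs: Iterable[ProcessingLog], request_id: str) -> Optional[str]:
    """Return the system that performed the first failed step of request_id, or None."""
    for log in logs:
        if log.request_id == request_id and not log.succeeded:
            return log.system_id
    return None


logs = [
    ProcessingLog("req-1", "first_service_system", "receive_request", True),
    ProcessingLog("req-1", "second_service_system", "schedule_backup", False),
    ProcessingLog("req-2", "first_service_system", "receive_request", True),
]
print(locate_faulty_system(logs, "req-1"))  # second_service_system
```

  • The lookup goes straight to the logs of one service request ID instead of inspecting every service system in turn, which is the efficiency gain the text describes.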
  • the first service system is configured to: when performing an internal service step corresponding to the service request ID, generate a first processing log corresponding to the internal service step and send the first processing log to the log system, where the first processing log is used to record the execution result of the first service system performing the internal service step; the first service system is further configured to: when calling the second service system to perform an external service step corresponding to the service request ID, generate a second processing log corresponding to the external service step and send the second processing log to the log system, where the second processing log is used to record the execution result of the called second service system performing the external service step; the log system is configured to: determine, according to the execution result in the first processing log, whether the internal service step is the abnormal service step, and when the internal service step is the abnormal service step, locate the first service system as the faulty service system; and determine, according to the execution result in the second processing log, whether the external service step is the abnormal service step, and when the external service step is the abnormal service step, locate the called second service system as the faulty service system.
  • the first service system records the execution result of the internal service step in the first processing log and the execution result of the external service step in the second processing log; from the execution result in the first processing log, the log system can determine whether the first service system is the faulty service system, and from the execution result in the second processing log, whether the second service system is the faulty service system. Recording the internal service steps and the external service steps separately helps improve the efficiency of locating the faulty service system.
  • the first service system includes: a first processing module having a first application programming interface (API), where the first API has a corresponding first API identifier; the second service system includes: a second processing module having a second API, where the second API has a corresponding second API identifier; the first service system is configured to send the first processing log to the log system, where the first processing log includes: the service request ID, a first service system ID, the first API identifier, and a result code, and the result code is the execution result of the first processing module performing the internal service step; the first service system is further configured to send the second processing log to the log system, where the second processing log includes: the service request ID, the first service system ID, the first API identifier, a second service system ID, the second API identifier, and a return code, and the return code is the execution result of the called second processing module performing the external service step; the log system is configured to: when the faulty service system is the first service system, locate the API corresponding to the first API identifier as a fault API according to the first API identifier included in the first processing log; and when the faulty service system is the second service system, locate the API corresponding to the second API identifier as the fault API according to the second API identifier included in the second processing log.
  • when the faulty service system is the first service system, the log system determines, according to the first API identifier carried in the first processing log, that the API corresponding to the first API identifier is a fault API; when the faulty service system is the second service system, the log system determines, according to the second API identifier carried in the second processing log, that the API corresponding to the second API identifier is a fault API. Because the first processing log carries the first API identifier and the second processing log carries the second API identifier, the log system can locate the fault API according to the API identifier, which improves the accuracy of locating the fault within the faulty service system.
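  • A minimal sketch of the two log types and of API-level fault location follows. The field lists mirror the ones named in the text (service request ID, system IDs, API identifiers, result code or return code); the class names, the function `locate_fault_api`, and the convention that code `0` means success are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

SUCCESS = 0  # assumed convention: 0 means the step succeeded


@dataclass(frozen=True)
class FirstProcessingLog:
    """Log of an internal service step performed by the first service system."""
    request_id: str
    first_system_id: str
    first_api_id: str
    result_code: int


@dataclass(frozen=True)
class SecondProcessingLog:
    """Log of an external service step performed by the called second service system."""
    request_id: str
    first_system_id: str
    first_api_id: str
    second_system_id: str
    second_api_id: str
    return_code: int


def locate_fault_api(
    log: Union[FirstProcessingLog, SecondProcessingLog],
) -> Optional[Tuple[str, str]]:
    """Return (faulty system ID, fault API identifier), or None if the step succeeded."""
    if isinstance(log, FirstProcessingLog):
        if log.result_code != SUCCESS:
            return (log.first_system_id, log.first_api_id)
        return None
    if log.return_code != SUCCESS:
        return (log.second_system_id, log.second_api_id)
    return None


internal = FirstProcessingLog("req-1", "sys-1", "api-A", 0)
external = SecondProcessingLog("req-1", "sys-1", "api-A", "sys-2", "api-B", 500)
print(locate_fault_api(internal))  # None
print(locate_fault_api(external))  # ('sys-2', 'api-B')
```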
  • the log system is configured to obtain a business process model corresponding to the service request ID, where the business process model includes the execution order of the service steps corresponding to the service request ID; and to acquire, in that execution order, the n first processing logs and m second processing logs corresponding to the service steps, where n and m are each a positive integer.
  • the log system acquires the first processing logs and the second processing logs corresponding to the service steps in the execution order given by the business process model, so that abnormal service steps are determined in the order in which the service steps were executed, which avoids wasting resources and improves the efficiency of locating the faulty service system.
  • the log system determines, in the execution order given by the business process model, whether each internal service step is an abnormal service step according to the execution result in the corresponding first processing log, so that abnormal service steps are determined in the order in which the service steps were executed, which avoids wasting resources and improves the efficiency of locating the faulty service system.
  • the log system determines, in the execution order given by the business process model, whether each external service step is an abnormal service step according to the execution result in the corresponding second processing log, so that abnormal service steps are determined in the order in which the service steps were executed, which avoids wasting resources and improves the efficiency of locating the faulty service system.
  • a second aspect provides a fault location method, the method comprising: receiving a processing log corresponding to a service request ID, where the service request is sent when the first service system executes a service, the service is executed by the first service system and the second service system, between which a call relationship exists, and the processing log is used to record the execution result of each service step corresponding to the service request ID, the service steps including: a service step performed by the first service system, and a service step that the first service system calls the second service system to perform; determining an abnormal service step according to the execution result in the processing log; and locating the service system that performed the abnormal service step as a faulty service system.
  • the log system determines the abnormal service step according to the execution result in the received processing log corresponding to the service request ID and finally locates the faulty service system. Because a processing log is generated for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that the service systems must be checked one by one from top to bottom before the faulty service system is determined, which makes locating the faulty service system inefficient; locating the faulty service system through the processing logs corresponding to the service request ID improves the efficiency of locating it.
  • the processing log includes: a first processing log and a second processing log; and the determining an abnormal service step according to the execution result in the processing log includes: determining, according to the execution result in the first processing log, whether the internal service step is the abnormal service step, where the first processing log is used to record the execution result of the first service system performing the internal service step corresponding to the service request ID; and determining, according to the execution result in the second processing log, whether the external service step is the abnormal service step, where the second processing log is used to record the execution result of the called second service system performing the external service step corresponding to the service request ID.
  • the locating the service system that performed the abnormal service step as the faulty service system includes: when the internal service step is the abnormal service step, locating the first service system as the faulty service system; and when the external service step is the abnormal service step, locating the called second service system as the faulty service system.
  • the first service system records the execution result of the internal service step in the first processing log and the execution result of the external service step in the second processing log; from the execution result in the first processing log, the log system can determine whether the first service system is the faulty service system, and from the execution result in the second processing log, whether the second service system is the faulty service system. Recording the internal service steps and the external service steps separately helps improve the efficiency of locating the faulty service system.
  • the first service system includes: a first processing module having a first application programming interface (API), where the first API has a corresponding first API identifier;
  • the second service system includes: a second processing module having a second API, where the second API has a corresponding second API identifier;
  • the method further includes: when the faulty service system is the first service system, locating the API corresponding to the first API identifier as a fault API according to the first API identifier included in the first processing log, where the first processing log includes: the service request ID, the first service system ID, the first API identifier, and the result code, and the result code is the execution result of the first processing module performing the internal service step;
  • when the faulty service system is the second service system, locating the API corresponding to the second API identifier as the fault API according to the second API identifier included in the second processing log, where the second processing log includes: the service request ID, the first service system ID, the first API identifier, the second service system ID, the second API identifier, and a return code, and the return code is the execution result of the called second processing module performing the external service step.
  • when the faulty service system is the first service system, the log system determines, according to the first API identifier carried in the first processing log, that the API corresponding to the first API identifier is a fault API; when the faulty service system is the second service system, the log system determines, according to the second API identifier carried in the second processing log, that the API corresponding to the second API identifier is a fault API. Because the first processing log carries the first API identifier and the second processing log carries the second API identifier, the log system can locate the fault API according to the API identifier, which improves the accuracy of locating the fault within the faulty service system.
  • the method may further include: acquiring a business process model corresponding to the service request ID, where the business process model includes the execution order of the service steps corresponding to the service request ID; and acquiring, in that execution order, the n first processing logs and m second processing logs corresponding to the service steps, where n and m are each a positive integer.
  • the log system acquires the first processing logs and the second processing logs corresponding to the service steps in the execution order given by the business process model, so that abnormal service steps are determined in the order in which the service steps were executed, which avoids wasting resources and improves the efficiency of locating the faulty service system.
  • the determining, according to the execution result in the first processing log, whether the internal service step is the abnormal service step includes: determining, according to the execution result in the i-th first processing log, whether the internal service step is the abnormal service step, where i is a positive integer less than or equal to n; and the API corresponding to the first API identifier is located
  • the log system determines, in the execution order given by the business process model, whether each internal service step is an abnormal service step according to the execution result in the corresponding first processing log, so that abnormal service steps are determined in the order in which the service steps were executed, which avoids wasting resources and improves the efficiency of locating the faulty service system.
  • the determining, according to the execution result in the second processing log, whether the external service step is the abnormal service step includes: determining, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step, where j is a positive integer less than or equal to m; and the API corresponding to the second API identifier is located
  • the log system determines, in the execution order given by the business process model, whether each external service step is an abnormal service step according to the execution result in the corresponding second processing log, so that abnormal service steps are determined in the order in which the service steps were executed, which avoids wasting resources and improves the efficiency of locating the faulty service system.
  • a fault locating device comprising at least one unit for implementing the fault location method provided by the second aspect or by any possible design of the second aspect described above.
  • a computer-readable storage medium storing an executable program for implementing the fault location method provided by the second aspect or by any possible design of the second aspect described above.
  • a log system comprising a processor and a memory; the memory is configured to store one or more instructions that are indicated to be executed by the processor, and the processor is configured to implement the fault location method provided by the second aspect or by any possible design of the second aspect described above.
  • When a service system performs a service step corresponding to the service request ID, the corresponding processing log is sent to the log system; the log system determines the abnormal service step according to the execution result in the received processing log and finally locates the faulty service system. This solves the problem that locating the faulty service system is inefficient when the number of service systems is large, and improves the efficiency of locating the faulty service system.
  • FIG. 1 is a flowchart of a method for a host backup service provided in the prior art
  • FIG. 2 is a schematic structural diagram of a fault location platform according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a fault location platform according to another embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of fault location of a host backup service according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a log system according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of a fault location method according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of a method for fault location according to another embodiment of the present invention.
  • FIG. 8 is a flowchart of a fault location method according to still another embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a fault location system according to an embodiment of the present invention.
  • FIG. 10 is a flowchart of a fault location method according to an embodiment of the present invention.
  • FIG. 11 is a structural block diagram of a fault locating device according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a fault location platform according to an embodiment of the present invention.
  • the platform may include an identity distribution system 120, a log system 140, a first service system 161, and a second service system 162.
  • the identity distribution system 120 has the ability to assign a service request ID to a service request.
  • the service request is sent by the first service system 161 when the service is executed, and the service is performed by the first service system 161 and the second service system 162, between which a call relationship exists.
  • this embodiment takes as an example the case in which the service systems that perform the service include the first service system 161 and the second service system 162, but the service systems that perform the service are not specifically limited; for example,
  • the service system further includes: a third service system (not shown); wherein the service is the first service system 161 and the second service system 162 in which the call relationship exists, and the first service system 161 and the first presence call relationship
  • the identity distribution system 120 also has the ability to assign a service ID to the service.
  • One service ID corresponds to one service request ID, or one service ID corresponds to several service request IDs.
  • when the same service is triggered at different time points, the identifier distribution system 120 generates different service request IDs for the service requests triggered at those time points. That is to say, each service step in the execution of the service generates a service request, and the identity distribution system 120 also assigns a service request ID to it.
  • the identifier distribution system 120 records a service ID, a service request ID, and a correspondence between the service ID and the service request ID.
  • the identity distribution system 120 synchronizes the recorded service ID, the service request ID, and the correspondence between the service ID and the service request ID to the log system 140.
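  • The bookkeeping described above (one service ID mapped to one or several service request IDs, with the correspondence kept for later synchronization to the log system) might look like the following sketch. The class name, method names, and the `svc-N` / `req-N` ID formats are assumptions made for illustration; the text does not specify how IDs are formed.

```python
import itertools
from typing import Dict, List


class IdentifierDistributionSystem:
    """Assigns service IDs and service request IDs and records their correspondence."""

    def __init__(self) -> None:
        self._service_seq = itertools.count(1)
        self._request_seq = itertools.count(1)
        self._requests_by_service: Dict[str, List[str]] = {}

    def assign_service_id(self) -> str:
        service_id = f"svc-{next(self._service_seq)}"  # hypothetical ID format
        self._requests_by_service[service_id] = []
        return service_id

    def assign_request_id(self, service_id: str) -> str:
        if service_id not in self._requests_by_service:
            raise KeyError(f"unknown service ID: {service_id}")
        request_id = f"req-{next(self._request_seq)}"  # hypothetical ID format
        self._requests_by_service[service_id].append(request_id)
        return request_id

    def correspondence(self) -> Dict[str, List[str]]:
        """Snapshot of service ID -> service request IDs, e.g. to sync to the log system."""
        return {svc: list(reqs) for svc, reqs in self._requests_by_service.items()}


ids = IdentifierDistributionSystem()
backup = ids.assign_service_id()
first = ids.assign_request_id(backup)   # request triggered at one time point
second = ids.assign_request_id(backup)  # same service triggered later gets a new ID
print(ids.correspondence())  # {'svc-1': ['req-1', 'req-2']}
```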
  • the first service system 161 and the second service system 162 have the ability to perform services while the first service system 161 also has the ability to invoke the second service system to perform business steps.
  • The service steps corresponding to the service request ID include: a service step performed by the first service system 161, and a service step that the first service system 161 calls the second service system 162 to perform; when the service steps corresponding to the service request ID are performed, the first service system 161 generates a processing log for each service step corresponding to the service request ID.
  • the service request ID corresponds to one service step or to a plurality of service steps; among the service steps corresponding to the service request ID, at least one is a step that the first service system 161 invokes the second service system 162 to perform; each service step corresponds to one processing log.
  • the processing log is used to record the execution result of the business step; optionally, the execution result includes: the execution succeeds or the execution fails.
  • the first service system 161 sends the generated processing log corresponding to the service request ID to the log system 140.
  • the first service system 161 sends the processing log corresponding to the service request ID to the log system 140 by means of asynchronous transmission, or the first service system 161 reports the generated processing log corresponding to the service request ID to the log system 140.
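  • Asynchronous transmission can be illustrated with a queue that decouples the service system from the log system. This is a sketch under assumptions: the text does not say which transport is used, so an in-process `queue.Queue` and a worker thread stand in for it, and the class and method names are hypothetical.

```python
import queue
import threading
from typing import List


class LogSystem:
    """Receives processing logs asynchronously via an in-process queue (stand-in transport)."""

    def __init__(self) -> None:
        self.received: List[dict] = []
        self._inbox: "queue.Queue[dict]" = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, processing_log: dict) -> None:
        """Called by a service system; returns without waiting for the log to be processed."""
        self._inbox.put(processing_log)

    def _drain(self) -> None:
        while True:
            item = self._inbox.get()
            self.received.append(item)
            self._inbox.task_done()

    def wait_until_idle(self) -> None:
        """Block until every submitted log has been received."""
        self._inbox.join()


log_system = LogSystem()
log_system.submit({"request_id": "req-1", "step": "volume_snapshot", "succeeded": True})
log_system.submit({"request_id": "req-1", "step": "extract_data", "succeeded": False})
log_system.wait_until_idle()
print(len(log_system.received))  # 2
```

  • With this arrangement the service step is not delayed by log delivery, which is one plausible reason for preferring asynchronous sending.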
  • Logging system 140 has the ability to analyze processing logs.
  • the log system 140 receives the processing log corresponding to the service request ID sent by the first service system 161, determines an abnormal service step according to the execution result in the processing log, and locates the service system that executed the abnormal service step as the faulty service system.
  • an abnormal service step is a service step whose execution failed.
  • when the execution result in the processing log indicates that a service step failed, the log system 140 determines that the service system that performed the service step is the faulty service system.
  • In the fault location platform provided by this embodiment, when the first service system performs a service step corresponding to the service request ID, or calls the second service system to perform one, the first service system sends the corresponding processing log to the log system; the log system determines the abnormal service step according to the execution result in the received processing log and finally locates the faulty service system. Because the first service system generates a processing log for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that the faulty service system is determined by checking the service systems one by one from top to bottom, which makes locating it inefficient when the number of service systems is large; locating the faulty service system through the processing logs corresponding to the service request ID improves the efficiency of locating it.
  • the processing log reported by the first service system 161 includes: a first processing log and a second processing log, as shown in FIG. 3.
  • the first service system 161 can independently execute the internal service step corresponding to the service request ID by itself, and generate the first process log corresponding to the internal service step when the internal service step is executed.
  • the first processing log records the execution results of the first business system 161 performing the internal business steps.
  • the first service system 161 includes a first processing module having a first API, where the first API has a corresponding first API identifier, and the first service system 161 performs the internal service step corresponding to the service request ID by using the first processing module.
  • the first processing log includes: a service request ID, a first service system ID, a first API identifier, and a result code, where the result code is an execution result of the first processing module executing the internal service step.
  • when the service step fails to be executed, the first processing log carries a result code that indicates an error; or the first processing log does not carry a result code; or the first processing log does not carry a result code and instead carries an indication of a network connection exception or of no response.
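  • The three failure cases above can be told apart with a small classifier. The sketch below assumes a dictionary-shaped log, a success code of `0`, and a hypothetical `note` field holding values such as `"network_exception"` or `"no_response"`; none of these details come from the text.

```python
from typing import Optional

SUCCESS_CODE = 0  # assumed convention: 0 means the internal service step succeeded


def failure_reason(first_processing_log: dict) -> Optional[str]:
    """Return None for a successful step, otherwise a short reason string."""
    code = first_processing_log.get("result_code")
    if code is None:
        # No result code: either nothing else was recorded, or a note such as
        # "network_exception" / "no_response" explains why (hypothetical field).
        return first_processing_log.get("note", "missing_result_code")
    if code != SUCCESS_CODE:
        return f"error_code:{code}"
    return None


print(failure_reason({"request_id": "req-1", "result_code": 0}))        # None
print(failure_reason({"request_id": "req-1", "result_code": 17}))       # error_code:17
print(failure_reason({"request_id": "req-1"}))                          # missing_result_code
print(failure_reason({"request_id": "req-1", "note": "no_response"}))   # no_response
```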
  • the first service system 161 sends the generated first processing log to the log system 140.
  • the second service system 162 is a service system that is invoked when the first service system 161 executes an external service step corresponding to the service request ID.
  • the first service system 161 generates a second processing log corresponding to the external service step when the second service system 162 is invoked to execute the external service step corresponding to the service request ID.
  • the second processing log records the execution result of the second business system 162 executing the external business step.
  • the second service system 162 includes a second processing module having a second API, where the second API has a corresponding second API identifier, and the first service system 161 invokes, by using the first processing module, the second processing module to execute the external service step corresponding to the service request ID; the second processing log includes: a service request ID, a first service system ID, a first API identifier, a second service system ID, a second API identifier, and a return code, where the return code is the execution result of the second processing module executing the external service step.
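  • One way for the calling side to produce the second processing log is to wrap the external call so that the caller records the return code together with both systems' identifiers. The following is a sketch only: the function `invoke_external_step`, the dictionary layout, and the choice to record `None` when the call raises `ConnectionError` are assumptions for the example.

```python
from typing import Callable, List, Optional


def invoke_external_step(
    request_id: str,
    first_system_id: str,
    first_api_id: str,
    second_system_id: str,
    second_api_id: str,
    call: Callable[[], int],
    log_sink: List[dict],
) -> Optional[int]:
    """Call the second system's API and record a second processing log on the caller side."""
    try:
        return_code: Optional[int] = call()
    except ConnectionError:
        return_code = None  # assumed: no return code when the callee cannot be reached
    log_sink.append({
        "request_id": request_id,
        "first_system_id": first_system_id,
        "first_api_id": first_api_id,
        "second_system_id": second_system_id,
        "second_api_id": second_api_id,
        "return_code": return_code,
    })
    return return_code


def unreachable() -> int:
    raise ConnectionError("second service system did not respond")


sink: List[dict] = []
invoke_external_step("req-1", "sys-1", "api-A", "sys-2", "api-B", lambda: 0, sink)
invoke_external_step("req-1", "sys-1", "api-A", "sys-2", "api-C", unreachable, sink)
print([entry["return_code"] for entry in sink])  # [0, None]
```

  • Because the caller writes the log, a record exists even when the called system never answers, which matches the text's statement that the first service system generates the second processing log.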
  • the first service system 161 sends the generated second processing log to the log system 140.
  • the service request sent by the first service system 161 when the service is executed is taken as an example, but is not specifically limited.
  • The second service system 162 can also independently execute an internal service step corresponding to the service request ID, generate a corresponding first processing log, and send it to the log system 140; or the second service system 162 can call another service system to execute an external service step corresponding to the service request ID, generate a corresponding second processing log, and send it to the log system 140.
  • the log system 140 determines whether the internal service step is an abnormal service step according to the execution result in the first processing log.
  • When the internal service step is an abnormal service step, the first service system 161 is located as the faulty service system.
  • The log system 140 further determines whether the external service step is an abnormal service step according to the execution result in the second processing log.
  • When the external service step is an abnormal service step, the called second service system 162 is located as the faulty service system.
  • After locating the first service system 161 as the faulty service system, the log system 140 determines, according to the first API identifier carried in the first processing log, that the API corresponding to the first API identifier is a fault API; after locating the second service system 162 as the faulty service system, the log system 140 determines, according to the second API identifier carried in the second processing log, that the API corresponding to the second API identifier is a fault API.
  • the log system 140 also obtains a business process model corresponding to the service request ID.
  • the business process model includes: an execution sequence of each business step corresponding to the service request ID.
  • the log system 140 sequentially acquires n first processing logs and m second processing logs corresponding to the respective service steps according to the execution order in the business process model, where n and m are positive integers, respectively.
  • the log system 140 includes: an analysis component 141, a modeling component 142, an ID processing component 143, and a log component 144;
  • An ID processing component 143 configured to store a service request ID
  • a log component 144 configured to store a processing log corresponding to the service request ID
  • a modeling component 142 configured to store a business process model corresponding to the service request ID
  • the analysis component 141 is configured to determine an abnormal service step according to the execution result in the business process model and the processing log, and locate the business system for performing the abnormal business step as the faulty service system.
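  • The division of labour among the four components can be sketched as follows. This is a minimal sketch under stated assumptions: the class name, method names, and the dictionary layout of a processing log (`request_id`, `step`, `system`, `ok`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class LogSystem:
    """Toy stand-in for log system 140 and its four components."""

    def __init__(self):
        self.request_ids = set()       # ID processing component 143
        self.logs = defaultdict(list)  # log component 144
        self.models = {}               # modeling component 142

    def store_request_id(self, request_id):
        self.request_ids.add(request_id)

    def store_model(self, request_id, ordered_steps):
        self.models[request_id] = list(ordered_steps)

    def store_log(self, log):
        self.logs[log["request_id"]].append(log)

    def analyze(self, request_id):
        """Analysis component 141: first failed step in model order, or None."""
        by_step = {log["step"]: log for log in self.logs[request_id]}
        for step in self.models.get(request_id, []):
            log = by_step.get(step)
            if log is not None and not log["ok"]:
                return {"abnormal_step": step, "faulty_system": log["system"]}
        return None
```

  • Keeping the stores separate from `analyze` mirrors the text: the analysis component only reads what the other three components hold.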
  • the service system for executing the service includes: the cloud management system 11, the data protection service system 12, the virtualization system 13, and the cloud.
  • the ID is fed back to the cloud management system 11, and is also synchronized to the ID processing component 143 in the log system 140.
  • When the cloud management system 11 invokes the data protection service system 12 to execute the backup request, the cloud management system 11 generates a corresponding second processing log and sends it to the log component 144 in the log system 140. When the data protection service system 12 invokes the virtualization system 13 to execute the scheduled backup request, the data protection service system 12 generates a corresponding second processing log and sends it to the log component 144 in the log system 140. When the virtualization system 13 calls the cloud backup management system 14 to perform, in turn, the volume snapshot, volume snapshot comparison, data extraction, data storage, and backup completion steps, the virtualization system 13 generates a corresponding second processing log and sends it to the log component in the log system 140.
  • The cloud backup management system 14 generates a corresponding first processing log for each of the five steps it performs independently (volume snapshot, volume snapshot comparison, data extraction, data storage, and backup completion), and sends the five generated first processing logs to the log component 144 in the log system 140. When the cloud backup management system 14 calls the production storage system 15 to store the result of the volume snapshot comparison and the difference data obtained by extraction, the cloud backup management system 14 generates a corresponding second processing log and sends it to the log component 144 in the log system 140. When the cloud backup management system 14 calls the backup storage system 16 to store the current time data, the cloud backup management system 14 generates a corresponding second processing log.
  • The business process model corresponding to the host backup request ID is pre-stored in the modeling component 142. When the host backup service fails, the analysis component 141 in the log system 140 determines an abnormal service step according to the business process model and the execution results in the processing logs stored in the log component 144, and locates the service system that performed the abnormal service step as the faulty service system. For example, if a service step is determined to be an abnormal service step according to the execution result in the second processing log reported by the virtualization system 13, the analysis component 141 determines that the cloud backup management system 14 is the faulty service system.
  • FIG. 5 is a schematic structural diagram of a log system 140 according to an embodiment of the present invention.
  • the log system 140 may include a processor 511 , a communication bus 512 , a memory 513 , and a communication interface 514 .
  • The processor 511 may include one or more central processing units (CPUs).
  • the processor 511 executes various functional applications and business data processing by running software programs and modules.
  • The communication interface 514 may include a wired network interface, such as an Ethernet interface, or a wireless network interface.
  • the communication interface 514 is configured to receive a processing log sent by the service system and a service request ID sent by the identity distribution system.
  • Memory 513 and communication interface 514 are coupled to processor 511 via communication bus 512, respectively.
  • the memory 513 can be used to store software programs and modules that are executed by the processor 511. In addition, various types of service data and user data can also be stored in the memory 513.
  • the memory 513 can store the operating system 51 and program instructions 52 required for at least one function.
  • The program instructions 52 may include a receiving module 521, a determining module 522, a positioning module 523, an obtaining module 524, and the like.
  • the receiving module 521 is configured to receive a processing log corresponding to the service request identifier ID.
  • the determining module 522 is configured to determine an abnormal service step according to the execution result in the processing log.
  • the positioning module 523 is configured to locate a service system for performing an abnormal service step as a faulty service system.
  • the obtaining module 524 is configured to obtain a business process model corresponding to the service request ID.
  • The memory 513 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), dynamic random access memory (DRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • The log system 140 in the present invention may include more or fewer components, combine some components, or use a different component arrangement.
  • FIG. 6 is a flowchart of a fault location method according to an embodiment of the present invention. This embodiment is exemplified by applying the fault location method to the log system 140 shown in FIG. 2.
  • the fault location method includes the following steps:
  • Step 601 The log system receives a processing log corresponding to the service request ID.
  • The service steps corresponding to the service request ID include: a service step performed by the first service system itself, and a service step that the first service system invokes the second service system to perform.
  • This embodiment takes a first service system and a second service system as an example, but the service systems that execute the service are not specifically limited. For example, the service systems may further include a third service system, in which case the service is executed by the first service system and the second service system that have a call relationship, and by the first service system and the third service system that have a call relationship.
  • Likewise, the service steps described here are only an example; the service steps may further include a service step that the first service system invokes the third service system to perform.
  • the first service system generates a processing log of each service step corresponding to the service request ID.
  • When the first service system executes a service step corresponding to the service request ID, or invokes the second service system to execute a service step corresponding to the service request ID, the first service system generates a processing log corresponding to that service step.
  • each service step corresponds to one processing log.
  • For example, when executing the service corresponding to the service request B, the service system A needs to complete service step 1, service step 2, and service step 3, and service step 1 requires invoking the service system C. When service step 1 is executed, the service system A generates the processing log 1 corresponding to service step 1; when service step 2 is executed, the service system A generates the processing log 2 corresponding to service step 2; when service step 3 is executed, the service system A generates the processing log 3 corresponding to service step 3.
  • the first service system sends the generated processing log to the log system.
  • the first service system sends the processing log to the log system by means of asynchronous sending.
  • The first service system may send each processing log to the log system as soon as it is generated. For example, the first service system sends the processing log 1 to the log system when the processing log 1 is generated, sends the processing log 2 when the processing log 2 is generated, and sends the processing log 3 when the processing log 3 is generated.
  • Alternatively, the processing log 1, the processing log 2, and the processing log 3 are sent to the log system together.
  • the log system receives a processing log corresponding to the service request ID sent by the first service system.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the receiving module 521.
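  • The asynchronous sending mentioned above can be sketched with a queue and a background thread, so that the service step does not wait for log delivery. This is an assumption-laden sketch: the `AsyncLogSender` class and the `deliver` callback are hypothetical, and the text does not specify the transport or threading model.

```python
import queue
import threading


class AsyncLogSender:
    """Hands processing logs to a background thread so service steps do not block."""

    def __init__(self, deliver):
        self._deliver = deliver  # callable that forwards one log to the log system
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, log):
        self._queue.put(log)  # returns immediately; delivery happens later

    def _run(self):
        while True:
            log = self._queue.get()
            try:
                if log is None:  # sentinel: stop the worker
                    return
                self._deliver(log)
            except Exception:
                pass  # sketch only: a real sender would retry or report the failure
            finally:
                self._queue.task_done()

    def close(self):
        self._queue.put(None)
        self._queue.join()   # wait until every queued log has been handled
        self._worker.join()
```

  • With a single worker the logs reach the log system in the order they were generated, which matches the log-by-log sending example above.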
  • Step 602 The log system determines an abnormal service step according to the execution result in the processing log.
  • After receiving the processing log, the log system determines the abnormal service step according to the execution result in the processing log.
  • The execution result in the processing log is either execution success or execution failure; an abnormal service step is a service step whose execution result is execution failure.
  • When the execution result in a processing log is execution failure, the log system determines that the service step corresponding to that processing log is an abnormal service step.
  • For example, if the execution result in the processing log 2 is execution failure, the log system determines that the service step 2 is an abnormal service step.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
  • Step 603 The log system locates the service system that performed the abnormal service step as the faulty service system.
  • For example, if the log system determines that the service step 2 is an abnormal service step, the log system locates the service system A, which performed the service step 2, as the faulty service system.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
  • In summary, in the fault location method provided by this embodiment, when the first service system executes a service step corresponding to the service request ID, or invokes the second service system to execute one, the first service system sends the corresponding processing log to the log system; the log system determines the abnormal service step according to the execution result in the received processing log and then locates the faulty service system. Because the first service system generates a processing log for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that the faulty service system is found by checking each service system in turn from top to bottom, which makes fault location inefficient when there are many service systems, and thereby improves the efficiency of locating the faulty service system.
  • The first service system may independently perform an internal service step corresponding to the service request ID. In this case the processing log is a first processing log, which records the execution result of the first service system executing the internal service step corresponding to the service request ID.
  • In this case, steps 602 to 603 can be replaced by the following steps 701 to 705, as shown in FIG. 7:
  • Step 701 The log system acquires a business process model corresponding to the service request ID, where the business process model includes: an execution sequence of each service step corresponding to the service request ID.
  • the log system acquires a business process model corresponding to the service request ID.
  • The internal service steps corresponding to the service request ID need to be executed in a predetermined execution order.
  • For example, when the service system A executes the service request 71, a total of four service steps need to be executed: service step 1, service step 2, service step 3, and service step 4.
  • Service step 1 and service step 2 are performed through the B module, and then service step 3 and service step 4 are performed through the C module.
  • the business process model corresponding to the business system A executing the business request 71 is as shown in the following Table 1:
  Table 1

  | Business request ID | Business system | Business system module | Execution order |
  |---|---|---|---|
  | Business request 71 | Business System A | B module | 1 |
  | Business request 71 | Business System A | C module | 2 |
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
  • Step 702 The log system sequentially acquires n first processing logs corresponding to the respective service steps according to the execution order, where n is a positive integer.
  • After obtaining the business process model, the log system obtains, from the received processing logs, n first processing logs corresponding to the respective service steps according to the execution order in the business process model.
  • Each first processing log records the execution result of the first service system independently executing an internal service step corresponding to the service request ID.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
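  • Steps 701 and 702 can be sketched by holding Table 1 as data and sorting the received logs by the model's execution order. The dictionary layout and the log fields (`request_id`, `system`, `module`, `step`) are assumptions made for this illustration only.

```python
# Table 1 as data: (business system, module) -> execution order for business request 71.
BUSINESS_PROCESS_MODEL = {
    "Business request 71": {
        ("Business System A", "B module"): 1,
        ("Business System A", "C module"): 2,
    }
}


def logs_in_execution_order(request_id, received_logs):
    """Return the first processing logs of one request, sorted by the model's order."""
    order = BUSINESS_PROCESS_MODEL[request_id]
    relevant = [
        log for log in received_logs
        if log["request_id"] == request_id
        and (log["system"], log["module"]) in order
    ]
    # sorted() is stable, so logs of the same module keep their arrival order.
    return sorted(relevant, key=lambda log: order[(log["system"], log["module"])])
```

  • Logs belonging to other requests, or to modules that the model does not list, are simply left out of the result.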
  • Step 703 The log system determines, according to the execution result in the first processing log, whether the internal service step is an abnormal service step.
  • The execution result in the first processing log is either execution success or execution failure; an abnormal service step is a service step whose execution result is execution failure.
  • When the execution result in a first processing log is execution failure, the log system determines that the internal service step corresponding to that first processing log is an abnormal service step.
  • For example, if, among the first processing log 1, the first processing log 2, and the first processing log 3, the execution result in the first processing log 2 is execution failure, the log system determines, according to the execution result in the first processing log 2, that the internal service step 2 is an abnormal service step.
  • this step can be implemented by the following possible implementation manners:
  • First, the log system determines whether the internal service step is an abnormal service step according to the execution result in the i-th first processing log, where i is a positive integer less than or equal to n.
  • The initial value of i is 1.
  • That is, the log system starts from the 1st first processing log and determines whether the corresponding internal service step is an abnormal service step according to the execution result in that log.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
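  • The index-based check in step 703 can be sketched as a scan over the ordered logs. Stopping at the first failure is an assumption of this sketch; the text only states that the scan starts at i = 1. The `result` field and its values are likewise hypothetical.

```python
def first_abnormal_index(ordered_logs):
    """Scan logs i = 1..n in execution order; return the 1-based i of the first failure."""
    for i, log in enumerate(ordered_logs, start=1):
        if log["result"] != "success":
            return i
    return None  # no abnormal internal service step found
```

  • Returning the 1-based index keeps the result aligned with the "i-th first processing log" wording used in this step.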
  • Step 704 When the internal service step is an abnormal service step, the log system locates the first service system as a faulty service system.
  • When the log system determines that the internal service step corresponding to the i-th first processing log is an abnormal service step, the log system locates the first service system that performed the internal service step as the faulty service system. For example, if the log system determines that the internal service step 2 corresponding to the 2nd first processing log is an abnormal service step, the log system locates the first service system A, which performed the internal service step 2, as the faulty service system.
  • The first service system includes a first processing module having a first API, where the first API has a corresponding first API identifier. The first processing log includes: a service request ID, a first service system ID, a first API identifier, and a result code, where the result code is the execution result of the first processing module executing the internal service step.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
  • Step 705 When the faulty service system is the first service system, the log system locates the API corresponding to the first API identifier as a fault API according to the first API identifier included in the first processing log.
  • the log system determines, according to the result code carried in the first processing log, that the first service system is a faulty service system, and determines an API corresponding to the first API identifier included in the first processing log as a fault API.
  • For example, the first service system B includes a first processing module a and a first processing module b; the first API identifier of the first processing module a is API11, and the first API identifier of the first processing module b is API12. To execute the service request 72, the first service system B needs to execute two internal service steps, internal service step 1 and internal service step 2: internal service step 1 is performed first through the first processing module a, and internal service step 2 is then performed through the first processing module b. When the log system determines, according to the result code in the first processing log, that the first service system B is the faulty service system, it determines, according to the API12 carried in the first processing log, that the API corresponding to API12 is the fault API.
  • Accordingly, the first processing module b corresponding to the fault API is the faulty processing module.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
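  • Step 705 can be sketched as a lookup from the API identifier in a failed first processing log to the processing module that owns it. The registry below uses the API11/API12 example from the text; the field names and the "0" success code are assumptions.

```python
# Assumed registry for first service system B: API identifier -> processing module.
API_TO_MODULE = {
    "API11": "first processing module a",
    "API12": "first processing module b",
}


def locate_fault_api(first_processing_log, success_code="0"):
    """Return (fault API, fault module) when the result code signals failure, else None."""
    if first_processing_log.get("result_code") == success_code:
        return None
    api_id = first_processing_log["first_api_id"]
    return api_id, API_TO_MODULE.get(api_id)
```

  • A missing result code is treated as a failure here, consistent with the failure cases described for the first processing log.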
  • In summary, in the fault location method provided by this embodiment, when the first service system and the second service system perform the service steps corresponding to the service request ID, the first service system sends the corresponding processing logs to the log system, and the log system determines the abnormal service step according to the execution results in the received processing logs and then locates the faulty service system. Because the first service system generates a processing log for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that each service system must be checked in turn from top to bottom before the faulty service system is determined, which makes fault location inefficient when there are many service systems; locating the faulty service system through the processing logs corresponding to the service request ID improves the efficiency of locating it.
  • In addition, the log system determines whether each internal service step is an abnormal service step according to the execution result in the first processing log, following the execution order in the business process model. Checking the service steps in the order in which they were executed helps avoid wasted resources and improves the efficiency of locating the faulty service system.
  • Furthermore, the log system determines, according to the first API identifier carried in the first processing log, that the API corresponding to the first API identifier is a fault API. Because the first processing log carries the first API identifier, the log system can locate the fault API according to that identifier, which improves the accuracy of locating the fault within the faulty service system.
  • The first service system may invoke the second service system to perform an external service step corresponding to the service request ID. In this case the processing log is a second processing log, which records the execution result of the second service system, invoked by the first service system, executing the external service step corresponding to the service request ID.
  • In this case, steps 602 to 603 can be replaced by the following steps 801 to 805, as shown in FIG. 8:
  • Step 801 The log system acquires a business process model corresponding to the service request ID, where the business process model includes: an execution sequence of each service step corresponding to the service request ID.
  • the log system acquires a business process model corresponding to the service request ID.
  • the external service step corresponding to the entire service request ID needs to be executed according to the predetermined execution order, for example, as shown in FIG.
  • To execute the service request in this example, six service systems are involved: service system 91, service system 92, service system 93, service system 94, service system 95, and service system 96. A total of seven service steps are required: service step 1 through service step 7. The service system 91 first executes service step 1, service step 2, and service step 3 through the x module; then executes service step 4, service step 5, and service step 6 through the y module; and finally executes service step 7 through the z module.
  • When the service system 91 executes service step 1 through the x module, it needs to invoke the service system 92 through the 2-1 API; when the service system 92 executes service step 1 through the w module, it needs to invoke the service system 93 through the 2-2 API. When the service system 91 executes service step 2, it needs to invoke the service system 93 through the 3-1 API. When the service system 91 executes service step 4, service step 5, and service step 6 through the y module, it needs to invoke the service system 94 through the 4-1 API, the service system 95 through the 5-1 API, and the service system 96 through the 6-1 API, respectively.
  • the business process model corresponding to the execution service request 81 is as shown in Table 1 below:
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
  • Step 802 The log system sequentially acquires m second processing logs corresponding to the respective service steps according to the execution order, where m is a positive integer.
  • After obtaining the business process model, the log system obtains, from the received processing logs, m second processing logs corresponding to the respective service steps according to the execution order in the business process model.
  • Each second processing log records the execution result of the called second service system executing an external service step corresponding to the service request ID.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
  • Step 803 The log system determines, according to the execution result in the second processing log, whether the external service step is an abnormal service step.
  • The execution result in the second processing log is either execution success or execution failure; an abnormal service step is a service step whose execution result is execution failure.
  • When the execution result in a second processing log is execution failure, the log system determines that the external service step corresponding to that second processing log is an abnormal service step.
  • For example, if, among the second processing log 1, the second processing log 2, and the second processing log 3, the execution result in the second processing log 2 is execution failure, the log system determines, according to the execution result in the second processing log 2, that the external service step 2 is an abnormal service step.
  • this step can be implemented by the following possible implementation manners:
  • First, the log system determines whether the external service step is an abnormal service step according to the execution result in the j-th second processing log, where j is a positive integer less than or equal to m.
  • The initial value of j is 1.
  • That is, the log system starts from the 1st second processing log and determines whether the corresponding external service step is an abnormal service step according to the execution result in that log.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
  • Step 804 When the external service step is an abnormal service step, the log system locates the called second service system as a faulty service system.
  • When the log system determines that the external service step corresponding to the j-th second processing log is an abnormal service step, the log system locates the second service system that performed the external service step as the faulty service system. For example, if the log system determines that the external service step 2 corresponding to the 2nd second processing log is an abnormal service step, the log system locates the second service system A1, which was called to execute the external service step 2, as the faulty service system.
  • The first service system includes a first processing module having a first API, where the first API has a corresponding first API identifier; the second service system includes a second processing module having a second API, where the second API has a corresponding second API identifier.
  • The second processing log includes: a service request ID, a first service system ID, a first API identifier, a second service system ID, a second API identifier, and a return code, where the return code is the execution result of the called second processing module executing the external service step.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
  • Step 805 When the faulty service system is the second service system, the log system locates the API corresponding to the second API identifier as the fault API according to the second API identifier included in the second processing log.
  • the log system determines, according to the return code carried in the second processing log, that the called second service system is the faulty service system, and determines the API corresponding to the second API identifier included in the second processing log as the fault API.
  • For example, the second service system B1 includes a second processing module a1 and a second processing module b1; the second API identifier of the second processing module a1 is API21, and the second API identifier of the second processing module b1 is API22. To execute the service request 82, the second service system B1 needs to execute two external service steps, external service step 1 and external service step 2: external service step 1 is performed first through the second processing module a1, and external service step 2 is then performed through the second processing module b1. When the log system determines, according to the return code in the second processing log, that the second service system B1 is the faulty service system, it determines, according to the API22 carried in the second processing log, that the API corresponding to API22 is the fault API.
  • Accordingly, the second processing module b1 corresponding to the fault API is the faulty processing module.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
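  • Steps 804 and 805 differ from the internal case in that the second processing log names both the caller and the callee, and the callee is the one located as faulty. A minimal sketch, assuming hypothetical field names and a "0" success code:

```python
from typing import NamedTuple, Optional


class SecondProcessingLog(NamedTuple):
    service_request_id: str
    first_service_system_id: str   # the calling system, which writes the log
    first_api_id: str
    second_service_system_id: str  # the called system
    second_api_id: str
    return_code: Optional[str]


def locate_called_fault(log: SecondProcessingLog, success_code: str = "0"):
    """Return (called system, its API) when the return code is not the success code."""
    if log.return_code == success_code:
        return None
    return log.second_service_system_id, log.second_api_id
```

  • For the example above, a failed log carrying API22 yields the second service system B1 and API22, even though the log itself was generated by the calling system.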
  • In summary, in the fault location method provided by this embodiment, when the first service system and the second service system perform the service steps corresponding to the service request ID, the first service system sends the corresponding processing logs to the log system, and the log system determines the abnormal service step according to the execution results in the received processing logs and then locates the faulty service system. Because the first service system generates a processing log for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that each service system must be checked in turn from top to bottom before the faulty service system is determined, which makes fault location inefficient when there are many service systems; locating the faulty service system through the processing logs corresponding to the service request ID improves the efficiency of locating it.
  • the log system determines whether the external service step is an abnormal service step according to the execution result in the second processing log according to the execution order in the business process model, which is beneficial to sequentially determining abnormal business steps according to the sequence of executing the business steps, thereby facilitating avoidance. Waste of resources and improve the efficiency of positioning the faulty business system.
  • the log system determines, according to the second API identifier carried in the second processing log, that the API corresponding to the second API identifier is a fault API, and carries the first in the second processing log.
  • the second API identifier is provided, so that the log system can locate the fault API according to the API identifier, thereby improving the accuracy of positioning the faulty service system.
  • FIG. 10 is a flowchart of a method for locating a fault according to another embodiment of the present invention.
  • This embodiment is exemplified by applying the fault location method to the fault location system shown in FIG. 3.
  • The log system includes an analysis component, a modeling component, an ID processing component, and a log component.
  • The fault location method includes the following steps:
  • Step 1001: The analysis component receives the service request ID to be analyzed.
  • The analysis component receives, as input, the service request ID to be analyzed.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the receiving module 521.
  • Step 1002: The analysis component obtains the business process model ID corresponding to the service request ID through the ID processing component.
  • The analysis component sends a process ID request carrying the service request ID to the ID processing component.
  • The process ID request is used to request, from the ID processing component, the business process model ID corresponding to the service request ID.
  • After receiving the process ID request, the ID processing component queries the business process model ID corresponding to the service request ID carried in the process ID request, and feeds the queried business process model ID back to the analysis component.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
  • Step 1003: The analysis component obtains the business process model corresponding to the service request ID through the modeling component.
  • The analysis component sends a model request carrying the business process model ID to the modeling component.
  • The model request is used to request the modeling component to feed back the business process model corresponding to the business process model ID.
  • After receiving the model request, the modeling component queries the business process model corresponding to the business process model ID carried in the model request, and feeds the queried business process model back to the analysis component.
  • For a detailed description of the business process model, refer to step 701 shown in FIG. 7 and step 801 shown in FIG. 8; details are not described herein again.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
  • Step 1004: The analysis component obtains the processing logs corresponding to the service request ID through the log component.
  • After obtaining the business process model, the analysis component sends a log acquisition request to the log component. The log acquisition request is used to request the log component to feed back the processing logs corresponding to the service request ID. Optionally, the processing logs include first processing logs and second processing logs.
  • For a detailed description of the first processing log, refer to step 802 shown in FIG. 8; details are not described herein again.
  • After receiving the log acquisition request, the log component obtains the service request ID carried in the request, queries the first processing logs and second processing logs corresponding to that service request ID, and feeds the n first processing logs and m second processing logs that it finds back to the analysis component.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
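  • The request chain of steps 1002 to 1004 can be sketched as three lookups. This is a minimal sketch under stated assumptions: the components are modeled as in-memory dictionaries and lists, and every class name, method name, step name and ID value below is hypothetical rather than taken from the embodiment.

```python
from typing import Dict, List, Tuple


class IdProcessingComponent:
    """Maps a service request ID to a business process model ID."""

    def __init__(self, model_ids: Dict[str, str]) -> None:
        self._model_ids = model_ids

    def query_model_id(self, service_request_id: str) -> str:
        return self._model_ids[service_request_id]


class ModelingComponent:
    """Maps a business process model ID to its ordered service steps."""

    def __init__(self, models: Dict[str, List[str]]) -> None:
        self._models = models

    def query_model(self, model_id: str) -> List[str]:
        return self._models[model_id]


class LogComponent:
    """Stores logs as (service request ID, kind, payload) tuples."""

    def __init__(self, logs: List[Tuple[str, str, dict]]) -> None:
        self._logs = logs

    def query_logs(self, request_id: str) -> Tuple[List[dict], List[dict]]:
        first = [p for rid, kind, p in self._logs
                 if rid == request_id and kind == "first"]
        second = [p for rid, kind, p in self._logs
                  if rid == request_id and kind == "second"]
        return first, second


def collect_inputs(request_id, ids, modeling, log_store):
    """Fetch the model ID, the model, and the processing logs in turn."""
    model_id = ids.query_model_id(request_id)                    # step 1002
    model = modeling.query_model(model_id)                       # step 1003
    first_logs, second_logs = log_store.query_logs(request_id)   # step 1004
    return model, first_logs, second_logs


ids = IdProcessingComponent({"req-1": "model-backup"})
modeling = ModelingComponent({"model-backup": ["snapshot", "compare", "extract"]})
log_store = LogComponent([
    ("req-1", "first", {"step": "snapshot", "result_code": 0}),
    ("req-1", "second", {"step": "compare", "return_code": 1}),
    ("req-2", "first", {"step": "snapshot", "result_code": 0}),
])
model, first_logs, second_logs = collect_inputs("req-1", ids, modeling, log_store)
print(model, len(first_logs), len(second_logs))
```

  • Logs belonging to the other request ("req-2") are filtered out, so only the logs of the request being analyzed reach the analysis step.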
  • Step 1005: The analysis component determines, according to the execution result in the i-th first processing log, whether the internal service step is an abnormal service step.
  • Following the execution order in the business process model, the analysis component determines, according to the execution result in the i-th first processing log, whether the internal service step is an abnormal service step, where i is a positive integer less than or equal to n.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
  • Steps 1005 and 1006 are repeated in a loop until an abnormal service step is determined; until then, the execution results in the n first processing logs are checked in sequence.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
  • Step 1007: If it is an abnormal service step, the analysis component obtains, from the m second processing logs, the t second processing logs corresponding to the first processing module that executed the abnormal service step.
  • The analysis component determines that the first processing module that executed the internal service step is a faulty module. Because the first processing module may have called other service systems to complete the service step, the analysis component obtains the t second processing logs corresponding to the first processing module.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
  • Step 1008: The analysis component determines, according to the execution result in the j-th second processing log, whether the external service step is an abnormal service step.
  • Following the execution order in the business process model, the analysis component determines, according to the execution result in the j-th second processing log, whether the external service step is an abnormal service step, where j is a positive integer less than or equal to t.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
  • Steps 1008 and 1009 are repeated in a loop until an abnormal service step is determined; until then, the execution results in the t second processing logs are checked in sequence.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
  • Step 1010: If it is an abnormal service step, the analysis component locates the API corresponding to the second API identifier in the called second service system as the fault API.
  • The analysis component locates, as the fault API, the API corresponding to the second API identifier of the second processing module that executed the external service step.
  • For details of this step, refer to step 805 shown in FIG. 8; details are not described herein again.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
  • Step 1011: If there is no second processing log corresponding to the first processing module, the analysis component locates the API corresponding to the first API identifier of the first processing module as the fault API.
  • The analysis component determines that the first processing module that executed the internal service step is a faulty module. If none of the m second processing logs corresponds to the first processing module, the analysis component locates the API corresponding to the first API identifier of the first processing module as the fault API; or, if the analysis component determines that none of the t second processing logs corresponding to the first processing module indicates an abnormal external service step, the analysis component likewise locates the API corresponding to the first API identifier of the first processing module as the fault API.
  • This step can be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
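  • Taken together, steps 1005 to 1011 amount to a two-level scan: first over the internal steps, then over the external steps called by the failing module. The following is a hedged sketch of that flow, not the embodiment's implementation. The `step` and `caller_api_id` fields, the rule that code 0 means success, and all sample values are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

OK = 0  # assumption: code 0 means the step succeeded


@dataclass(frozen=True)
class FirstLog:
    step: str          # internal service step name (hypothetical field)
    system_id: str     # first service system ID
    api_id: str        # first API identifier
    result_code: int


@dataclass(frozen=True)
class SecondLog:
    step: str           # external service step name (hypothetical field)
    caller_api_id: str  # first API identifier of the calling module
    system_id: str      # called second service system ID
    api_id: str         # second API identifier
    return_code: int


def localize(order: Sequence[str],
             first_logs: List[FirstLog],
             second_logs: List[SecondLog]) -> Optional[Tuple[str, str]]:
    """Return (faulty system ID, fault API identifier), or None if all is well."""
    rank = {step: pos for pos, step in enumerate(order)}
    # Scan internal steps in execution order until one is abnormal.
    for first in sorted(first_logs, key=lambda log: rank[log.step]):
        if first.result_code == OK:
            continue
        # Step 1007: keep only the external calls made by the failing module.
        related = [s for s in second_logs if s.caller_api_id == first.api_id]
        # Scan those external steps in execution order.
        for second in sorted(related, key=lambda log: rank[log.step]):
            if second.return_code != OK:
                return second.system_id, second.api_id  # step 1010
        return first.system_id, first.api_id            # step 1011
    return None


order = ["prepare", "backup", "snapshot", "store"]
first_logs = [FirstLog("prepare", "A", "API 11", 0),
              FirstLog("backup", "A", "API 12", 1)]
second_logs = [SecondLog("snapshot", "API 12", "B", "API 21", 0),
               SecondLog("store", "API 12", "B", "API 22", 1)]
print(localize(order, first_logs, second_logs))  # ('B', 'API 22')
print(localize(order, first_logs, []))           # ('A', 'API 12')
```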
  • In summary, in the fault location method provided in this embodiment, when the first service system and the second service system execute the service steps corresponding to the service request ID, the first service system sends the corresponding processing logs to the log system, and the log system determines the abnormal service step according to the execution results in the received processing logs and finally locates the faulty service system.
  • Because the first service system generates a processing log for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs.
  • This solves the prior-art problem that each service system must be checked in turn from top to bottom to find the faulty service system, which makes locating the faulty service system inefficient when there are many service systems. Locating the faulty service system through the processing logs corresponding to the service request ID improves the locating efficiency.
  • FIG. 11 is a structural block diagram of a fault locating device according to an embodiment of the present invention.
  • The fault locating device can be implemented, by software, hardware, or a combination of both, as all or part of the log system 140 shown in FIG. 2 or FIG. 3.
  • The fault locating device can include a receiving unit 1120, a determining unit 1140, a positioning unit 1160, and an obtaining unit 1180:
  • The receiving unit 1120 has the same or similar functions as the receiving module 521, as well as other implicit functions included by the receiving module 521.
  • The determining unit 1140 has the same or similar functions as the determining module 522, as well as other implicit functions included by the determining module 522.
  • The positioning unit 1160 has the same or similar functions as the positioning module 523, as well as other implicit functions included by the positioning module 523.
  • The obtaining unit 1180 has the same or similar functions as the obtaining module 524, as well as other implicit functions included by the obtaining module 524.
  • A person skilled in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware, and the program may be stored in a computer-readable storage medium.
  • The storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

Abstract

Disclosed in the present invention are a fault localization platform, a fault localization method and a device, relating to the technical field of communications. The method comprises: when a service step corresponding to a service request ID is executed, a first service system sends a corresponding processing log to a log system, and the log system determines an abnormal service step according to the execution result in the received processing log and thereby localizes the faulty service system. Because the first service system generates a processing log for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that each service system must be checked in sequence from top to bottom to determine the faulty service system, which leads to low localization efficiency when there are relatively many service systems; localizing the faulty service system by means of the processing logs corresponding to the service request ID improves localization efficiency.

Description

Fault location platform, fault location method and device
Technical field
The present invention relates to the field of communications technologies, and in particular, to a fault location platform, a fault location method, and a device.
Background
In a cloud service environment, a platform that provides multiple services usually contains multiple systems, and the services are completed through interaction calls between those systems. The services may include a file service, an object service, and a host backup service.
In the prior art, when a platform executes a service, multiple systems in the platform call one another. When the service fails to execute, the systems involved in those calls must be checked one by one from top to bottom, starting from the uppermost system in the platform, until the faulty system is located. Referring to FIG. 1, take as an example a cloud platform 100 that includes a cloud management system 210, a data protection service system 220, a virtualization system 230, a production storage system 240, a cloud backup management system 250, and a backup storage system 260. The cloud platform 100 performs the host backup service as follows: the cloud management system 210 sends a backup request to the data protection service system 220; after receiving the backup request, the data protection service system 220 sends a scheduled backup request to the virtualization system 230; according to the received scheduled backup request, the virtualization system 230 sends an execute-backup request to the cloud backup management system 250 and queries the backup status at preset intervals; according to the execute-backup request, the cloud backup management system 250 sequentially performs volume snapshot 251, volume snapshot comparison 252, data extraction 253, data storage 254, and backup completion 255.
Volume snapshot comparison means comparing the data at the current moment with the data at the previous moment. The cloud backup management system 250 stores the result of the volume snapshot comparison and the extracted difference data in the production storage system 240, and stores the data of the current moment in the backup storage system 260.
In the process of implementing the embodiments of the present invention, the inventor found that the prior art has at least the following problem:
When a host backup fails, it is necessary to start from the uppermost cloud management system 210 and check in turn whether the data protection service system 220, the virtualization system 230, the production storage system 240, the cloud backup management system 250, and the backup storage system 260 are faulty before the faulty system is finally located. As a result, the efficiency of locating the faulty system is low.
Summary of the invention
To solve the problem in the prior art, embodiments of the present invention provide a fault location platform, a fault location method, and a device. The technical solutions are as follows:
In a first aspect, a fault location platform is provided. The platform includes an identifier distribution system, a log system, a first service system, and a second service system.
The identifier distribution system is configured to allocate a service request identifier (identification, ID) to a service request, where the service request is sent when the first service system executes a service, and the service is executed cooperatively by the first service system and the second service system, between which a calling relationship exists. The first service system is configured to generate a processing log for each service step corresponding to the service request ID, where the processing log records the execution result of the service step; the service steps include service steps executed by the first service system and service steps that the first service system calls the second service system to execute. The log system is configured to receive the processing logs corresponding to the service request ID, determine an abnormal service step according to the execution results in the processing logs, and locate the service system that executed the abnormal service step as the faulty service system.
In the solution shown in this embodiment of the present invention, when the first service system, or the second service system called by the first service system, executes a service step corresponding to the service request ID, the first service system sends the corresponding processing log to the log system, and the log system determines the abnormal service step according to the execution results in the received processing logs and finally locates the faulty service system. Because the first service system generates a processing log for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that each service system must be checked in turn from top to bottom to find the faulty service system, which makes locating the faulty service system inefficient when there are many service systems. Locating the faulty service system through the processing logs corresponding to the service request ID improves the locating efficiency.
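The role of the service request ID as a correlation key can be illustrated with a short sketch. It is illustrative only: the counter-based ID format, the class and method names, and the dictionary layout of a log are assumptions, since the text does not specify how IDs are generated or how logs are stored.

```python
import itertools
from typing import Dict, List


class IdentifierDistributionSystem:
    """Hands out unique service request IDs (a simple counter is assumed)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def allocate(self) -> str:
        return f"req-{next(self._counter)}"


class LogSystem:
    """Groups received processing logs by service request ID."""

    def __init__(self) -> None:
        self._by_request: Dict[str, List[dict]] = {}

    def receive(self, log: dict) -> None:
        self._by_request.setdefault(log["service_request_id"], []).append(log)

    def logs_for(self, service_request_id: str) -> List[dict]:
        return list(self._by_request.get(service_request_id, []))


ids = IdentifierDistributionSystem()
log_system = LogSystem()
backup_id, restore_id = ids.allocate(), ids.allocate()

# Logs of two concurrent requests arrive interleaved.
log_system.receive({"service_request_id": backup_id, "step": "volume snapshot", "result_code": 0})
log_system.receive({"service_request_id": restore_id, "step": "read backup", "result_code": 0})
log_system.receive({"service_request_id": backup_id, "step": "store data", "result_code": 1})

print([log["step"] for log in log_system.logs_for(backup_id)])
# ['volume snapshot', 'store data']
```

Because every processing log carries the ID of the request it belongs to, the log system can pull out exactly the logs of the failed request without inspecting each service system in turn.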
In a first possible implementation of the first aspect, the first service system is configured to: when executing an internal service step corresponding to the service request ID, generate a first processing log corresponding to the internal service step and send the first processing log to the log system, where the first processing log records the execution result of the internal service step executed by the first service system. The first service system is further configured to: when calling the second service system to execute an external service step corresponding to the service request ID, generate a second processing log corresponding to the external service step and send the second processing log to the log system, where the second processing log records the execution result of the external service step executed by the called second service system. The log system is configured to: determine, according to the execution result in the first processing log, whether the internal service step is the abnormal service step, and when it is, locate the first service system as the faulty service system; and determine, according to the execution result in the second processing log, whether the external service step is the abnormal service step, and when it is, locate the called second service system as the faulty service system.
In the solution shown in this embodiment of the present invention, the first service system records the execution result of an internal service step in a first processing log and the execution result of an external service step in a second processing log. The log system can determine from the execution result in the first processing log whether the first service system is the faulty service system, and from the execution result in the second processing log whether the second service system is the faulty service system. Recording internal and external service steps separately helps improve the efficiency of locating the faulty service system.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the first service system includes a first processing module having a first application programming interface (Application Programming Interface, API), and the first API has a corresponding first API identifier; the second service system includes a second processing module having a second API, and the second API has a corresponding second API identifier. The first service system is configured to send the first processing log to the log system, where the first processing log includes the service request ID, a first service system ID, the first API identifier, and a result code, and the result code is the execution result of the internal service step executed by the first processing module. The first service system is further configured to send the second processing log to the log system, where the second processing log includes the service request ID, the first service system ID, the first API identifier, a second service system ID, the second API identifier, and a return code, and the return code is the execution result of the external service step executed by the called second processing module. The log system is configured to: when the faulty service system is the first service system, locate the API corresponding to the first API identifier as the fault API; and when the faulty service system is the called second service system, locate the API corresponding to the second API identifier as the fault API.
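The two log layouts listed above can be written down as plain records. The following is a minimal sketch: the field list follows the text, but the key names, the dictionary representation, the helper `is_second_log`, and the sample values are assumptions made for illustration.

```python
from typing import Dict

Log = Dict[str, object]


def first_processing_log(request_id: str, first_system_id: str,
                         first_api_id: str, result_code: int) -> Log:
    """Record of an internal service step run by a first processing module."""
    return {
        "service_request_id": request_id,
        "first_system_id": first_system_id,
        "first_api_id": first_api_id,
        "result_code": result_code,
    }


def second_processing_log(request_id: str, first_system_id: str,
                          first_api_id: str, second_system_id: str,
                          second_api_id: str, return_code: int) -> Log:
    """Record of an external service step run by a called second processing module."""
    return {
        "service_request_id": request_id,
        "first_system_id": first_system_id,
        "first_api_id": first_api_id,
        "second_system_id": second_system_id,
        "second_api_id": second_api_id,
        "return_code": return_code,
    }


def is_second_log(log: Log) -> bool:
    """A log that names a called system describes an external service step."""
    return "second_system_id" in log


internal = first_processing_log("req-1", "A1", "API 11", 0)
external = second_processing_log("req-1", "A1", "API 11", "B1", "API 21", 0)
print(is_second_log(internal), is_second_log(external))  # False True
```

The second processing log repeats the caller's system ID and API identifier, which is what lets the log system tie an external step back to the first processing module that invoked it.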
In the solution shown in this embodiment of the present invention, when the faulty service system is the first service system, the log system determines, according to the first API identifier carried in the first processing log, that the API corresponding to the first API identifier is the fault API; when the faulty service system is the second service system, the log system determines, according to the second API identifier carried in the second processing log, that the API corresponding to the second API identifier is the fault API. Because the first processing log carries the first API identifier and the second processing log carries the second API identifier, the log system can locate the fault API by its API identifier, which improves the accuracy of locating the faulty service system.
With reference to the second possible implementation of the first aspect, in a third possible implementation, the log system is configured to: obtain a business process model corresponding to the service request ID, where the business process model includes the execution order of the service steps corresponding to the service request ID; and sequentially obtain, according to the execution order, n first processing logs and m second processing logs corresponding to the service steps, where n and m are positive integers.
In the solution shown in this embodiment of the present invention, the log system obtains the first processing logs and second processing logs corresponding to the service steps according to the execution order in the business process model. This allows abnormal service steps to be determined in the order in which the service steps were executed, which helps avoid wasted resources and improves the efficiency of locating the faulty service system.
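Ordering logs by the business process model can be sketched as a sort keyed on each step's position in the model. This is an assumption-laden illustration: the text does not say how a log refers to its service step, so the `step` key is hypothetical, and the step names are borrowed from the host backup example in the background section.

```python
from typing import Dict, List, Sequence


def order_logs(execution_order: Sequence[str],
               logs: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Return the logs sorted by the position of their step in the model."""
    rank = {step: pos for pos, step in enumerate(execution_order)}
    return sorted(logs, key=lambda log: rank[log["step"]])


# Execution order taken from the host backup flow described earlier.
model = ["volume snapshot", "volume snapshot comparison",
         "extract data", "store data", "backup complete"]

# Logs may arrive in any order; the model restores the execution order.
received = [
    {"step": "store data", "result_code": 1},
    {"step": "volume snapshot", "result_code": 0},
    {"step": "extract data", "result_code": 0},
]
print([log["step"] for log in order_logs(model, received)])
# ['volume snapshot', 'extract data', 'store data']
```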
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, the log system is further configured to: determine, according to the execution result in the i-th first processing log, whether the internal service step is the abnormal service step, where i is a positive integer less than or equal to n; if it is the abnormal service step, locate the API corresponding to the first API identifier included in the i-th first processing log as the fault API; and if it is not the abnormal service step, set i = i + 1 and again determine, according to the execution result in the i-th first processing log, whether the internal service step is the abnormal service step.
In the solution shown in this embodiment of the present invention, the log system determines, following the execution order in the business process model, whether each internal service step is an abnormal service step according to the execution result in the first processing log. Determining abnormal service steps in the order in which the service steps were executed helps avoid wasted resources and improves the efficiency of locating the faulty service system.
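The i = i + 1 iteration described above is a linear scan that stops at the first abnormal step. A minimal sketch follows, assuming that a nonzero result code marks an abnormal step (the text does not define the code values); the function name and sample codes are hypothetical. The same scan applies to second processing logs with index j.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_abnormal_index(logs: Sequence[T],
                         is_abnormal: Callable[[T], bool]) -> Optional[int]:
    """Return the 1-based index i of the first abnormal log, or None."""
    i = 1
    while i <= len(logs):
        if is_abnormal(logs[i - 1]):
            return i  # the i-th log identifies the fault API
        i = i + 1     # not abnormal: move on to the next log
    return None


# Result codes of n = 4 first processing logs, in execution order.
result_codes = [0, 0, 7, 0]
print(first_abnormal_index(result_codes, lambda code: code != 0))  # 3
```

Stopping at the first abnormal log means later logs are never examined, which is the resource saving the text refers to.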
With reference to the third possible implementation of the first aspect, in a fifth possible implementation, the log system is further configured to: determine, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step, where j is a positive integer less than or equal to m; if it is the abnormal service step, locate the API corresponding to the second API identifier included in the j-th second processing log as the fault API; and if it is not the abnormal service step, set j = j + 1 and again determine, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step.
In the solution shown in this embodiment of the present invention, the log system determines, following the execution order in the business process model, whether each external service step is an abnormal service step according to the execution result in the second processing log. Determining abnormal service steps in the order in which the service steps were executed helps avoid wasted resources and improves the efficiency of locating the faulty service system.
In a second aspect, a fault location method is provided. The method includes: receiving processing logs corresponding to a service request identifier (ID), where the service request is sent when a first service system executes a service, the service is executed cooperatively by the first service system and a second service system between which a calling relationship exists, the processing logs record the execution results of the service steps corresponding to the service request ID, and the service steps include service steps executed by the first service system and service steps that the first service system calls the second service system to execute; determining an abnormal service step according to the execution results in the processing logs; and locating the service system that executed the abnormal service step as the faulty service system.
In the solution shown in this embodiment of the present invention, the log system determines the abnormal service step according to the execution results in the received processing logs corresponding to the service request ID, and finally locates the faulty service system. Because the service system generates a processing log for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that each service system must be checked in turn from top to bottom to find the faulty service system, which makes locating the faulty service system inefficient when there are many service systems. Locating the faulty service system through the processing logs corresponding to the service request ID improves the locating efficiency.
In a first possible implementation of the second aspect, the processing log includes a first processing log and a second processing log. Determining the abnormal service step according to the execution result in the processing log includes: determining, according to the execution result in the first processing log, whether an internal service step is the abnormal service step, where the first processing log records the execution result of the internal service step that the first service system executes for the service request ID; and determining, according to the execution result in the second processing log, whether an external service step is the abnormal service step, where the second processing log records the execution result of the external service step that the second service system is called to execute for the service request ID.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, locating the service system that executes the abnormal service step as the faulty service system includes: when the internal service step is the abnormal service step, locating the first service system as the faulty service system; and when the external service step is the abnormal service step, locating the called second service system as the faulty service system.
In the solution shown in this embodiment of the present invention, the service system records the execution result of an internal service step as a first processing log and the execution result of an external service step as a second processing log. The log system can determine from the execution result in the first processing log whether the first service system is the faulty service system, and from the execution result in the second processing log whether the second service system is the faulty service system. Recording internal and external service steps separately helps improve the efficiency of locating the faulty service system.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the first service system includes a first processing module having a first application programming interface (API), and the first API has a corresponding first API identifier; the second service system includes a second processing module having a second API, and the second API has a corresponding second API identifier. The method further includes: when the faulty service system is the first service system, locating the API corresponding to the first API identifier as the faulty API according to the first API identifier contained in the first processing log, where the first processing log includes the service request ID, a first service system ID, the first API identifier, and a result code, and the result code is the execution result of the first processing module executing the internal service step; and when the faulty service system is the called second service system, locating the API corresponding to the second API identifier as the faulty API according to the second API identifier contained in the second processing log, where the second processing log includes the service request ID, the first service system ID, the first API identifier, a second service system ID, the second API identifier, and a return code, and the return code is the execution result of the second processing module being called to execute the external service step.
In the solution shown in this embodiment of the present invention, when the faulty service system is the first service system, the log system determines, according to the first API identifier carried in the first processing log, that the API corresponding to the first API identifier is the faulty API; when the faulty service system is the second service system, the log system determines, according to the second API identifier carried in the second processing log, that the API corresponding to the second API identifier is the faulty API. Because the first processing log carries the first API identifier and the second processing log carries the second API identifier, the log system can locate the faulty API from the API identifier, which improves the accuracy of fault localization for the faulty service system.
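The two log layouts and the API-level localization described above can be sketched as follows. This is only an illustrative sketch: the class names, field names, and the convention that the code "0" means success are assumptions, since the text lists the log fields but does not define concrete code values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

SUCCESS = "0"  # assumed success code; the source does not define code values


@dataclass
class FirstProcessingLog:
    """Result of an internal step executed by the first service system."""
    request_id: str
    first_system_id: str
    first_api_id: str
    result_code: Optional[str]


@dataclass
class SecondProcessingLog:
    """Result of an external step executed by a called second service system."""
    request_id: str
    first_system_id: str
    first_api_id: str
    second_system_id: str
    second_api_id: str
    return_code: Optional[str]


def locate(
    log: Union[FirstProcessingLog, SecondProcessingLog]
) -> Optional[Tuple[str, str]]:
    """Return (faulty system ID, faulty API ID), or None if the step succeeded."""
    if isinstance(log, FirstProcessingLog):
        # Internal step: a failure points at the first service system and its API.
        if log.result_code != SUCCESS:
            return (log.first_system_id, log.first_api_id)
        return None
    # External step: a failure points at the called system and its API.
    if log.return_code != SUCCESS:
        return (log.second_system_id, log.second_api_id)
    return None
```

Keeping the caller's system ID and API identifier in the second processing log, as the text specifies, means a single record is enough to name both ends of the failed call.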
With reference to the third possible implementation of the second aspect, in a fourth possible implementation, the method further includes: obtaining a business process model corresponding to the service request ID, where the business process model includes the execution order of the service steps corresponding to the service request ID; and obtaining, in that execution order, n first processing logs and m second processing logs corresponding to the service steps, where n and m are each positive integers.
In the solution shown in this embodiment of the present invention, the log system obtains the first processing logs and second processing logs corresponding to the service steps in the execution order given by the business process model. This allows abnormal service steps to be determined in the order in which the service steps were executed, which helps avoid wasted resources and improves the efficiency of locating the faulty service system.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, determining, according to the execution result in the first processing log, whether the internal service step is the abnormal service step includes: determining, according to the execution result in the i-th first processing log, whether the internal service step is the abnormal service step, where i is a positive integer less than or equal to n. Locating the API corresponding to the first API identifier as the faulty API includes: if the internal service step is the abnormal service step, locating the API corresponding to the first API identifier included in the i-th first processing log as the faulty API; if it is not the abnormal service step, setting i = i + 1 and performing again the step of determining, according to the execution result in the i-th first processing log, whether the internal service step is the abnormal service step.
In the solution shown in this embodiment of the present invention, the log system follows the execution order in the business process model and determines, from the execution result in each first processing log in turn, whether the internal service step is an abnormal service step. This allows abnormal service steps to be determined in the order in which the service steps were executed, which helps avoid wasted resources and improves the efficiency of locating the faulty service system.
With reference to the fourth possible implementation of the second aspect, in a sixth possible implementation, determining, according to the execution result in the second processing log, whether the external service step is the abnormal service step includes: determining, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step, where j is a positive integer less than or equal to m. Locating the API corresponding to the second API identifier as the faulty API includes: if the external service step is the abnormal service step, locating the API corresponding to the second API identifier included in the j-th second processing log as the faulty API; if it is not the abnormal service step, setting j = j + 1 and performing again the step of determining, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step.
In the solution shown in this embodiment of the present invention, the log system follows the execution order in the business process model and determines, from the execution result in each second processing log in turn, whether the external service step is an abnormal service step. This allows abnormal service steps to be determined in the order in which the service steps were executed, which helps avoid wasted resources and improves the efficiency of locating the faulty service system.
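The ordered scan over the n first processing logs (the same loop applies to the m second processing logs with index j) might look like the sketch below. The dictionary keys, the step names, and the success code "0" are assumptions made for illustration; skipping a step that has no log yet is also an assumption, since the text does not say how a missing log is handled.

```python
from typing import Dict, List, Optional

SUCCESS = "0"  # assumed success code; not defined in the source


def order_logs(step_order: List[str], logs_by_step: Dict[str, dict]) -> List[dict]:
    """Arrange logs in the execution order given by the business process model.

    Steps with no log are skipped (an assumption, see the lead-in).
    """
    return [logs_by_step[step] for step in step_order if step in logs_by_step]


def find_faulty_api(first_logs: List[dict]) -> Optional[str]:
    """Scan the first processing logs in order and stop at the first failure.

    Returns the first API identifier of the abnormal step, or None if every
    step succeeded.
    """
    i = 0
    n = len(first_logs)
    while i < n:
        log = first_logs[i]
        if log.get("result_code") != SUCCESS:
            return log["first_api_id"]  # abnormal step found: locate its API
        i += 1  # not abnormal: set i = i + 1 and check the next log
    return None
```

Because the scan stops at the first abnormal step, logs for later steps are never examined, which is where the claimed saving in resources comes from.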
According to a third aspect, a fault localization apparatus is provided. The apparatus includes at least one unit, and the at least one unit is configured to implement the fault localization method provided by the second aspect or any possible implementation of the second aspect.
The technical effects obtained by the third aspect of the embodiments of the present invention are similar to those obtained by the corresponding technical means in the second aspect, and are not described again here.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores an executable program for implementing the fault localization method provided by the second aspect or any possible design of the second aspect.
According to a fifth aspect, a log system is provided. The log system includes a processor and a memory; the memory is configured to store one or more instructions, the instructions are configured to be executed by the processor, and the processor is configured to implement the fault localization method provided by the second aspect or any possible design of the second aspect.
In summary, the technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
When a service system executes the service steps corresponding to a service request ID, it sends the corresponding processing logs to the log system; the log system determines the abnormal service step according to the execution results in the received processing logs, and thereby locates the faulty service system. This solves the problem in the prior art that the service systems must be checked one by one from top to bottom to find the faulty service system, which makes localization inefficient when there are many service systems, and it improves the efficiency of locating the faulty service system.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below.
FIG. 1 is a flowchart of a host backup service method provided in the prior art;
FIG. 2 is a schematic structural diagram of a fault localization platform provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a fault localization platform provided by another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of fault localization for a host backup service provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a log system provided by an embodiment of the present invention;
FIG. 6 is a flowchart of a fault localization method provided by an embodiment of the present invention;
FIG. 7 is a flowchart of a fault localization method provided by another embodiment of the present invention;
FIG. 8 is a flowchart of a fault localization method provided by still another embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a fault localization system provided by an embodiment of the present invention;
FIG. 10 is a flowchart of a fault localization method provided by an embodiment of the present invention;
FIG. 11 is a structural block diagram of a fault localization apparatus provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
Refer to FIG. 2, which shows a schematic structural diagram of a fault localization platform provided by an embodiment of the present invention. As shown in FIG. 2, the platform may include an identifier allocation system 120, a log system 140, a first service system 161, and a second service system 162.
The identifier allocation system 120 is capable of allocating a service request ID to a service request. The service request is sent when the first service system 161 executes a service, and the service is executed cooperatively by the first service system 161 and the second service system 162, which have a calling relationship.
Optionally, this embodiment uses the first service system 161 and the second service system 162 only as an example of the service systems that execute the service, and does not limit those service systems. For example, the service systems that execute the service may further include a third service system (not shown in the figure), in which case the service is executed cooperatively by the first service system 161 and the second service system 162, which have a calling relationship, and by the first service system 161 and the third service system, which also have a calling relationship.
Optionally, the identifier allocation system 120 is also capable of allocating a service ID to a service. One service ID corresponds to one service request ID, or one service ID corresponds to several service request IDs.
Optionally, when the same service is executed and service requests are triggered at different points in time, the identifier allocation system 120 generates different service request IDs for the service requests triggered at those points in time. In other words, each time a service step in the service is executed, a service request is generated and the identifier allocation system 120 allocates a service request ID to it.
Optionally, the identifier allocation system 120 records the service IDs, the service request IDs, and the correspondence between service IDs and service request IDs.
Optionally, the identifier allocation system 120 synchronizes the recorded service IDs, service request IDs, and the correspondence between them to the log system 140.
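A minimal sketch of the bookkeeping described for the identifier allocation system is shown below. The class and method names are hypothetical, and generating IDs with UUIDs is an assumption; the text does not say how the IDs are produced or how the synchronization to the log system is carried out.

```python
import uuid
from collections import defaultdict
from typing import Dict, List


class IdAllocator:
    """Hypothetical stand-in for identifier allocation system 120."""

    def __init__(self) -> None:
        # service ID -> service request IDs allocated for that service
        self._by_service: Dict[str, List[str]] = defaultdict(list)

    def allocate_request_id(self, service_id: str) -> str:
        """Allocate a fresh request ID each time a request is triggered."""
        request_id = uuid.uuid4().hex  # assumed ID scheme
        self._by_service[service_id].append(request_id)
        return request_id

    def snapshot(self) -> Dict[str, List[str]]:
        """Copy of the recorded correspondence, e.g. to sync to the log system."""
        return {sid: list(rids) for sid, rids in self._by_service.items()}
```

Two requests triggered at different times for the same service receive different request IDs, while `snapshot()` preserves the one-to-many correspondence that the log system needs.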
The first service system 161 and the second service system 162 are capable of executing services, and the first service system 161 is also capable of calling the second service system 162 to execute service steps. The service steps corresponding to a service request ID include service steps executed by the first service system 161 and service steps that the first service system 161 calls the second service system 162 to execute. When the service steps corresponding to the service request ID are executed, the first service system 161 generates a processing log for each of those service steps. Optionally, a service request ID corresponds to one service step or to several service steps; among the service steps corresponding to the service request ID, at least one is completed by the first service system 161 calling the second service system 162; and each service step has one corresponding processing log.
Optionally, the processing log records the execution result of the service step; optionally, the execution result is either execution success or execution failure.
Optionally, the first service system 161 sends the generated processing logs corresponding to the service request ID to the log system 140. The first service system 161 either sends these processing logs to the log system 140 asynchronously, or collects the generated processing logs and reports them to the log system 140 together.
The log system 140 is capable of analyzing processing logs. The log system 140 receives the processing logs corresponding to the service request ID sent by the first service system 161, determines the abnormal service step according to the execution results in the processing logs, and locates the service system that executed the abnormal service step as the faulty service system.
Optionally, an abnormal service step is one whose execution failed; when the log system 140 detects that the execution result in a processing log is an execution failure, it determines that the service system that executed that service step is the faulty service system.
In summary, in the fault localization platform provided by this embodiment, when the first service system, and the second service system called by the first service system, execute the service steps corresponding to a service request ID, the first service system sends the corresponding processing logs to the log system, and the log system determines the abnormal service step according to the execution results in the received processing logs and thereby locates the faulty service system. Because the first service system generates a processing log for each service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the problem in the prior art that the service systems must be checked one by one from top to bottom to find the faulty service system, which makes localization inefficient when there are many service systems. Locating the faulty service system through the processing logs corresponding to the service request ID improves the efficiency of locating the faulty service system.
Based on the fault localization platform shown in FIG. 2, optionally, the processing logs reported by the first service system 161 include first processing logs and second processing logs, as shown in FIG. 3.
The first service system 161 can independently execute an internal service step corresponding to the service request ID, and when it does so it generates a first processing log corresponding to that internal service step. The first processing log records the execution result of the first service system 161 executing the internal service step.
Optionally, the first service system 161 includes a first processing module having a first API, and the first API has a corresponding first API identifier. The first service system 161 executes the internal service step corresponding to the service request ID through the first processing module. The first processing log includes the service request ID, the first service system ID, the first API identifier, and a result code, where the result code is the execution result of the first processing module executing the internal service step.
Optionally, when a service step fails, the first processing log carries an error result code; or the first processing log carries no result code; or the first processing log carries no result code but carries information such as a network connection exception or no response.
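The three failure cases listed above can be told apart with a small check such as the following sketch. The key names `result_code` and `exception` and the success code "0" are assumptions; the text names the cases but not a concrete encoding.

```python
from typing import Optional

SUCCESS = "0"  # assumed success code; not defined in the source


def failure_reason(log: dict) -> Optional[str]:
    """Return a short failure description, or None if the step succeeded."""
    code = log.get("result_code")
    if code == SUCCESS:
        return None
    if code is not None:
        # Case 1: the log carries an error result code.
        return f"error result code {code}"
    detail = log.get("exception")
    if detail:
        # Case 3: no result code, but exception details are present.
        return f"no result code; {detail}"
    # Case 2: no result code at all.
    return "no result code"
```

Treating a missing result code as a failure, rather than as an unknown state, follows the wording of the paragraph above.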
Optionally, the first service system 161 sends the generated first processing log to the log system 140.
The second service system 162 is the service system that the first service system 161 calls when executing an external service step corresponding to the service request ID.
Optionally, when the first service system 161 calls the second service system 162 to execute the external service step corresponding to the service request ID, it generates a second processing log corresponding to that external service step. The second processing log records the execution result of the second service system 162 being called to execute the external service step.
Optionally, the second service system 162 includes a second processing module having a second API, and the second API has a corresponding second API identifier. The first service system 161 calls the second processing module through the first processing module to execute the external service step corresponding to the service request ID. The second processing log includes the service request ID, the first service system ID, the first API identifier, the second service system ID, the second API identifier, and a return code, where the return code is the execution result of the second processing module executing the external service step.
Optionally, the first service system 161 sends the generated second processing log to the log system 140.
It should be noted that this embodiment uses, only as an example, a service request sent when the first service system 161 executes the service, and is not limited to that case. For example, if the service request is sent when the second service system 162 executes the service, then during execution of the service the second service system 162 may likewise independently execute an internal service step corresponding to the service request ID, generate the corresponding first processing log, and send it to the log system 140; or the second service system 162 may call another service system to execute an external service step corresponding to the service request ID, generate the corresponding second processing log, and send it to the log system 140.
The log system 140 determines, according to the execution result in the first processing log, whether the internal service step is an abnormal service step, and when it is, locates the first service system 161 as the faulty service system. The log system 140 also determines, according to the execution result in the second processing log, whether the external service step is an abnormal service step, and when it is, locates the called second service system 162 as the faulty service system.
Optionally, after locating the first service system 161 as the faulty service system, the log system 140 determines, according to the first API identifier carried in the first processing log, that the API corresponding to the first API identifier is the faulty API; after locating the second service system 162 as the faulty service system, the log system 140 determines, according to the second API identifier carried in the second processing log, that the API corresponding to the second API identifier is the faulty API.
Optionally, the log system 140 also obtains a business process model corresponding to the service request ID. The business process model includes the execution order of the service steps corresponding to the service request ID.
The log system 140 obtains, in the execution order given by the business process model, n first processing logs and m second processing logs corresponding to the service steps, where n and m are each positive integers.
Optionally, the log system 140 determines, according to the execution result in the i-th first processing log, whether the internal service step is an abnormal service step, where i is a positive integer less than or equal to n. When the internal service step is an abnormal service step, the log system 140 determines the API corresponding to the first API identifier carried in the i-th first processing log as the faulty API. If the internal service step is not an abnormal service step, the log system 140 sets i = i + 1 and continues to determine, according to the execution result in the i-th first processing log, whether the internal service step is an abnormal service step, until an abnormal service step is determined.
Optionally, the log system 140 determines, according to the execution result in the j-th second processing log, whether the external service step is an abnormal service step, where j is a positive integer less than or equal to m. When the external service step is an abnormal service step, the log system 140 determines the API corresponding to the second API identifier carried in the j-th second processing log as the faulty API. If the external service step is not an abnormal service step, the log system 140 sets j = j + 1 and continues to determine, according to the execution result in the j-th second processing log, whether the external service step is an abnormal service step, until an abnormal service step is determined.
Optionally, the log system 140 includes an analysis component 141, a modeling component 142, an ID processing component 143, and a log component 144:
the ID processing component 143 is configured to store service request IDs;
the log component 144 is configured to store the processing logs corresponding to the service request IDs;
the modeling component 142 is configured to store the business process models corresponding to the service request IDs;
the analysis component 141 is configured to determine the abnormal service step according to the business process model and the execution results in the processing logs, and to locate the service system that executes the abnormal service step as the faulty service system.
In an illustrative example, as shown in FIG. 4, take the host backup service shown in FIG. 1. The service systems that execute the service include the cloud management system 11, the data protection service system 12, the virtualization system 13, the cloud backup management system 14, the production storage system 15, and the backup storage system 16. When the host backup service is performed, the cloud management system 11 invokes the data protection service system 12 to execute the backup request; the identifier allocation system allocates a service request ID to the backup request, feeds the service request ID back to the cloud management system 11, and also synchronizes it to the ID processing component 143 in the log system 140. When invoking the data protection service system 12 to execute the backup request, the cloud management system 11 generates a corresponding second processing log and sends it to the log component 144 in the log system 140. When the data protection service system 12 invokes the virtualization system 13 to execute the scheduled backup request, the data protection service system 12 generates a corresponding second processing log and sends it to the log component 144. When the virtualization system 13 invokes the cloud backup management system 14 to perform, in sequence, the five steps of volume snapshot, volume snapshot comparison, data extraction, data storage, and backup completion, the virtualization system 13 generates a corresponding second processing log and sends it to the log component 144. When the cloud backup management system 14 independently performs these five steps, it generates a first processing log for each step and sends the five first processing logs to the log component 144. When the cloud backup management system 14 invokes the production storage system 15 to store the result of the volume snapshot comparison and the differential data obtained by extraction, the cloud backup management system 14 generates a corresponding second processing log and sends it to the log component 144. When the cloud backup management system 14 invokes the backup storage system 16 to store the data of the current moment, the cloud backup management system 14 generates a corresponding second processing log and sends it to the log component 144. The modeling component 142 in the log system 140 stores in advance the business process model corresponding to the host backup request ID. When the host backup service fails, the analysis component 141 in the log system 140 determines the abnormal service step according to the business process model and the execution results of the processing logs stored in the log component 144, and locates the service system that executes the abnormal service step as the faulty service system. For example, if the step is determined to be an abnormal service step according to the execution result in the second processing log reported by the virtualization system 13, the analysis component 141 determines that the cloud backup management system 14 is the faulty service system.
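The closing example of the host backup scenario can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patent's implementation: the `SecondLog` record, its field names, the request ID `"backup-1"`, and the `locate_faulty_system` helper are all hypothetical. It assumes, as in the example, that a failed second processing log reported by the calling system points to the called system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SecondLog:
    """Hypothetical second processing log: written by the caller about a call it made."""
    request_id: str
    caller: str    # system that reported the log
    callee: str    # system that was invoked
    success: bool  # execution result of the invoked step


def locate_faulty_system(logs: List[SecondLog]) -> Optional[str]:
    """Return the callee of the first failed call, or None if every call succeeded."""
    for log in logs:
        if not log.success:
            return log.callee
    return None


# Call chain of the host backup example, in call order ("backup-1" is a made-up ID).
backup_logs = [
    SecondLog("backup-1", "cloud management system 11", "data protection service system 12", True),
    SecondLog("backup-1", "data protection service system 12", "virtualization system 13", True),
    SecondLog("backup-1", "virtualization system 13", "cloud backup management system 14", False),
]

print(locate_faulty_system(backup_logs))
```

Here the failed log is the one reported by the virtualization system 13, so the sketch returns the cloud backup management system 14, matching the example.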
Refer to FIG. 5, which shows a schematic structural diagram of the log system 140 according to an embodiment of the present invention. The log system 140 may include a processor 511, a communication bus 512, a memory 513, and a communication interface 514.
The processor 511 may include one or more central processing units (CPUs). The processor 511 performs various functional applications and service data processing by running software programs and modules.
The communication interface 514 may include a wired network interface, such as an Ethernet interface, and may also include a wireless network interface. The communication interface 514 is configured to receive the processing logs sent by the service systems and the service request ID sent by the identifier allocation system.
The memory 513 and the communication interface 514 are each connected to the processor 511 through the communication bus 512.
The memory 513 may be used to store software programs and modules, which are executed by the processor 511. In addition, the memory 513 may store various types of service data and user data.
In this embodiment of the present invention, the memory 513 may store an operating system 51 and program instructions 52 required by at least one function. The program instructions 52 may include a receiving module 521, a determining module 522, a positioning module 523, an obtaining module 524, and the like.
The receiving module 521 is configured to receive the processing logs corresponding to a service request identifier (ID).
The determining module 522 is configured to determine the abnormal service step according to the execution results in the processing logs.
The positioning module 523 is configured to locate the service system that executes the abnormal service step as the faulty service system.
The obtaining module 524 is configured to obtain the business process model corresponding to the service request ID.
The memory 513 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), dynamic random access memory (DRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
A person skilled in the art will understand that the structure of the log system 140 shown in FIG. 5 does not limit the log system 140. The log system 140 in the present invention may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
Refer to FIG. 6, which shows a flowchart of a fault localization method according to an embodiment of the present invention. This embodiment is described using an example in which the fault localization method is applied to the log system 140 shown in FIG. 2. The fault localization method includes the following steps:
Step 601: The log system receives the processing logs corresponding to a service request ID.
The service request is sent when the first service system executes a service. The service is executed cooperatively by a first service system and a second service system that have a calling relationship. The processing logs record the execution results of the service steps corresponding to the service request ID. The service steps include the service steps executed by the first service system and the service steps that the first service system invokes the second service system to execute.
Optionally, this embodiment of the present invention uses only the case in which the service systems executing the service include the first service system and the second service system as an example, and does not specifically limit the service systems that execute the service. For example, the service systems executing the service may further include a third service system, in which case the service is executed cooperatively by the first service system and the second service system, which have a calling relationship, and by the first service system 161 and the third service system, which have a calling relationship. Correspondingly, the service steps corresponding to the service request ID are described using only the service steps executed by the first service system and the service steps that the first service system invokes the second service system to execute as an example, which this embodiment does not specifically limit. For example, the service steps may further include service steps that the first service system invokes the third service system to execute.
The first service system generates a processing log for each service step corresponding to the service request ID. When the first service system executes a service step corresponding to the service request ID, and when the first service system invokes the second service system to execute a service step corresponding to the service request ID, the first service system generates a processing log corresponding to that service step. Optionally, each service step corresponds to one processing log.
For example, when executing the service steps corresponding to service request B, service system A needs to complete service step 1, service step 2, and service step 3, and service step 1 requires invoking service system C. When executing service step 1, service system A generates processing log 1 corresponding to service step 1; when executing service step 2, service system A generates processing log 2 corresponding to service step 2; when executing service step 3, service system A generates processing log 3 corresponding to service step 3.
The first service system sends the generated processing logs to the log system. Optionally, the first service system sends each processing log to the log system separately by asynchronous sending; alternatively, the first service system sends the generated processing logs to the log system together. For example, the first service system sends processing log 1 to the log system when processing log 1 is generated, sends processing log 2 when processing log 2 is generated, and sends processing log 3 when processing log 3 is generated; or, after generating processing log 1, processing log 2, and processing log 3, the first service system sends them to the log system together.
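The two sending modes can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the `deliver` transport, the `received` list standing in for the log system, and both function names are hypothetical, and the text does not say how asynchronous sending is realized. A background worker thread fed by a queue is used here as one common way to do it.

```python
import queue
import threading

received = []  # stand-in for the log system: one entry per delivery


def deliver(batch):
    """Hypothetical transport: it only records what the log system would receive."""
    received.append(list(batch))


def send_each_async(logs):
    """Mode 1: hand each log to a background worker as soon as it is generated."""
    pending = queue.Queue()

    def worker():
        while True:
            log = pending.get()
            if log is None:  # sentinel: nothing more to send
                break
            deliver([log])

    thread = threading.Thread(target=worker)
    thread.start()
    for log in logs:  # the service system keeps working while the worker sends
        pending.put(log)
    pending.put(None)
    thread.join()  # wait here only so the demo output is complete


def send_batch(logs):
    """Mode 2: send all generated logs together in one delivery."""
    deliver(logs)


generated = ["processing log 1", "processing log 2", "processing log 3"]
send_each_async(generated)
send_batch(generated)
print(received)
```

The first mode produces three deliveries of one log each, the second a single delivery of three logs. Per-log sending lets the log system see a failure sooner, while batching reduces the number of transmissions.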
Correspondingly, the log system receives the processing logs that are sent by the first service system and correspond to the service request ID.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the receiving module 521.
Step 602: The log system determines the abnormal service step according to the execution results in the processing logs.
After receiving the processing logs, the log system determines the abnormal service step according to the execution results in the processing logs.
Optionally, the execution result in a processing log is either execution success or execution failure, and an abnormal service step is a service step whose execution result is execution failure. When the execution result in a processing log is execution failure, the log system determines that the service step corresponding to that processing log is an abnormal service step.
For example, if, among processing log 1, processing log 2, and processing log 3, the execution result in processing log 2 is execution failure, the log system determines from the execution result in processing log 2 that service step 2 is the abnormal service step.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determining module 522.
Step 603: The log system locates the service system that executes the abnormal service step as the faulty service system.
After determining the abnormal service step, the log system locates the service system that executes the abnormal service step as the faulty service system. For example, if the log system determines that service step 2 is the abnormal service step, the log system determines that service system A, which executes service step 2, is the faulty service system.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
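Steps 601 to 603 can be sketched end to end as follows. This is a minimal sketch, assuming a hypothetical `ProcessingLog` record and `LogSystem` class whose names and fields are not taken from the source; it replays the running example in which service system A executes three steps for service request B and processing log 2 reports a failure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProcessingLog:
    """Hypothetical processing log; the field names are assumptions."""
    request_id: str
    step: int
    system: str
    result: str  # "success" or "failure"


class LogSystem:
    def __init__(self) -> None:
        self._logs: Dict[str, List[ProcessingLog]] = {}

    def receive(self, log: ProcessingLog) -> None:
        """Step 601: store a processing log under its service request ID."""
        self._logs.setdefault(log.request_id, []).append(log)

    def abnormal_steps(self, request_id: str) -> List[ProcessingLog]:
        """Step 602: abnormal steps are those whose execution result is a failure."""
        return [log for log in self._logs.get(request_id, []) if log.result == "failure"]

    def faulty_systems(self, request_id: str) -> List[str]:
        """Step 603: the systems that executed the abnormal steps."""
        return sorted({log.system for log in self.abnormal_steps(request_id)})


log_system = LogSystem()
for step, result in [(1, "success"), (2, "failure"), (3, "success")]:
    log_system.receive(ProcessingLog("B", step, "service system A", result))

print([log.step for log in log_system.abnormal_steps("B")])
print(log_system.faulty_systems("B"))
```

Keying the stored logs by service request ID is what lets the log system separate the steps of one request from those of every other request running at the same time.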
In summary, in the fault localization method provided in this embodiment, when the first service system executes the service steps corresponding to the service request ID, either itself or by invoking the second service system, the first service system sends the corresponding processing logs to the log system; the log system determines the abnormal service step according to the execution results in the received processing logs and finally locates the faulty service system. Because the first service system generates a processing log for every service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that the faulty service system has to be found by checking the service systems one by one from top to bottom, which makes fault localization inefficient when there are many service systems, and achieves the effect of locating the faulty service system through the processing logs corresponding to the service request ID, improving the efficiency of locating the faulty service system.
Based on the embodiment shown in FIG. 6, optionally, the first service system may independently execute the internal service steps corresponding to the service request ID. In this case the processing log is a first processing log, which records the execution result of an internal service step, corresponding to the service request ID, executed by the first service system. As one possible implementation, steps 602 and 603 may be replaced by the following steps 701 to 705, as shown in FIG. 7:
Step 701: The log system obtains the business process model corresponding to the service request ID. The business process model includes the execution order of the service steps corresponding to the service request ID.
When execution of the service request fails, the log system obtains the business process model corresponding to the service request ID.
When independently executing the internal service steps corresponding to the service request ID, the first service system must execute them in a predetermined order. For example, when service system A executes service request 71, four service steps need to be executed in total: service step 1, service step 2, service step 3, and service step 4. Service steps 1 and 2 are executed first through module B, and then service steps 3 and 4 are executed through module C. By way of example, the business process model corresponding to service system A executing service request 71 is shown in Table 1 below:
Service request ID | Service system | Module of the service system | Execution order
Service request 71 | Service system A | Module B | 1
Service request 71 | Service system A | Module C | 2
Table 1
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the obtaining module 524.
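One way to picture the business process model of Table 1 is as a small lookup table keyed by service request ID. The sketch below is an assumption about representation only: `ModelEntry`, `MODEL_STORE`, and `get_process_model` are hypothetical names, and the source does not describe how the model is stored.

```python
from typing import Dict, List, NamedTuple


class ModelEntry(NamedTuple):
    """One row of the business process model (hypothetical layout)."""
    request_id: str
    system: str
    module: str
    order: int


# Rows of Table 1, deliberately stored out of order.
MODEL_STORE: Dict[str, List[ModelEntry]] = {
    "71": [
        ModelEntry("71", "service system A", "module C", 2),
        ModelEntry("71", "service system A", "module B", 1),
    ],
}


def get_process_model(request_id: str) -> List[ModelEntry]:
    """Step 701: return the model rows for a request, sorted by execution order."""
    return sorted(MODEL_STORE.get(request_id, []), key=lambda entry: entry.order)


print([entry.module for entry in get_process_model("71")])
```

Sorting on retrieval means later steps can walk the rows in execution order without relying on how the rows happened to be stored.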
Step 702: The log system obtains, in the execution order, the n first processing logs corresponding to the service steps, where n is a positive integer.
After obtaining the business process model, the log system obtains the n first processing logs corresponding to the service steps from the received processing logs, following the execution order in the business process model.
Optionally, a first processing log records the execution result produced when the first service system independently executes an internal service step corresponding to the service request ID.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the obtaining module 524.
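Because logs may be sent asynchronously, they can reach the log system out of order, so step 702 amounts to arranging the received logs by the model's execution order. The following sketch assumes a hypothetical dictionary layout for logs and a model reduced to a list of step names; neither is specified in the source.

```python
from typing import Dict, List

# Hypothetical execution order taken from a business process model.
execution_order: List[str] = ["step 1", "step 2", "step 3", "step 4"]

# Received first processing logs, possibly out of order (the layout is an assumption).
received_logs: List[Dict[str, str]] = [
    {"step": "step 3", "result": "success"},
    {"step": "step 1", "result": "success"},
    {"step": "step 2", "result": "failure"},
]


def logs_in_execution_order(order: List[str], logs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the logs arranged by execution order, skipping steps that have no log."""
    by_step = {log["step"]: log for log in logs}
    return [by_step[step] for step in order if step in by_step]


ordered = logs_in_execution_order(execution_order, received_logs)
print([log["step"] for log in ordered])
```

In this sample, step 4 has no log, so n is 3 and the result lists steps 1 to 3 in order.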
Step 703: The log system determines, according to the execution result in a first processing log, whether the internal service step is an abnormal service step.
Optionally, the execution result in a first processing log is either execution success or execution failure, and an abnormal service step is a service step whose execution result is execution failure. When the execution result in a first processing log is execution failure, the log system determines that the internal service step corresponding to that first processing log is an abnormal service step.
For example, if, among first processing log 1, first processing log 2, and first processing log 3, the execution result in first processing log 2 is execution failure, the log system determines from the execution result in first processing log 2 that internal service step 2 is the abnormal service step.
Optionally, this step may be implemented in the following possible manner:
First, determine, according to the execution result in the i-th first processing log, whether the internal service step is an abnormal service step, where i is a positive integer less than or equal to n.
Optionally, the initial value of i is 1. The log system starts from the 1st first processing log and determines, according to the execution result of the 1st first processing log, whether the corresponding internal service step is an abnormal service step.
Second, if it is not an abnormal service step, let i = i + 1 and continue to determine, according to the execution result in the i-th first processing log, whether the corresponding internal service step is an abnormal service step.
The two steps above are repeated until an abnormal service step is determined; if none is found, the execution results in all n first processing logs are checked in turn.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determining module 522.
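The two-step loop above translates almost directly into code. The sketch below is illustrative only; the function name and the use of the strings "success" and "failure" as execution results are assumptions. It also returns how many logs were examined, to make the early stop visible.

```python
from typing import List, Optional, Tuple


def first_abnormal_step(results: List[str]) -> Tuple[Optional[int], int]:
    """Scan n execution results in execution order.

    Returns (1-based index of the first failed step or None, number of logs examined).
    """
    n = len(results)
    i = 1  # i starts at 1
    while i <= n:
        if results[i - 1] == "failure":
            return i, i  # abnormal step found: stop the loop
        i += 1  # not abnormal: let i = i + 1 and continue
    return None, n  # all n logs examined, no abnormal step


print(first_abnormal_step(["success", "failure", "success"]))  # (2, 2)
print(first_abnormal_step(["success", "success"]))             # (None, 2)
```

In the first call the third log is never read, which is the resource saving that scanning in execution order is meant to provide.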
Step 704: When the internal service step is an abnormal service step, the log system locates the first service system as the faulty service system.
When the log system determines that the internal service step corresponding to the i-th first processing log is an abnormal service step, the log system locates the first service system that executes that internal service step as the faulty service system. For example, if the log system determines that internal service step 2, corresponding to the 2nd first processing log, is an abnormal service step, the log system determines that first service system A, which executes internal service step 2, is the faulty service system.
Optionally, the first service system includes a first processing module having a first API, and the first API has a corresponding first API identifier. The first processing log includes the service request ID, the first service system ID, the first API identifier, and a result code, where the result code indicates the execution result of the internal service step executed by the first processing module.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
Step 705: When the faulty service system is the first service system, the log system locates, according to the first API identifier contained in the first processing log, the API corresponding to the first API identifier as the faulty API.
After determining, according to the result code carried in the first processing log, that the first service system is the faulty service system, the log system determines the API corresponding to the first API identifier contained in the first processing log as the faulty API.
For example, first service system B includes first processing module a and first processing module b; the first API identifier of first processing module a is API11, and the first API identifier of first processing module b is API12. When first service system B executes service request 72, two internal service steps need to be executed in total: internal service step 1 and internal service step 2. Internal service step 1 is executed first through first processing module a, and then internal service step 2 is executed through first processing module b. When the log system determines, according to the result code in the first processing log, that first service system B is the faulty service system, it determines, according to API12 carried in the first processing log, that the API corresponding to API12 is the faulty API. Optionally, first processing module b, which corresponds to the faulty API, is the faulty processing module.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
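The API-level localization of step 705 can be sketched with the four log fields named above. This is a minimal sketch, not the patent's implementation: `FirstLog`, `API_TO_MODULE`, and `locate_fault_api` are hypothetical names, and the convention that result code 0 means success is an assumption, since the source does not define result code values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FirstLog:
    """First processing log with the four fields listed in the text."""
    request_id: str
    system_id: str
    api_id: str
    result_code: int  # assumption: 0 means success, any other value means failure


# Which processing module exposes which API, as in the example for service request 72.
API_TO_MODULE: Dict[str, str] = {
    "API11": "first processing module a",
    "API12": "first processing module b",
}


def locate_fault_api(logs: List[FirstLog]) -> Optional[Tuple[str, str, str]]:
    """Return (faulty system, faulty API, faulty module) for the first failed log."""
    for log in logs:
        if log.result_code != 0:
            return log.system_id, log.api_id, API_TO_MODULE.get(log.api_id, "unknown")
    return None


logs = [
    FirstLog("72", "first service system B", "API11", 0),
    FirstLog("72", "first service system B", "API12", 1),
]
print(locate_fault_api(logs))
```

Carrying the API identifier in the log itself is what allows the lookup to go from a failed result code straight to a module, without consulting the service system.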
In summary, in the fault localization method provided in this embodiment, when the first service system and the second service system execute the service steps corresponding to the service request ID, the first service system sends the corresponding processing logs to the log system; the log system determines the abnormal service step according to the execution results in the received processing logs and finally locates the faulty service system. Because the first service system generates a processing log for every service step, the log system can determine the specific faulty service system from the execution results in the processing logs. This solves the prior-art problem that the faulty service system has to be found by checking the service systems one by one from top to bottom, which makes fault localization inefficient when there are many service systems, and achieves the effect of locating the faulty service system through the processing logs corresponding to the service request ID, improving the efficiency of locating the faulty service system.
In addition, the log system checks, in the execution order given by the business process model, whether each internal service step is an abnormal service step according to the execution result in its first processing log. Determining abnormal service steps in the order in which the service steps are executed helps avoid wasting resources and improves the efficiency of locating the faulty service system.
Furthermore, when the faulty service system is the first service system, the log system determines, according to the first API identifier carried in the first processing log, that the API corresponding to the first API identifier is the faulty API. Because the first processing log carries the first API identifier, the log system can locate the faulty API from the API identifier, which improves the accuracy of locating the fault in the faulty service system.
Based on the embodiment shown in FIG. 6, optionally, the first service system invokes the second service system to execute the external service steps corresponding to the service request ID. In this case the processing log is a second processing log, which records the execution result of an external service step, corresponding to the service request ID, that the first service system invokes the second service system to execute. As another possible implementation, steps 602 and 603 may be replaced by the following steps 801 to 805, as shown in FIG. 8:
Step 801: The log system obtains the business process model corresponding to the service request ID. The business process model includes the execution order of the service steps corresponding to the service request ID.
When execution of the service request fails, the log system obtains the business process model corresponding to the service request ID.
When the first service system invokes the second service system to execute the external service steps corresponding to the service request ID, the external service steps must be executed in a predetermined order. For example, as shown in FIG. 9, executing service request 81 requires six service systems in total: service system 91, service system 92, service system 93, service system 94, service system 95, and service system 96. Seven service steps need to be executed in total: service step 1 through service step 7. Service system 91 first executes service steps 1, 2, and 3 through module x, then executes service steps 4, 5, and 6 through module y, and finally executes service step 7 through module z. When service system 91 executes service step 1 through module x, it needs to invoke service system 92 through the 2-1 API; when service system 92 executes service step 1 through module w, it needs to invoke service system 93 through the 2-2 API; when service system 91 executes service step 2, it needs to invoke service system 93 through the 3-1 API; when service system 91 executes service step 4 through module y, it needs to invoke service system 94 through the 4-1 API; when service system 91 executes service step 5 through module y, it needs to invoke service system 95 through the 5-1 API; and when service system 91 executes service step 6 through module y, it needs to invoke service system 96 through the 6-1 API. By way of example, the business process model corresponding to service request 81 is shown in Table 2 below:
Figure PCTCN2017081072-appb-000001
Table 2
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the obtaining module 524.
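The call relationships described for service request 81 can be written down as data, which makes it easy to ask which systems take part in a given step. Table 2 is only available as an image, so the sketch below encodes just the calls stated in the text; `ExternalCall`, `CALLS`, and `systems_involved` are hypothetical names, and the tuple layout is an assumption rather than the table's actual columns.

```python
from typing import List, NamedTuple


class ExternalCall(NamedTuple):
    """One external call: which system invokes which, through which API, for which step."""
    step: int
    caller: str
    api: str
    callee: str


# Calls stated in the text for service request 81.
CALLS: List[ExternalCall] = [
    ExternalCall(1, "service system 91", "2-1 API", "service system 92"),
    ExternalCall(1, "service system 92", "2-2 API", "service system 93"),
    ExternalCall(2, "service system 91", "3-1 API", "service system 93"),
    ExternalCall(4, "service system 91", "4-1 API", "service system 94"),
    ExternalCall(5, "service system 91", "5-1 API", "service system 95"),
    ExternalCall(6, "service system 91", "6-1 API", "service system 96"),
]


def systems_involved(step: int) -> List[str]:
    """List every system that takes part in a step, in order of first appearance."""
    seen: List[str] = []
    for call in CALLS:
        if call.step == step:
            for system in (call.caller, call.callee):
                if system not in seen:
                    seen.append(system)
    return seen


print(systems_involved(1))
print(systems_involved(7))
```

Service step 1 involves a chain of three systems, whereas no external call is stated for service step 7, so the second lookup returns an empty list.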
Step 802: The log system sequentially acquires, in the execution order, m second processing logs corresponding to the respective service steps, where m is a positive integer.
After acquiring the business process model, the log system acquires, from the received processing logs and in the execution order given by the business process model, the m second processing logs corresponding to the respective service steps.
Optionally, a second processing log records the execution result produced when the called second service system executes an external service step corresponding to the service request ID.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
Step 803: The log system determines, according to the execution result in a second processing log, whether the external service step is an abnormal service step.
Optionally, the execution result in a second processing log is either execution success or execution failure, and an abnormal service step is a service step whose execution result is execution failure. When the execution result in a second processing log is execution failure, the log system determines that the external service step corresponding to that second processing log is an abnormal service step.
For example, if, among second processing log 1, second processing log 2, and second processing log 3, the execution result in second processing log 2 is execution failure, the log system determines from that result that external service step 2 is an abnormal service step.
Optionally, this step may be implemented as follows:
First, determine, according to the execution result in the j-th second processing log, whether the corresponding external service step is an abnormal service step, where j is a positive integer less than or equal to m.
Optionally, the initial value of j is 1. The log system starts from the 1st second processing log and determines, according to its execution result, whether the corresponding external service step is an abnormal service step.
Second, if it is not an abnormal service step, let j = j + 1 and continue to determine, according to the execution result in the j-th second processing log, whether the corresponding external service step is an abnormal service step.
These two steps are repeated until an abnormal service step is found; if none is found, the execution results in all m second processing logs are checked in turn.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
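The two-step loop of step 803 can be sketched as follows. This is a minimal illustration that assumes the m execution results are already sorted in execution order and are represented as booleans (`True` for execution success, `False` for execution failure); the function name is hypothetical.

```python
from typing import List, Optional


def first_abnormal_index(results: List[bool]) -> Optional[int]:
    """Return the 1-based index j of the first failed second processing log.

    Returns None when all m results indicate execution success.
    """
    m = len(results)
    j = 1  # initial value of j
    while j <= m:
        if not results[j - 1]:  # execution failure: abnormal service step found
            return j
        j += 1                  # not abnormal: let j = j + 1 and continue
    return None
```

With results for second processing logs 1, 2, and 3 given as `[True, False, True]`, the function returns 2, matching the example in which external service step 2 is the abnormal service step.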
Step 804: When the external service step is an abnormal service step, the log system locates the called second service system as the faulty service system.
When the log system determines that the external service step corresponding to the j-th second processing log is an abnormal service step, it locates the second service system that executed that external service step as the faulty service system. For example, if the log system determines that external service step 2, corresponding to the 2nd second processing log, is an abnormal service step, it determines that second service system A1, which was called to execute external service step 2, is the faulty service system.
Optionally, the first service system includes a first processing module having a first API, and the first API has a corresponding first API identifier; the second service system includes a second processing module having a second API, and the second API has a corresponding second API identifier; the second processing log includes the service request ID, a first service system ID, the first API identifier, a second service system ID, the second API identifier, and a return code, where the return code is the execution result of calling the second processing module to execute the external service step.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
Step 805: When the faulty service system is the second service system, the log system locates, according to the second API identifier contained in the second processing log, the API corresponding to the second API identifier as the faulty API.
After determining from the return code carried in the second processing log that the called second service system is the faulty service system, the log system determines that the API corresponding to the second API identifier contained in that second processing log is the faulty API.
For example, second service system B1 includes second processing module a1 and second processing module b1; the second API identifier of second processing module a1 is API21, and the second API identifier of second processing module b1 is API22. When second service system B1 is called to execute service request 82, two external service steps must be executed in total, external service step 1 and external service step 2: external service step 1 is executed first through second processing module a1, and external service step 2 is then executed through second processing module b1. When the log system determines from the return code in the second processing log that second service system B1 is the faulty service system, it determines from the API22 identifier carried in that second processing log that the API corresponding to API22 is the faulty API. Optionally, second processing module b1, which corresponds to the faulty API, is the faulty processing module.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
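Steps 804 and 805 can be illustrated with a record that carries the six fields listed for the second processing log. The sketch below is hedged: the field names, the convention that a return code of 0 means success, and the caller values `"S1"` and `"API11"` are assumptions introduced for illustration; only B1, API21, API22, and service request 82 come from the example above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SUCCESS = 0  # assumed convention: 0 means success, any other code means failure


@dataclass(frozen=True)
class SecondProcessingLog:
    service_request_id: str
    first_system_id: str
    first_api_id: str
    second_system_id: str
    second_api_id: str
    return_code: int


def locate_fault(logs: List[SecondProcessingLog]) -> Optional[Tuple[str, str]]:
    """Return (faulty service system ID, faulty API identifier), or None.

    `logs` is assumed to be sorted in execution order.
    """
    for log in logs:
        if log.return_code != SUCCESS:
            return log.second_system_id, log.second_api_id
    return None


# Service request 82: step 1 via module a1 (API21) succeeds, step 2 via b1 (API22) fails.
LOGS_82 = [
    SecondProcessingLog("82", "S1", "API11", "B1", "API21", 0),
    SecondProcessingLog("82", "S1", "API11", "B1", "API22", 1),
]
```

Here `locate_fault(LOGS_82)` yields `("B1", "API22")`: the second service system ID identifies the faulty service system, and the second API identifier narrows the fault to one API within it.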
In summary, in the fault localization method provided by this embodiment, when the first service system and the second service system execute the service steps corresponding to the service request ID, the first service system sends the corresponding processing logs to the log system, and the log system determines the abnormal service step according to the execution results in the received processing logs and finally locates the faulty service system. Because the first service system generates a processing log for every service step, the log system can identify the specific faulty service system from the execution results in the processing logs. This solves the problem in the prior art that the service systems must be checked one by one from top to bottom before the faulty service system is finally identified, which makes fault localization inefficient when there are many service systems. Locating the faulty service system through the processing logs corresponding to the service request ID improves localization efficiency.
In addition, the log system determines, in the execution order given by the business process model, whether each external service step is an abnormal service step according to the execution result in the corresponding second processing log. Identifying abnormal service steps in the order in which the service steps are executed helps avoid wasted resources and improves the efficiency of locating the faulty service system.
Furthermore, when the faulty service system is the second service system, the log system determines, according to the second API identifier carried in the second processing log, that the API corresponding to the second API identifier is the faulty API. Because the second processing log carries the second API identifier, the log system can locate the faulty API from that identifier, which improves the accuracy of fault localization.
Please refer to FIG. 10, which shows a flowchart of a fault localization method provided by another embodiment of the present invention. This embodiment is described using the example in which the fault localization method is applied to the fault localization system shown in FIG. 3. Optionally, the log system includes an analysis component, a modeling component, an ID processing component, and a log component. The fault localization method includes the following steps:
Step 1001: The analysis component receives the service request ID to be analyzed.
When execution of a service request fails, the analysis component receives the input service request ID to be analyzed.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the receiving module 521.
Step 1002: The analysis component acquires, through the ID processing component, the business process model ID corresponding to the service request ID.
The analysis component sends a process ID request carrying the service request ID to the ID processing component. The process ID request is used to request, from the ID processing component, the business process model ID corresponding to the service request ID.
After receiving the process ID request, the ID processing component looks up the business process model ID corresponding to the service request ID carried in the request, and returns the business process model ID it finds to the analysis component.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
Step 1003: The analysis component acquires, through the modeling component, the business process model corresponding to the service request ID.
The analysis component sends a model request carrying the business process model ID to the modeling component. The model request is used to request that the modeling component return the business process model corresponding to the business process model ID.
After receiving the model request, the modeling component looks up the business process model corresponding to the business process model ID carried in the request, and returns the business process model it finds to the analysis component.
For a detailed description of the business process model, refer to step 701 shown in FIG. 7 and step 801 shown in FIG. 8; details are not repeated here.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
Step 1004: The analysis component acquires, through the log component, the processing logs corresponding to the service request ID.
After acquiring the business process model, the analysis component sends a log acquisition request to the log component. The log acquisition request is used to request that the log component return the processing logs corresponding to the service request ID. Optionally, the processing logs include first processing logs and second processing logs.
For a detailed description of the first processing log, refer to step 702 shown in FIG. 7; for a detailed description of the second processing log, refer to step 802 shown in FIG. 8. Details are not repeated here.
After receiving the log acquisition request, the log component extracts the service request ID carried in the request, looks up the first processing logs and second processing logs corresponding to that service request ID, and returns the n first processing logs and m second processing logs it finds to the analysis component.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
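The request flow of steps 1002 to 1004 can be sketched with three small in-memory components. Everything here is illustrative: the class and method names, the dictionary-backed lookups, and the `"kind"` key used to tell first and second processing logs apart are assumptions, since the publication does not specify how the components store or exchange data.

```python
class IdProcessingComponent:
    """Maps a service request ID to a business process model ID."""

    def __init__(self, model_id_by_request):
        self._model_id_by_request = dict(model_id_by_request)

    def handle_process_id_request(self, service_request_id):
        return self._model_id_by_request[service_request_id]


class ModelingComponent:
    """Maps a business process model ID to a business process model."""

    def __init__(self, model_by_id):
        self._model_by_id = dict(model_by_id)

    def handle_model_request(self, model_id):
        return self._model_by_id[model_id]


class LogComponent:
    """Stores processing logs as dicts with 'request_id' and 'kind' keys."""

    def __init__(self, logs):
        self._logs = list(logs)

    def handle_log_request(self, service_request_id):
        mine = [log for log in self._logs if log["request_id"] == service_request_id]
        first = [log for log in mine if log["kind"] == "first"]
        second = [log for log in mine if log["kind"] == "second"]
        return first, second


def gather_inputs(service_request_id, id_component, modeling_component, log_component):
    """Collect what the analysis component needs before it starts checking logs."""
    model_id = id_component.handle_process_id_request(service_request_id)      # step 1002
    model = modeling_component.handle_model_request(model_id)                  # step 1003
    first, second = log_component.handle_log_request(service_request_id)       # step 1004
    return model, first, second
```

The model is fetched before the logs, mirroring the described order in which the analysis component requests the logs only after it has the business process model.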
Step 1005: The analysis component determines, according to the execution result in the i-th first processing log, whether the internal service step is an abnormal service step.
Optionally, the analysis component checks the first processing logs in the execution order given by the business process model, determining from the execution result in the i-th first processing log whether the corresponding internal service step is an abnormal service step, where i is a positive integer less than or equal to n.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
Step 1006: If it is not an abnormal service step, the analysis component lets i = i + 1 and continues to determine, according to the execution result in the i-th first processing log, whether the corresponding internal service step is an abnormal service step.
Steps 1005 and 1006 are repeated until an abnormal service step is found; if none is found, the execution results in all n first processing logs are checked in turn.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
Step 1007: If it is an abnormal service step, the analysis component acquires, from the m second processing logs, the t second processing logs corresponding to the first processing module that executed the abnormal service step.
If the internal service step is an abnormal service step, the analysis component determines that the first processing module that executed it is the faulty module. Because the first processing module may have called another service system to complete the service step, the analysis component acquires the t second processing logs corresponding to the first processing module.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the acquisition module 524.
Step 1008: The analysis component determines, according to the execution result in the j-th second processing log, whether the external service step is an abnormal service step.
Optionally, the analysis component checks the second processing logs in the execution order given by the business process model, determining from the execution result in the j-th second processing log whether the corresponding external service step is an abnormal service step, where j is a positive integer less than or equal to t.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
Step 1009: If it is not an abnormal service step, the analysis component lets j = j + 1 and continues to determine, according to the execution result in the j-th second processing log, whether the external service step is an abnormal service step.
Steps 1008 and 1009 are repeated until an abnormal service step is found; if none is found, the execution results in all t second processing logs are checked in turn.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the determination module 522.
Step 1010: If it is an abnormal service step, the analysis component locates the API corresponding to the second API identifier in the called second service system as the faulty API.
If the external service step is an abnormal service step, the analysis component locates, as the faulty API, the API corresponding to the second API identifier of the second processing module that executed the external service step.
For a detailed description of this step, refer to step 805 shown in FIG. 8; details are not repeated here.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
Step 1011: If no second processing log corresponding to the first processing module exists, the analysis component locates the API corresponding to the first API identifier of the first processing module as the faulty API.
When the analysis component determines from the execution result in the i-th first processing log that the internal service step is an abnormal service step, it determines that the first processing module that executed that internal service step is the faulty module. If none of the m second processing logs corresponds to the first processing module, the analysis component locates the API corresponding to the first API identifier of the first processing module as the faulty API. Alternatively, if the analysis component determines from the execution results in the t second processing logs corresponding to the first processing module that none of the external service steps is an abnormal service step, it likewise locates the API corresponding to the first API identifier of the first processing module as the faulty API.
This step may be implemented by the processor 511 in the log system 140 shown in FIG. 5 executing the positioning module 523.
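Steps 1005 to 1011 combine into a two-stage search, sketched below under stated assumptions. The record types and the `ok` flag are hypothetical, and matching second processing logs to a first processing module by the first API identifier is an assumed join key (the second processing log is described as carrying that identifier, but the publication does not say how the t logs are selected).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FirstLog:
    first_api_id: str  # API identifier of the first processing module
    ok: bool           # True for execution success, False for failure


@dataclass(frozen=True)
class SecondLog:
    first_api_id: str   # identifies the calling first processing module
    second_api_id: str  # API identifier of the called second processing module
    ok: bool


def locate_faulty_api(first_logs: List[FirstLog],
                      second_logs: List[SecondLog]) -> Optional[str]:
    """Return the identifier of the faulty API, or None if no step failed.

    Both lists are assumed to be sorted in execution order.
    """
    for first in first_logs:                       # steps 1005 and 1006
        if first.ok:
            continue
        related = [s for s in second_logs          # step 1007: the t related logs
                   if s.first_api_id == first.first_api_id]
        for second in related:                     # steps 1008 and 1009
            if not second.ok:
                return second.second_api_id        # step 1010
        return first.first_api_id                  # step 1011
    return None
```

The fallback to the first API identifier covers both cases described in step 1011: no related second processing logs exist, or all of them report success.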
In summary, in the fault localization method provided by this embodiment, when the first service system and the second service system execute the service steps corresponding to the service request ID, the first service system sends the corresponding processing logs to the log system, and the log system determines the abnormal service step according to the execution results in the received processing logs and finally locates the faulty service system. Because the first service system generates a processing log for every service step, the log system can identify the specific faulty service system from the execution results in the processing logs. This solves the problem in the prior art that the service systems must be checked one by one from top to bottom before the faulty service system is finally identified, which makes fault localization inefficient when there are many service systems. Locating the faulty service system through the processing logs corresponding to the service request ID improves localization efficiency.
It should be noted that this embodiment is described only using the example in which the internal service steps are first checked against the execution results in the first processing logs and the external service steps are then checked against the execution results in the second processing logs; the order in which the first and second processing logs are processed is not specifically limited. Optionally, the external service steps may be checked first against the execution results in the second processing logs, and the internal service steps checked afterwards against the execution results in the first processing logs.
The following are apparatus embodiments of the present invention, which can be used to carry out the method embodiments of the present invention. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present invention.
Please refer to FIG. 11, which shows a structural block diagram of a fault localization apparatus provided by an embodiment of the present invention. The fault localization apparatus may be implemented, through software, hardware, or a combination of the two, as all or part of the log system 140 shown in FIG. 2 or FIG. 3. The fault localization apparatus may include:
A receiving unit 1120, which has functions identical or similar to those of the receiving module 521, as well as other implicit functions of the receiving module 521.
A determining unit 1140, which has functions identical or similar to those of the determination module 522, as well as other implicit functions of the determination module 522.
A positioning unit 1160, which has functions identical or similar to those of the positioning module 523, as well as other implicit functions of the positioning module 523.
An acquiring unit 1180, which has functions identical or similar to those of the acquisition module 524, as well as other implicit functions of the acquisition module 524.
It should be understood that, as used herein, the singular forms ("a", "an", "the") are intended to include the plural forms as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (20)

  1. A fault localization platform, characterized in that the platform comprises: an identifier allocation system, a log system, a first service system, and a second service system;
    the identifier allocation system is configured to allocate a service request identifier (ID) to a service request, where the service request is sent when the first service system executes a service, and the service is executed cooperatively by the first service system and the second service system, between which a calling relationship exists;
    the first service system is configured to generate processing logs for the respective service steps corresponding to the service request ID, where a processing log records the execution result of a service step, and the respective service steps include: service steps executed by the first service system, and service steps that the first service system calls the second service system to execute;
    the log system is configured to receive the processing logs corresponding to the service request ID, determine an abnormal service step according to the execution results in the processing logs, and locate the service system that executed the abnormal service step as the faulty service system.
  2. The platform according to claim 1, characterized in that:
    the first service system is configured to, when executing an internal service step corresponding to the service request ID, generate a first processing log corresponding to the internal service step and send the first processing log to the log system, where the first processing log records the execution result of the first service system executing the internal service step;
    the first service system is further configured to, when calling the second service system to execute an external service step corresponding to the service request ID, generate a second processing log corresponding to the external service step and send the second processing log to the log system, where the second processing log records the execution result of the called second service system executing the external service step;
    the log system is configured to: determine, according to the execution result in the first processing log, whether the internal service step is the abnormal service step, and when the internal service step is the abnormal service step, locate the first service system as the faulty service system; and determine, according to the execution result in the second processing log, whether the external service step is the abnormal service step, and when the external service step is the abnormal service step, locate the called second service system as the faulty service system.
  3. The platform according to claim 2, characterized in that the first service system comprises a first processing module having a first application programming interface (API), the first API having a corresponding first API identifier; and the second service system comprises a second processing module having a second API, the second API having a corresponding second API identifier;
    the first service system is configured to send the first processing log to the log system, where the first processing log includes the service request ID, a first service system ID, the first API identifier, and a result code, the result code being the execution result of the first processing module executing the internal service step;
    the first service system is further configured to send the second processing log to the log system, where the second processing log includes the service request ID, the first service system ID, the first API identifier, a second service system ID, the second API identifier, and a return code, the return code being the execution result of calling the second processing module to execute the external service step;
    the log system is configured to: when the faulty service system is the first service system, locate the API corresponding to the first API identifier as a faulty API; and when the faulty service system is the called second service system, locate the API corresponding to the second API identifier as the faulty API.
  4. The platform according to claim 3, characterized in that:
    the log system is configured to acquire a business process model corresponding to the service request ID, where the business process model includes the execution order of the respective service steps corresponding to the service request ID; and to sequentially acquire, in the execution order, n first processing logs and m second processing logs corresponding to the respective service steps, where n and m are each positive integers.
  5. The platform according to claim 4, wherein the log system is further configured to:
    determine, according to the execution result in the i-th first processing log, whether the internal service step is the abnormal service step, where i is a positive integer less than or equal to n;
    if it is the abnormal service step, locate the API corresponding to the first API identifier included in the i-th first processing log as the faulty API;
    if it is not the abnormal service step, set i = i + 1 and again determine, according to the execution result in the i-th first processing log, whether the internal service step is the abnormal service step.
  6. The platform according to claim 4, wherein the log system is further configured to:
    determine, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step, where j is a positive integer less than or equal to m;
    if it is the abnormal service step, locate the API corresponding to the second API identifier included in the j-th second processing log as the faulty API;
    if it is not the abnormal service step, set j = j + 1 and again determine, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step.
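The index-stepping procedure of claims 5 and 6 amounts to a linear scan over the ordered logs that stops at the first failed step. The sketch below is one possible reading. The `SUCCESS` value and the `(api_id, code)` pair layout are assumptions, since the claims do not fix a concrete encoding, and the code counts from 0 where the claims count from 1.

```python
from typing import Optional, Sequence, Tuple

SUCCESS = 0  # assumption: the claims do not specify the success value


def first_faulty_api(logs: Sequence[Tuple[str, int]]) -> Optional[str]:
    """Scan (api_id, code) pairs in execution order and return the API
    identifier of the first log whose code signals an abnormal step."""
    i = 0
    while i < len(logs):
        api_id, code = logs[i]
        if code != SUCCESS:  # abnormal service step found
            return api_id
        i += 1               # "set i = i + 1" and examine the next log
    return None              # no abnormal step among the logs
```

The same function serves both claims: pass the n first processing logs to scan internal steps, or the m second processing logs to scan external steps.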
  7. A fault localization method, wherein the method comprises:
    receiving a processing log corresponding to a service request identifier (ID), where the service request is sent when a first service system executes a service, the service is executed cooperatively by the first service system and a second service system between which a calling relationship exists, the processing log records the execution result of each service step corresponding to the service request ID, and the service steps include: a service step executed by the first service system, and a service step that the first service system invokes the second service system to execute;
    determining an abnormal service step according to the execution result in the processing log;
    locating the service system that executes the abnormal service step as a faulty service system.
  8. The method according to claim 7, wherein the processing log comprises a first processing log and a second processing log;
    the determining an abnormal service step according to the execution result in the processing log comprises:
    determining, according to the execution result in the first processing log, whether an internal service step is the abnormal service step, where the first processing log records the execution result of the first service system executing the internal service step corresponding to the service request ID;
    determining, according to the execution result in the second processing log, whether an external service step is the abnormal service step, where the second processing log records the execution result of invoking the second service system to execute the external service step corresponding to the service request ID.
  9. The method according to claim 8, wherein the locating the service system that executes the abnormal service step as a faulty service system comprises:
    when the internal service step is the abnormal service step, locating the first service system as the faulty service system;
    when the external service step is the abnormal service step, locating the invoked second service system as the faulty service system.
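Claims 7 to 9 map the kind of abnormal step to the system at fault: an internal step points to the first (calling) service system, and an external step points to the invoked second service system. A minimal sketch of that mapping follows. The `StepResult` type, its field names, and the `SUCCESS` value are hypothetical and not taken from the claims.

```python
from typing import Iterable, NamedTuple, Optional

SUCCESS = 0  # assumption: the claims do not specify the success value


class StepResult(NamedTuple):
    kind: str                 # "internal" or "external"
    caller_id: str            # first service system
    callee_id: Optional[str]  # second service system (external steps only)
    code: int                 # execution result recorded in the processing log


def locate_faulty_system(steps: Iterable[StepResult]) -> Optional[str]:
    """Return the ID of the system that executed the first abnormal step."""
    for step in steps:
        if step.code == SUCCESS:
            continue
        # Internal step: the caller itself failed. External step: the callee failed.
        return step.caller_id if step.kind == "internal" else step.callee_id
    return None
```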
  10. The method according to claim 9, wherein the first service system comprises a first processing module having a first application programming interface (API), and the first API has a corresponding first API identifier; the second service system comprises a second processing module having a second API, and the second API has a corresponding second API identifier; and the method further comprises:
    when the faulty service system is the first service system, locating, according to the first API identifier contained in the first processing log, the API corresponding to the first API identifier as a faulty API, where the first processing log includes: the service request ID, a first service system ID, the first API identifier, and a result code, the result code being the execution result of the first processing module executing the internal service step;
    when the faulty service system is the invoked second service system, locating, according to the second API identifier contained in the second processing log, the API corresponding to the second API identifier as the faulty API, where the second processing log includes: the service request ID, the first service system ID, the first API identifier, a second service system ID, the second API identifier, and a return code, the return code being the execution result of invoking the second processing module to execute the external service step.
  11. The method according to claim 10, wherein the method further comprises:
    obtaining a business process model corresponding to the service request ID, where the business process model includes the execution order of the service steps corresponding to the service request ID;
    obtaining, in sequence according to the execution order, n first processing logs and m second processing logs corresponding to the service steps, where n and m are each positive integers.
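Obtaining the logs "in sequence according to the execution order" can be sketched as sorting the collected logs by each step's position in the business process model. The claims do not say how a log is matched to a step, so the `"step"` key below is a hypothetical link; in practice the API identifier in each log might play that role.

```python
from typing import Dict, List


def order_logs(step_order: List[str],
               logs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the logs sorted by the position of their step in the model.

    step_order is the execution order taken from the business process model;
    logs whose step is not part of the model are dropped.
    """
    position = {step: idx for idx, step in enumerate(step_order)}
    known = [log for log in logs if log["step"] in position]
    # sorted() is stable, so several logs for one step keep their arrival order.
    return sorted(known, key=lambda log: position[log["step"]])
```

Sorting first means the later scan for the abnormal step visits the logs in the order the steps actually ran, regardless of the order in which the log system received them.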
  12. The method according to claim 11, wherein the determining, according to the execution result in the first processing log, whether an internal service step is the abnormal service step comprises:
    determining, according to the execution result in the i-th first processing log, whether the internal service step is the abnormal service step, where i is a positive integer less than or equal to n;
    the locating the API corresponding to the first API identifier as a faulty API comprises:
    if it is the abnormal service step, locating the API corresponding to the first API identifier included in the i-th first processing log as the faulty API;
    if it is not the abnormal service step, setting i = i + 1 and performing again the step of determining, according to the execution result in the i-th first processing log, whether it is the abnormal service step.
  13. The method according to claim 11, wherein the determining, according to the execution result in the second processing log, whether an external service step is the abnormal service step comprises:
    determining, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step, where j is a positive integer less than or equal to m;
    the locating the API corresponding to the second API identifier as the faulty API comprises:
    if it is the abnormal service step, locating the API corresponding to the second API identifier included in the j-th second processing log as the faulty API;
    if it is not the abnormal service step, setting j = j + 1 and performing again the step of determining, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step.
  14. A fault localization device, wherein the device comprises:
    a receiving unit, configured to receive a processing log corresponding to a service request identifier (ID), where the service request is sent when a first service system executes a service, the service is executed cooperatively by the first service system and a second service system between which a calling relationship exists, the processing log records the execution result of each service step corresponding to the service request ID, and the service steps include: a service step executed by the first service system, and a service step that the first service system invokes the second service system to execute;
    a determining unit, configured to determine an abnormal service step according to the execution result in the processing log;
    a locating unit, configured to locate the service system that executes the abnormal service step as a faulty service system.
  15. The device according to claim 14, wherein the processing log comprises a first processing log and a second processing log, and the determining unit is further configured to:
    determine, according to the execution result in the first processing log, whether an internal service step is the abnormal service step, where the first processing log records the execution result of the first service system executing the internal service step corresponding to the service request ID;
    determine, according to the execution result in the second processing log, whether an external service step is the abnormal service step, where the second processing log records the execution result of invoking the second service system to execute the external service step corresponding to the service request ID.
  16. The device according to claim 15, wherein the locating unit is further configured to:
    when the internal service step is the abnormal service step, locate the first service system as the faulty service system;
    when the external service step is the abnormal service step, locate the invoked second service system as the faulty service system.
  17. The device according to claim 16, wherein the first service system comprises a first processing module having a first application programming interface (API), and the first API has a corresponding first API identifier; the second service system comprises a second processing module having a second API, and the second API has a corresponding second API identifier; and the locating unit is further configured to:
    when the faulty service system is the first service system, locate, according to the first API identifier contained in the first processing log, the API corresponding to the first API identifier as a faulty API, where the first processing log includes: the service request ID, a first service system ID, the first API identifier, and a result code, the result code being the execution result of the first processing module executing the internal service step;
    when the faulty service system is the invoked second service system, locate, according to the second API identifier contained in the second processing log, the API corresponding to the second API identifier as the faulty API, where the second processing log includes: the service request ID, the first service system ID, the first API identifier, a second service system ID, the second API identifier, and a return code, the return code being the execution result of invoking the second processing module to execute the external service step.
  18. The device according to claim 17, wherein the device further comprises:
    an obtaining unit, configured to obtain a business process model corresponding to the service request ID, where the business process model includes the execution order of the service steps corresponding to the service request ID;
    the obtaining unit is further configured to obtain, in sequence according to the execution order, n first processing logs and m second processing logs corresponding to the service steps, where n and m are each positive integers.
  19. The device according to claim 18, wherein the determining unit is further configured to determine, according to the execution result in the i-th first processing log, whether the internal service step is the abnormal service step, where i is a positive integer less than or equal to n;
    the locating unit is further configured to:
    if it is the abnormal service step, locate the API corresponding to the first API identifier included in the i-th first processing log as the faulty API;
    if it is not the abnormal service step, set i = i + 1 and perform again the step of determining, according to the execution result in the i-th first processing log, whether it is the abnormal service step.
  20. The device according to claim 18, wherein the determining unit is further configured to determine, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step, where j is a positive integer less than or equal to m;
    the locating unit is further configured to:
    if it is the abnormal service step, locate the API corresponding to the second API identifier included in the j-th second processing log as the faulty API;
    if it is not the abnormal service step, set j = j + 1 and perform again the step of determining, according to the execution result in the j-th second processing log, whether the external service step is the abnormal service step.
PCT/CN2017/081072 2016-09-06 2017-04-19 Fault localization platform, fault localization method and device WO2018045756A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610803984.6 2016-09-06
CN201610803984.6A CN106254144B (en) 2016-09-06 2016-09-06 Fault positioning platform, fault positioning method and device

Publications (1)

Publication Number Publication Date
WO2018045756A1 (en) 2018-03-15

Family

ID=57599315

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/081072 WO2018045756A1 (en) 2016-09-06 2017-04-19 Fault localization platform, fault localization method and device

Country Status (2)

Country Link
CN (1) CN106254144B (en)
WO (1) WO2018045756A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078447A (en) * 2019-11-24 2020-04-28 杭州安恒信息技术股份有限公司 Method, device, equipment and medium for positioning abnormity in micro-service architecture
CN111143304A (en) * 2019-11-20 2020-05-12 杭州端点网络科技有限公司 Micro-service system abnormal log analysis method based on request link
CN112463561A (en) * 2020-11-20 2021-03-09 中国建设银行股份有限公司 Fault positioning method, device, equipment and storage medium

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106254144B (en) * 2016-09-06 2020-02-14 华为技术有限公司 Fault positioning platform, fault positioning method and device
CN108733698B (en) * 2017-04-19 2023-08-08 腾讯科技(深圳)有限公司 Log message processing method and background service system
CN107248927B (en) * 2017-05-02 2020-06-09 华为技术有限公司 Generation method of fault positioning model, and fault positioning method and device
CN109218041B (en) * 2017-06-29 2022-03-11 北京京东尚科信息技术有限公司 Request processing method and device for server system
CN107657425A (en) * 2017-09-18 2018-02-02 泰康保险集团股份有限公司 Business flow processing method and device, computer-readable medium, electronic equipment
CN108173706B (en) * 2017-11-29 2020-06-19 阿里巴巴集团控股有限公司 Service marking method, device and equipment under multi-service system
CN108768752B (en) * 2018-06-25 2021-12-03 华为技术有限公司 Fault positioning method, device and system
CN108847989B (en) * 2018-06-29 2021-07-06 杭州安恒信息技术股份有限公司 Log processing method based on micro-service architecture, service system and electronic equipment
CN109739680A (en) * 2019-02-02 2019-05-10 广州视源电子科技股份有限公司 Trouble shoot method, apparatus, equipment and the medium of application system
CN110943858B (en) * 2019-11-21 2022-07-12 中国联合网络通信集团有限公司 Fault positioning method and device
CN110932918B (en) * 2019-12-26 2023-01-10 远景智能国际私人投资有限公司 Log data acquisition method and device and storage medium
CN111488289B (en) * 2020-04-26 2024-01-23 支付宝实验室(新加坡)有限公司 Fault positioning method, device and equipment
CN114363144B (en) * 2020-09-28 2023-06-27 华为技术有限公司 Fault information association reporting method and related equipment for distributed system
CN114726714A (en) * 2022-02-18 2022-07-08 珠海紫讯信息科技有限公司 WebRPA operation and maintenance method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1866869A (en) * 2006-02-17 2006-11-22 华为技术有限公司 Service network tracking system and method
US20150026238A1 (en) * 2013-07-18 2015-01-22 Netapp, Inc. System and method for managing event tracking
CN105207806A (en) * 2015-08-20 2015-12-30 百度在线网络技术(北京)有限公司 Monitoring method and apparatus of distributed service
CN105577454A (en) * 2016-03-03 2016-05-11 上海新炬网络信息技术有限公司 Method for quickly positioning service fault based on log
CN106254144A (en) * 2016-09-06 2016-12-21 华为技术有限公司 Fault location platform, Fault Locating Method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105391772B (en) * 2015-10-16 2019-02-22 百度在线网络技术(北京)有限公司 Service request processing method, log processing method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143304A (en) * 2019-11-20 2020-05-12 杭州端点网络科技有限公司 Micro-service system abnormal log analysis method based on request link
CN111143304B (en) * 2019-11-20 2023-09-29 杭州端点网络科技有限公司 Micro-service system exception log analysis method based on request link
CN111078447A (en) * 2019-11-24 2020-04-28 杭州安恒信息技术股份有限公司 Method, device, equipment and medium for positioning abnormity in micro-service architecture
CN111078447B (en) * 2019-11-24 2023-09-19 杭州安恒信息技术股份有限公司 Abnormality positioning method, device, equipment and medium in micro-service architecture
CN112463561A (en) * 2020-11-20 2021-03-09 中国建设银行股份有限公司 Fault positioning method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN106254144A (en) 2016-12-21
CN106254144B (en) 2020-02-14

Similar Documents

Publication Publication Date Title
WO2018045756A1 (en) Fault localization platform, fault localization method and device
CN107689953B (en) Multi-tenant cloud computing-oriented container security monitoring method and system
US10073683B2 (en) System and method for providing software build violation detection and self-healing
US20190012365A1 (en) User request processing method and device
CN105447046A (en) Distributed system data consistency processing method, device and system
US20220058104A1 (en) System and method for database replication benchmark testing using a pipeline-based microservices model
CN104809202A (en) Database synchronization method and device
CN104809201A (en) Database synchronization method and device
US9652353B2 (en) Monitoring business transaction failures involving database procedure calls
GB2578077A (en) Multi-tenant data service in distributed file systems for big data analysis
CN105677465B (en) The data processing method and device of batch processing are run applied to bank
CN104809200A (en) Database synchronization method and device
US10362097B1 (en) Processing an operation with a plurality of processing steps
JP6975153B2 (en) Data storage service processing method and equipment
CN108140035B (en) Database replication method and device for distributed system
US20120005330A1 (en) Organizing Individual Java Client Request Flows into a Single Server Transaction
WO2020253045A1 (en) Configured supplementary processing method and device for data of which forwarding has abnormality, and readable storage medium
US20200371902A1 (en) Systems and methods for software regression detection
CN114090113A (en) Method, device and equipment for dynamically loading data source processing plug-in and storage medium
CN111522881B (en) Service data processing method, device, server and storage medium
CN107644041B (en) Policy settlement processing method and device
WO2019117767A1 (en) Method, function manager and arrangement for handling function calls
CN108255704B (en) Abnormal response method of script calling event and terminal thereof
CN111435356A (en) Data feature extraction method and device, computer equipment and storage medium
US11360785B2 (en) Execution path determination in a distributed environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17847930

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17847930

Country of ref document: EP

Kind code of ref document: A1