CN106980572B - Online debugging method and system for distributed system - Google Patents

Publication number
CN106980572B
Authority
CN
China
Prior art keywords
debugging
debugging information
distributed
information
online
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610035223.0A
Other languages
Chinese (zh)
Other versions
CN106980572A (en)
Inventor
马涛
郑旭
杨兵兵
陈生栋
李渭民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Tmall Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610035223.0A
Publication of CN106980572A
Application granted
Publication of CN106980572B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/362: Software debugging
    • G06F11/3644: Software debugging by instrumenting at runtime

Abstract

The application provides an online debugging method and system of a distributed system, wherein the method comprises the following steps: the ith distributed node receives a debugging information collecting instruction, wherein the debugging information collecting instruction comprises a collecting identifier, and i is a positive integer; the ith distributed node enters an online debugging mode according to the debugging information collecting instruction and collects debugging information; the ith distributed node sends debugging information to the server, wherein the debugging information has a number corresponding to the debugging information collecting instruction; and the ith distributed node sends the debugging information collection instruction to the (i + 1) th distributed node. According to the online debugging method of the distributed system, the difficulty in collecting the debugging logs is reduced, and the log collecting efficiency is improved, so that debugging personnel can conveniently perform integral analysis on debugging information, and the debugging efficiency can be improved.

Description

Online debugging method and system for distributed system
Technical Field
The present application relates to the field of online debugging technologies, and in particular, to an online debugging method and system for a distributed system.
Background
In a traditional distributed system, logs are mainly written to the local file system of each distributed node. This approach has the following problems when analyzing and solving problems in the distributed system:
1. Because the traditional log system stores logs on the local machine of each distributed node, and the distributed system enforces strict permission requirements, not everyone has permission to access the logs stored on each distributed node. Log access is therefore difficult.
2. While responding to a single application request, the distributed system may pass through a plurality of modules deployed on different distributed nodes, and each module independently generates and stores its own log. The logs of the modules involved in one response are therefore stored separately, so a debugger has to search the logs of each module in turn, which is very inconvenient. In addition, because the distributed system runs in parallel, the contextual correlation between logs on different modules is not obvious, which also makes debugging considerably harder.
3. Each module in the distributed system serves external callers as an independent cluster, and each cluster consists of hundreds of thousands of machines. Because of load balancing, a different machine may respond to each request, which makes it difficult and inefficient for the debugger to collect the logs associated with a given request.
4. The distributed system needs to process hundreds of requests every second, and each request outputs the internal operating state of the system to the log, so the log grows quickly and the entries for a particular request are hard to pick out. In addition, in a conventional log a plurality of processes can write to the same log file at the same time, so the log is interleaved with entries written by other processes that are unrelated to the request of interest. This makes the log of that request even harder to find and lowers analysis efficiency.
Disclosure of Invention
The present application aims to address the above technical problem, at least to some extent.
Therefore, a first objective of the present application is to provide an online debugging method for a distributed system, which reduces the difficulty of collecting a debugging log and improves the log collecting efficiency.
A second objective of the present application is to provide an online debugging system for a distributed system.
To achieve the above object, an embodiment according to a first aspect of the present application provides an online debugging method for a distributed system, including the following steps: the ith distributed node receives a debugging information collecting instruction, wherein the debugging information collecting instruction comprises a collecting identifier, and i is a positive integer; the ith distributed node enters an online debugging mode according to the debugging information collecting instruction and collects debugging information; the ith distributed node sends the debugging information to a server, wherein the debugging information has a number corresponding to the debugging information collecting instruction; and the ith distributed node sends the debugging information collection instruction to the (i + 1) th distributed node.
According to the online debugging method of the distributed system, the distributed nodes in the distributed system receive the debugging information collecting instruction including the collecting identification, enter the online debugging mode, collect the debugging information and send the debugging information to the server, and the debugging information can be collected in a targeted mode through the identification function of the collecting identification, so that the debugging information is prevented from being mixed up with other running log information, and the expansion degree of logs can be greatly reduced. In addition, the debugging information has the number corresponding to the debugging information collecting instruction, so that the debugging information corresponding to different debugging information collecting instructions can be distinguished according to the number, and the collection, the search and the association of the debugging information are facilitated. Therefore, the debugging log collection difficulty is reduced, the log collection efficiency is improved, debugging personnel can conveniently perform overall analysis on debugging information, and the debugging efficiency can be improved.
An embodiment of a second aspect of the present application provides an online debugging system for a distributed system, including: the system comprises a plurality of distributed nodes and a server, wherein the distributed nodes are used for receiving a debugging information collecting instruction, the debugging information collecting instruction comprises a collecting identifier, enters an online debugging mode according to the debugging information collecting instruction, collects debugging information and sends the debugging information to the server, the debugging information has a number corresponding to the debugging information collecting instruction, and the debugging information collecting instruction is sent to the next distributed node; the server is used for receiving the debugging information sent by the distributed nodes.
The online debugging system of the distributed system, which is provided by the embodiment of the application, can receive the debugging information collection instruction including the collection identifier by the distributed nodes in the distributed system, enter the online debugging mode, collect the debugging information, send the debugging information to the server, and can collect the debugging information in a targeted manner under the action of the collected identifier, so that the debugging information is prevented from being confused with other operation log information, and the expansion degree of the log can be greatly reduced. In addition, the debugging information has the number corresponding to the debugging information collecting instruction, so that the debugging information corresponding to different debugging information collecting instructions can be distinguished according to the number, and the collection, the search and the association of the debugging information are facilitated. Therefore, the debugging log collection difficulty is reduced, the log collection efficiency is improved, debugging personnel can conveniently perform overall analysis on debugging information, and the debugging efficiency can be improved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a method for online debugging of a distributed system according to one embodiment of the present application;
FIG. 2 is a flow diagram of a method for online debugging of a distributed system according to another embodiment of the present application;
FIG. 3 is a schematic diagram of online debugging of a distributed system according to another embodiment of the present application;
FIG. 4 is a schematic structural diagram of an online debugging system of a distributed system according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
An online debugging method and system of a distributed system according to an embodiment of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an online debugging method of a distributed system according to an embodiment of the present application.
As shown in fig. 1, an online debugging method of a distributed system according to an embodiment of the present application includes:
S101, the ith distributed node receives a debugging information collecting instruction, wherein the debugging information collecting instruction comprises a collecting identifier, and i is a positive integer.
The distributed system is composed of a plurality of distributed nodes. Each distributed node may be used to process requests sent by the server. The client user can send a service request or an online debugging request of the application program to the server according to the processing requirement. For example, for a service request, the service request may be processed by allocating corresponding nodes according to the idleness of each distributed node. And for the online debugging request, a debugging information collection instruction can be sent to the node needing to collect the debugging information according to the request of the client.
The server can send corresponding instructions to corresponding distributed nodes in the distributed system according to the request sent by the client, so that the request of the client is processed through each distributed node. The server sends a corresponding service processing instruction to the distributed system for the service request of the client; and sending a corresponding debugging information collection instruction to the distributed system for the online debugging request of the client.
In the embodiment of the application, in order to distinguish service requests of the application program from online debugging requests, the server may set a collection identifier in the debugging information collecting instruction when sending it. Each distributed node can therefore judge whether a received request is a debugging information collecting instruction according to whether it includes the collection identifier, and can subsequently collect only the logs corresponding to the debugging information collecting instruction, which is more convenient and improves information collection efficiency.
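For illustration only, the check described above might be sketched as follows in Python. The `Instruction` class and the field names `collect_flag` and `trace_number` are hypothetical; the present application does not define how the collection identifier is encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instruction:
    """Hypothetical instruction sent by the server to a distributed node."""
    payload: str
    collect_flag: bool = False           # assumed encoding of the collection identifier
    trace_number: Optional[str] = None   # assumed field carrying the number


def is_debug_collection(instr: Instruction) -> bool:
    """Treat the instruction as a debugging information collecting
    instruction only if it carries the collection identifier."""
    return instr.collect_flag


# An ordinary service instruction and a debugging information collecting instruction.
service = Instruction(payload="query=shoes")
debug = Instruction(payload="query=shoes", collect_flag=True, trace_number="42")
```

With this encoding, a node that receives `service` processes it normally, while a node that receives `debug` would go on to collect debugging information.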
In an embodiment of the present application, the server may send the debug information collection instruction to a corresponding distributed node in the distributed system according to a request of the client to perform debugging, and each distributed node may send the debug information collection instruction to a next distributed node in a debugging process or when the debugging is completed, so as to debug the next distributed node. Therefore, the debug information collection instruction received by the ith distributed node may be sent by the server or sent by the (i-1) th distributed node.
S102, the ith distributed node enters an online debugging mode according to the debugging information collecting instruction and collects debugging information.
When the ith distributed node receives the debugging information collection instruction, the distributed node can enter an online debugging mode, namely, the operation carried out according to the current instruction is online debugging, and then the current processing process can be recorded to obtain debugging information.
Specifically, in the embodiment of the present Application, a Trace API (Trace Application Programming Interface) is set in each distributed node, and the distributed node may collect the debugging information through the Trace API.
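The present application does not specify the interface of the Trace API. As one possible sketch, under the assumption that the API exposes calls to enter the online debugging mode, record an entry, and finish collection, it might look like the following. The class name `TraceCollector` and all method names are hypothetical.

```python
import time
from typing import Dict, List, Optional


class TraceCollector:
    """Hypothetical per-node Trace API: records entries only in debugging mode."""

    def __init__(self, node_name: str) -> None:
        self.node_name = node_name
        self.debug_mode = False
        self.trace_number: Optional[str] = None
        self.entries: List[Dict[str, object]] = []

    def enter_debug_mode(self, trace_number: str) -> None:
        """Switch to online debugging mode for one collecting instruction."""
        self.debug_mode = True
        self.trace_number = trace_number
        self.entries = []

    def trace(self, message: str) -> None:
        """Record one entry; ignored outside online debugging mode."""
        if not self.debug_mode:
            return
        self.entries.append({
            "number": self.trace_number,
            "node": self.node_name,
            "time": time.time(),
            "message": message,
        })

    def finish(self) -> List[Dict[str, object]]:
        """Leave debugging mode and return the collected debugging information."""
        collected = self.entries
        self.debug_mode = False
        self.trace_number = None
        self.entries = []
        return collected
```

Because `trace` is a no-op outside the online debugging mode, ordinary service requests would not add to the collected debugging information in this sketch.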
S103, the ith distributed node sends debugging information to the server, wherein the debugging information has a number corresponding to the debugging information collecting instruction.
After the ith distributed node collects the debugging information, it can send the collected debugging information to the server, so that the server can summarize it together with the debugging information for the same instruction received from other distributed nodes.
The number may be a character string of a preset length, for example a 64-bit numeric string. Each number corresponds to one online debugging session and uniquely identifies the debugging information of that session. That is, debugging information on any distributed node has the same number as long as it belongs to the same online debugging session. Therefore, debugging information corresponding to the same debugging information collecting instruction on different distributed nodes can be associated according to its number, and debugging personnel can conveniently analyze the debugging information as a whole.
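The text does not say how the number is generated, and "64-bit numeric string" can be read in more than one way. The sketch below assumes one reading: a random 64-bit unsigned integer rendered as a zero-padded decimal string of fixed length (20 digits, the width of 2^64 - 1). The function name `new_trace_number` is hypothetical.

```python
import secrets

# 2**64 - 1 has 20 decimal digits, so 20 is the fixed string length used here.
NUMBER_WIDTH = 20


def new_trace_number() -> str:
    """Return a random 64-bit value as a fixed-length decimal string.

    This is only one interpretation of a "64-bit numeric string"; a real
    system might instead use a counter or a timestamp-based scheme.
    """
    value = secrets.randbits(64)
    return f"{value:0{NUMBER_WIDTH}d}"
```

Random generation makes collisions between concurrent debugging sessions unlikely but does not strictly guarantee uniqueness; a server-side counter would be an alternative if strict uniqueness is required.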
S104, the ith distributed node sends the debugging information collection instruction to the (i + 1) th distributed node.
After completing the debugging, the ith distributed node may send a debugging information collection instruction to its next distributed node (i.e., the (i + 1) th distributed node) to control the (i + 1) th distributed node to collect the debugging information.
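Steps S101 to S104 can be sketched as a small in-process simulation, shown here in Python. The classes `Node` and `TraceServer`, the dictionary keys `collect`, `number` and `payload`, and the direct method calls standing in for network messages are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class TraceServer:
    """Stand-in for the server that receives debugging information."""

    def __init__(self) -> None:
        self.received: List[Dict[str, str]] = []

    def submit(self, record: Dict[str, str]) -> None:
        self.received.append(record)


class Node:
    """Stand-in for the ith distributed node; next_node is the (i + 1)th."""

    def __init__(self, name: str, server: TraceServer,
                 next_node: Optional["Node"] = None) -> None:
        self.name = name
        self.server = server
        self.next_node = next_node

    def handle(self, instruction: Dict[str, object]) -> None:
        # S101/S102: enter debugging mode only if the collection identifier is set.
        if instruction.get("collect"):
            record = {
                "number": str(instruction["number"]),
                "node": self.name,
                "info": f"{self.name} processed {instruction['payload']}",
            }
            # S103: send debugging information, tagged with the number, to the server.
            self.server.submit(record)
        # S104: forward the same instruction to the next distributed node, if any.
        if self.next_node is not None:
            self.next_node.handle(instruction)


server = TraceServer()
node3 = Node("node-3", server)
node2 = Node("node-2", server, node3)
node1 = Node("node-1", server, node2)
node1.handle({"collect": True, "number": "1", "payload": "q"})
```

After the call, `server.received` holds one record per node, all carrying the same number, which is what allows the server to associate them later.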
According to the online debugging method of the distributed system, the distributed nodes in the distributed system receive the debugging information collecting instruction including the collecting identification, enter the online debugging mode, collect the debugging information and send the debugging information to the server, and the debugging information can be collected in a targeted mode through the identification function of the collecting identification, so that the debugging information is prevented from being mixed up with other running log information, and the expansion degree of logs can be greatly reduced. In addition, the debugging information has the number corresponding to the debugging information collecting instruction, so that the debugging information corresponding to different debugging information collecting instructions can be distinguished according to the number, and the collection, the search and the association of the debugging information are facilitated. Therefore, the debugging log collection difficulty is reduced, the log collection efficiency is improved, debugging personnel can conveniently perform overall analysis on debugging information, and the debugging efficiency can be improved.
Further, fig. 2 is a flowchart of an online debugging method of a distributed system according to another embodiment of the present application.
As shown in FIG. 2, the online debugging method of the distributed system according to this embodiment includes steps S201 to S204, which are the same as steps S101 to S104 in FIG. 1, and may further include steps S205 to S209.
S205, the (i + 1) th distributed node receives a debugging information collecting instruction, wherein the debugging information collecting instruction comprises a collecting identification.
S206, the (i + 1) th distributed node enters an online debugging mode according to the debugging information collecting instruction and collects debugging information.
S207, the (i + 1) th distributed node sends debugging information to the server, wherein the debugging information has a number corresponding to the debugging information collecting instruction.
S208, the (i + 1) th distributed node sends the debugging information collection instruction to the (i + 2) th distributed node.
S209, the server receives the debugging information sent by the distributed nodes, and summarizes the debugging information according to the numbers in the debugging information to generate a debugging log.
After the distributed nodes that need to collect debugging information have collected it, they can send the collected debugging information to the server. The server can receive the debugging information sent by each distributed node and summarize it according to the number in the debugging information. Specifically, debugging information with the same number may be merged to generate a debugging log corresponding to that number.
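The merging in step S209 amounts to grouping records by their number. A minimal sketch in Python follows; the class name `DebugLogServer`, its method names, and the record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class DebugLogServer:
    """Groups incoming debugging information by number (step S209)."""

    def __init__(self) -> None:
        self._by_number: Dict[str, List[Dict[str, str]]] = defaultdict(list)

    def receive(self, record: Dict[str, str]) -> None:
        """Store one piece of debugging information under its number."""
        self._by_number[record["number"]].append(record)

    def debug_log(self, number: str) -> List[Dict[str, str]]:
        """Return the merged debugging log for one number, in arrival order."""
        return list(self._by_number.get(number, []))


log_server = DebugLogServer()
# Records from two different debugging sessions arrive interleaved.
log_server.receive({"number": "1", "node": "Merger", "info": "merged"})
log_server.receive({"number": "2", "node": "Merger", "info": "merged"})
log_server.receive({"number": "1", "node": "SN", "info": "searched"})
```

Looking up number "1" returns only the Merger and SN records of that session, even though records of session "2" arrived in between.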
In the embodiment of the application, the server may have an access interface, and the client may call the debug log in the server through the access interface. Specifically, the client may receive a number corresponding to a debugging information collecting instruction input by a user, and call a corresponding debugging log in the server according to the number.
In one embodiment of the present application, the debug log is in Extensible Markup Language (XML) format. Therefore, the client can conveniently read the debugging log from the server and then display it in a structured way, which facilitates analysis and further improves debugging efficiency.
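The application does not give a schema for the XML debug log. As an illustration, the following Python sketch serializes a merged log with the standard `xml.etree.ElementTree` module and reads it back on the client side. The element names `debugLog` and `node` and the attribute names are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def build_debug_log_xml(number: str, records: List[Dict[str, str]]) -> str:
    """Serialize the debugging information for one number as XML text."""
    root = ET.Element("debugLog", {"number": number})
    for rec in records:
        node_el = ET.SubElement(root, "node", {"name": rec["node"]})
        node_el.text = rec["info"]
    return ET.tostring(root, encoding="unicode")


def parse_debug_log_xml(xml_text: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Client side: recover (node name, debugging information) pairs."""
    root = ET.fromstring(xml_text)
    return [(el.get("name"), el.text) for el in root.findall("node")]


xml_text = build_debug_log_xml("42", [
    {"node": "Merger", "info": "merged 2 result sets"},
    {"node": "SN", "info": "searched 3 shards"},
])
```

A structured format of this kind lets the client display the log per node without parsing free-form log lines.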
The online debugging method of the distributed system of the present embodiment is described below through the following application scenario. As shown in FIG. 3, the distributed system on the server side is composed of six nodes: an FE (Front End), a Merger (business logic node), a QR (Query Rewrite node), a DN (Data Node), an SN (Search Node), and an ORS (Online Ranking System). Each node that needs to collect debugging information (the ORS, the SN and the Merger) can collect it through the Trace API and, after the debugging is completed, send the collected debugging information to a Trace Server (tracking server) over the network.
Specifically, when receiving an instruction, the Merger node can determine whether the instruction includes a collection identifier, and if so, the Merger node enters an online debugging mode, collects debugging information (Merger Trace information), and sends the debugging information (Merger Trace information) to the Trace Server. After judging that the collection identifier is included, or after the collection of the debugging information is completed, the Merger node can send an instruction to the SN node. The SN node can judge whether the instruction comprises a collection identifier or not, if so, the SN node enters an online debugging mode, collects debugging information (SN Trace information) and sends the debugging information (SN Trace information) to the Trace Server. After determining that the collection identifier is included, or after the collection of the debug information is completed, the SN node may send an instruction to the ORS node. The ORS node can judge whether the instruction includes a collection identifier, if so, the ORS node enters an online debugging mode, collects debugging information (ORS Trace information) and sends the debugging information to the Trace Server.
The Trace Server can combine the debugging information of the ORS, the SN and the Merger to generate a complete debugging log corresponding to the one-time debugging request for storage. The client can read the corresponding debugging log stored in the server according to the number input by the User through a User Interface (UI), analyze the debugging log and structurally display the debugging log after the analysis.
In the embodiment of the application, the nodes for collecting the debugging information can respectively send the debugging information to the server after respective debugging information is collected, so that the possibility of blocking is avoided, and the debugging information is not returned together with the processing results of other processing requests, so that the response time of the processing results of the processing requests is not influenced, the processing results of other processing requests are not invaded, sensitive information in the server is not easy to leak, and the data security is improved.
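One way to keep the sending of debugging information off the request path, so that it neither delays nor appears in the normal processing result, is a background worker fed by a queue. The present application does not prescribe this mechanism; the Python sketch below is an assumption, and `AsyncTraceSender`, `handle_request` and the `deliver` callback are hypothetical names.

```python
import queue
import threading
from typing import Callable, Dict, Optional


class AsyncTraceSender:
    """Sends debugging information from a background thread."""

    def __init__(self, deliver: Callable[[Dict[str, str]], None]) -> None:
        self._queue: "queue.Queue[Optional[Dict[str, str]]]" = queue.Queue()
        self._deliver = deliver  # e.g. a function that posts to the server
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, record: Dict[str, str]) -> None:
        """Enqueue and return immediately; the request path is not blocked."""
        self._queue.put(record)

    def _run(self) -> None:
        while True:
            record = self._queue.get()
            try:
                if record is None:  # sentinel: stop the worker
                    return
                self._deliver(record)
            finally:
                self._queue.task_done()

    def close(self) -> None:
        """Flush pending records and stop the worker."""
        self._queue.put(None)
        self._worker.join()


def handle_request(payload: str, number: str, sender: AsyncTraceSender) -> str:
    """Process a request; debugging information travels separately."""
    sender.send({"number": number, "node": "SN", "info": f"handled {payload}"})
    return f"result for {payload}"  # the response carries no debugging information


delivered = []
sender = AsyncTraceSender(delivered.append)
response = handle_request("q", "9", sender)
sender.close()
```

In this sketch the response string is produced independently of delivery, so a slow or unavailable server would delay only the background worker.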
According to the online debugging method of the distributed system, the debugging information corresponding to different debugging information collecting instructions can be distinguished according to the serial numbers, and each distributed node can send the collected debugging information to the server, so that the server collects the debugging information according to the serial numbers of the debugging information to generate the debugging log, the debugging log collecting difficulty is reduced, the log collecting efficiency is improved, debugging personnel can conveniently perform integral analysis on the debugging information, and the debugging efficiency can be improved.
Corresponding to the online debugging method of the distributed system provided by the above embodiment, the present application also provides an online debugging system of the distributed system.
Fig. 4 is a schematic structural diagram of an online debugging system of a distributed system according to an embodiment of the present application.
As shown in fig. 4, an online debugging system of a distributed system according to an embodiment of the present application includes: a plurality of distributed nodes 10 and a server 20.
Specifically, the distributed node 10 is configured to receive a debug information collection instruction, where the debug information collection instruction includes a collection identifier, enter an online debug mode according to the debug information collection instruction, collect debug information, and send the debug information to the server 20, where the debug information has a number corresponding to the debug information collection instruction, and send the debug information collection instruction to a next distributed node.
The server 20 is used for receiving the debugging information sent by the distributed nodes.
Wherein a plurality of distributed nodes 10 belong to the same distributed system. Each distributed node 10 may be used to process requests sent by the server. The client user may send a service request or an online debugging request of the application program to the server 20 according to the processing requirement. For example, for a service request, the service request may be processed by allocating corresponding nodes according to the idleness of each distributed node. And for the online debugging request, a debugging information collection instruction can be sent to the node needing to collect the debugging information according to the request of the client.
The number may be a character string of a preset length, for example a 64-bit numeric string. Each number corresponds to one online debugging session and uniquely identifies the debugging information of that session. That is, debugging information on any distributed node has the same number as long as it belongs to the same online debugging session. Therefore, debugging information corresponding to the same debugging information collecting instruction on different distributed nodes can be associated according to its number, and debugging personnel can conveniently analyze the debugging information as a whole.
The server 20 may send corresponding instructions to corresponding distributed nodes 10 in the distributed system according to the request sent by the client, so as to process the request of the client through each distributed node 10. The online debugging request is different from the service request, and the server 20 sends a corresponding service processing instruction to the distributed system for the service request of the client; and sending a corresponding debugging information collection instruction to the distributed system for the online debugging request of the client.
In the embodiment of the present application, in order to distinguish service requests of the application program from online debugging requests, the server 20 may set a collection identifier in the debugging information collecting instruction when sending it. Each distributed node 10 can therefore judge whether a received request is a debugging information collecting instruction according to whether it includes the collection identifier, and can subsequently collect only the logs corresponding to the debugging information collecting instruction, which is more convenient and improves information collection efficiency.
In an embodiment of the present application, the server 20 may send the debugging information collecting instruction to the corresponding distributed node 10 in the distributed system according to a request of the client for debugging, and each distributed node 10 may send the debugging information collecting instruction to the next distributed node during the debugging process or when the debugging is completed, so as to debug the next distributed node.
Therefore, the debug information collection instruction received by the ith distributed node may be sent by the server or sent by the (i-1) th distributed node. When the ith distributed node receives the debugging information collection instruction, the distributed node can enter an online debugging mode, namely, the operation carried out according to the current instruction is online debugging, and then the current processing process can be recorded to obtain debugging information. After the ith distributed node collects the debug information, it may send the collected debug information to the server 20, so that the server 20 can summarize it together with the debug information for the same instruction received from other distributed nodes.
After completing the debugging, the ith distributed node may send a debugging information collection instruction to its next distributed node (i.e., the (i + 1) th distributed node) to control the (i + 1) th distributed node to collect the debugging information. Therefore, each distributed node needing to collect debugging information can be controlled to collect debugging information in turn. And each distributed node that needs to collect debug information sends the collected debug information to server 20. The server 20 may receive the debug information sent by each distributed node, and summarize the debug information according to the numbers in the debug information. Specifically, the server 20 may combine the debug information with the same number to generate a debug log corresponding to the number.
In an embodiment of the present application, the server 20 may have an access interface through which the client may call up the debug log in the server. Specifically, the client may receive a number corresponding to a debugging information collecting instruction input by a user, and call a corresponding debugging log in the server according to the number.
In one embodiment of the present application, the debug log is in Extensible Markup Language (XML) format. Therefore, the client can conveniently read the debugging log from the server 20 and then display it in a structured way, which facilitates analysis and further improves debugging efficiency.
The online debugging method of the distributed system of the present embodiment is described below through the following application scenario. As shown in FIG. 3, the distributed system on the server side is composed of six nodes: an FE (Front End), a Merger (business logic node), a QR (Query Rewrite node), a DN (Data Node), an SN (Search Node), and an ORS (Online Ranking System). Each node that needs to collect debugging information (the ORS, the SN and the Merger) can collect it through the Trace API and, after the debugging is completed, send the collected debugging information to a Trace Server (tracking server) over the network.
Specifically, when receiving an instruction, the Merger node can judge whether the instruction includes a collection identifier, and if so, the Merger node enters an online debugging mode, collects debugging information and sends the debugging information to the Trace Server. After judging that the collection identifier is included, or after the collection of the debugging information is completed, the Merger node can send an instruction to the SN node. The SN node can judge whether the instruction comprises a collection identifier or not, if so, the SN node enters an online debugging mode, collects debugging information and sends the debugging information to the Trace Server. After determining that the collection identifier is included, or after the collection of the debug information is completed, the SN node may send an instruction to the ORS node. The ORS node can judge whether the instruction comprises a collection identifier, if so, the ORS node enters an online debugging mode, collects debugging information and sends the debugging information to the Trace Server.
The Trace Server can combine the debugging information of the ORS, the SN and the Merger to generate and store a complete debugging log corresponding to that single debugging request. Through a User Interface (UI), the client can read the corresponding debugging log stored in the server according to the number input by the user, parse the debugging log, and display it in a structured form.
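A minimal sketch of this aggregation step, assuming an in-memory store keyed by number, is given below. The class `TraceStore` and its methods `receive` and `debug_log` are hypothetical names introduced for illustration; the application does not describe the server's storage layout.

```python
from collections import defaultdict


class TraceStore:
    """Groups debugging information by the number of the debugging request."""

    def __init__(self):
        self._by_number = defaultdict(list)

    def receive(self, number, node, info):
        # Reports from different nodes arrive independently and may interleave.
        self._by_number[number].append((node, info))

    def debug_log(self, number):
        """Return the combined log for one debugging request, or None if unknown."""
        if number not in self._by_number:
            return None
        return {"number": number, "entries": list(self._by_number[number])}


store = TraceStore()
store.receive("1001", "Merger", "merged results")
store.receive("1002", "Merger", "merged results")
store.receive("1001", "SN", "searched index")
store.receive("1001", "ORS", "ranked hits")
log = store.debug_log("1001")
```

Keying on the number is what lets a client retrieve exactly one request's log even when several debugging requests are in flight at once.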
In the embodiment of the application, each node that collects debugging information can send it to the server separately once its own collection is complete, which avoids the possibility of blocking. Moreover, the debugging information is not returned together with the processing results of other processing requests. The response time of those processing results is therefore unaffected, the processing results themselves are not intruded upon, sensitive information in the server is less likely to leak, and data security is improved.
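One way to keep the reporting path separate from the response path is a background queue, sketched below. This mechanism is an assumption made for illustration; the application states only that each node sends its debugging information to the server separately and does not prescribe threads or queues. The names `AsyncTraceSender`, `handle_request` and the `send` callable are hypothetical.

```python
import queue
import threading


class AsyncTraceSender:
    """Delivers debugging information on a background thread."""

    def __init__(self, send):
        self._send = send  # hypothetical callable that delivers to the server
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def enqueue(self, number, node, info):
        # Returns immediately, so request handling is not held up by delivery.
        self._pending.put((number, node, info))

    def _drain(self):
        while True:
            item = self._pending.get()
            try:
                self._send(*item)
            except Exception:
                pass  # a failed delivery must not disturb request handling
            finally:
                self._pending.task_done()

    def flush(self):
        # Block until everything queued so far has been handed to send().
        self._pending.join()


def handle_request(query, number, sender):
    result = query.upper()  # stand-in for the node's real processing
    sender.enqueue(number, "SN", f"handled {query!r}")
    return result  # the result carries no debugging information


delivered = []
sender = AsyncTraceSender(lambda number, node, info: delivered.append((number, node, info)))
result = handle_request("shoes", "1001", sender)
sender.flush()
```

The returned result contains only the processing output, while the debugging information travels on its own path to the server.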
In the online debugging system of the distributed system provided by the embodiment of the application, the distributed nodes in the distributed system receive a debugging information collection instruction that includes a collection identifier, enter the online debugging mode, collect debugging information, and send the debugging information to the server. Because collection is triggered by the collection identifier, debugging information can be collected in a targeted manner, which prevents it from being confused with other operation log information and greatly reduces log bloat. In addition, the debugging information carries the number corresponding to the debugging information collection instruction, so that debugging information belonging to different collection instructions can be distinguished by number, which facilitates the collection, search and association of the debugging information. The difficulty of collecting debugging logs is thereby reduced, log collection efficiency is improved, debugging personnel can conveniently analyze the debugging information as a whole, and debugging efficiency is improved.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.

Claims (9)

1. An online debugging method of a distributed system is characterized by comprising the following steps:
the ith distributed node receives a debugging information collecting instruction, wherein the debugging information collecting instruction comprises a collecting identifier, and i is a positive integer;
the ith distributed node enters an online debugging mode according to the debugging information collecting instruction and collects debugging information;
the ith distributed node sends the debugging information to a server, wherein the debugging information has a number corresponding to the debugging information collecting instruction, and each number corresponds to a single online debugging run and is used for uniquely identifying abnormal debugging information;
the ith distributed node sends the debugging information collecting instruction to an (i + 1) th distributed node;
and the server receives the debugging information sent by the distributed nodes and aggregates the debugging information according to the numbers in the debugging information to generate a debugging log.
2. The method for online debugging of a distributed system according to claim 1, further comprising:
the (i + 1) th distributed node receives a debugging information collection instruction, wherein the debugging information collection instruction comprises a collection identifier;
the (i + 1) th distributed node enters an online debugging mode according to the debugging information collecting instruction and collects debugging information;
the (i + 1) th distributed node sends the debugging information to a server, wherein the debugging information has a number corresponding to the debugging information collecting instruction;
and the (i + 1) th distributed node sends the debugging information collection instruction to the (i + 2) th distributed node.
3. The online debugging method of the distributed system according to claim 1, wherein the debugging information collection instruction received by the ith distributed node is sent by the server or by the (i - 1) th distributed node.
4. The online debugging method of the distributed system according to claim 1, wherein the server has an access interface through which the debugging log is retrieved.
5. The online debugging method of the distributed system according to claim 1, wherein the debugging log is in Extensible Markup Language (XML) format.
6. An online debugging system for a distributed system, comprising: a plurality of distributed nodes and a server, wherein,
the distributed nodes are used for receiving a debugging information collecting instruction that comprises a collecting identifier, entering an online debugging mode according to the debugging information collecting instruction, collecting debugging information, sending the debugging information to the server, and sending the debugging information collecting instruction to the next distributed node, wherein the debugging information has a number corresponding to the debugging information collecting instruction, and each number corresponds to a single online debugging run and is used for uniquely identifying abnormal debugging information;
the server is used for receiving the debugging information sent by the distributed nodes and aggregating it according to the numbers in the debugging information to generate a debugging log.
7. The online debugging system of the distributed system according to claim 6, wherein the debugging information collection instruction received by the distributed node is sent by the server or by the previous distributed node.
8. The online debugging system of the distributed system according to claim 6, wherein the server has an access interface through which the debugging log is retrieved.
9. The online debugging system of the distributed system according to claim 6, wherein the debugging log is in Extensible Markup Language (XML) format.
CN201610035223.0A 2016-01-19 2016-01-19 Online debugging method and system for distributed system Active CN106980572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610035223.0A CN106980572B (en) 2016-01-19 2016-01-19 Online debugging method and system for distributed system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610035223.0A CN106980572B (en) 2016-01-19 2016-01-19 Online debugging method and system for distributed system

Publications (2)

Publication Number Publication Date
CN106980572A CN106980572A (en) 2017-07-25
CN106980572B true CN106980572B (en) 2021-03-02

Family

ID=59339857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610035223.0A Active CN106980572B (en) 2016-01-19 2016-01-19 Online debugging method and system for distributed system

Country Status (1)

Country Link
CN (1) CN106980572B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108365982A (en) * 2018-02-06 2018-08-03 北京小米移动软件有限公司 Unit exception adjustment method, device, equipment and storage medium
CN109408310B (en) * 2018-10-19 2022-02-18 网易(杭州)网络有限公司 Debugging method of server, server and readable storage medium
CN110018956B (en) * 2019-01-28 2022-05-13 创新先进技术有限公司 Application debugging method and related device
CN112559437A (en) * 2019-09-25 2021-03-26 阿里巴巴集团控股有限公司 Debugging unit and processor
CN112328491A (en) * 2020-11-18 2021-02-05 Oppo广东移动通信有限公司 Output method of trace message, electronic device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040073658A1 (en) * 2002-10-10 2004-04-15 Cisco Technology, Inc. System and method for distributed diagnostics in a communication system
CN100461710C (en) * 2007-03-15 2009-02-11 华为技术有限公司 Distributed system journal collecting method and system
US7747742B2 (en) * 2008-06-27 2010-06-29 Microsoft Corporation Online predicate checking for distributed systems
CN103036961A (en) * 2012-12-07 2013-04-10 蓝盾信息安全技术股份有限公司 Distributed collection and storage method of journal
CN105119752A (en) * 2015-09-08 2015-12-02 北京京东尚科信息技术有限公司 Distributed log acquisition method, device and system

Also Published As

Publication number Publication date
CN106980572A (en) 2017-07-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211116

Address after: Room 507, floor 5, building 3, No. 969, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: Zhejiang Tmall Technology Co., Ltd.

Address before: P.O. Box 847, 4th floor, Grand Cayman capital building, British Cayman Islands

Patentee before: Alibaba Group Holding Limited