WO2020073200A1 - Method and system for debugging a program - Google Patents

Method and system for debugging a program Download PDF

Info

Publication number
WO2020073200A1
Authority
WO
WIPO (PCT)
Prior art keywords
function
stack
thread
debugging
stack frame
Prior art date
Application number
PCT/CN2018/109518
Other languages
English (en)
French (fr)
Inventor
唐玮玮
沈灿泉
张丰伟
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2018/109518 priority Critical patent/WO2020073200A1/zh
Priority to CN201880097908.5A priority patent/CN112740187A/zh
Publication of WO2020073200A1 publication Critical patent/WO2020073200A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software

Definitions

  • This application relates to the computer field, and more specifically, to a method and system for debugging a program in the computer field.
  • Compute Unified Device Architecture (CUDA)-GDB (GNU debugger) is an NVIDIA tool for CUDA applications running on Linux and Mac.
  • CUDA-GDB is a debugger ported from the x86-64 version of the GDB project released by the GNU open-source organization. The tool provides developers with a mechanism for debugging CUDA applications running on actual hardware.
  • A CUDA program contains host-side and device-side code, and these two parts of the code run on different devices.
  • As an example, the host-side code may be compiled and run on the x86 side, and the device-side code may be compiled and run on a graphics processing unit (GPU).
  • During debugging, when the breakpoint set by the user is a host-side breakpoint and the program hits that breakpoint while running, the current scene contains the host-side stack information.
  • When the breakpoint set by the user is a device-side breakpoint and the program hits that breakpoint while running, the current scene contains only the device-side stack information.
  • Therefore, current heterogeneous debugging tools do not have the ability to display the full stack. Especially when heterogeneous tasks interact, developers cannot perceive the overall data flow of the service, which increases the debugging effort.
  • the present application provides a method and system for debugging a program, which can record the end-to-end flow of services during the debugging process, and improve the efficiency of developers in locating problems.
  • a method for debugging a program including:
  • the first processing module executes a first function of the debugged program, wherein the stack frame of the first function includes a first identifier, and the first identifier is used to identify a thread to which the first function belongs;
  • when the first function calls a second function, the second processing module executes the second function, where the stack frame of the second function includes the first identifier, and the first identifier is used to identify the thread to which the second function belongs, the second function and the first function belonging to the same thread;
  • the debugging agent module obtains scheduling information, and the scheduling information is used to instruct the first function to call the second function;
  • the debugging agent module sends a notification message to the debugging module, the notification message is used to notify the first function to call the second function;
  • the debugging module saves the current first stack frame of the first function to a stack buffer.
  • Therefore, in this embodiment of the application, an identifier field identifying the thread to which a function belongs is added to the function's stack frame. When the first function running on the first processing module calls the second function, the current stack frame of the first function is saved to the stack buffer, so that the stack frame of the first function saved in the stack buffer is associated with the thread to which the first function belongs. This records the stack information of a single thread, so the end-to-end flow of a service can be recorded during debugging and developers can locate problems more efficiently.
  • the first identifier is a thread identifier (thread_ID) of the thread to which the first function and the second function belong.
  • With reference to the first aspect, in some implementations of the first aspect, the method further includes: after a breakpoint on the second function is hit, obtaining the current second stack frame of the second function; and obtaining the first identifier included in the second stack frame, and obtaining, from the stack buffer, the stack frame that includes the first identifier.
  • Therefore, in this embodiment of the application, when a breakpoint is hit during debugging, the stack frames recorded in the stack buffer that are associated with the current thread can be retrieved, which provides an intuitive service flow diagram, lets the developer see the end-to-end flow of the service during debugging, and improves the efficiency with which developers locate problems.
  • the debugging module saves the current first stack frame of the first function to a stack buffer, including:
  • the debugging module adds the first stack frame to the stack linked list, in the stack buffer, corresponding to the thread to which the first function belongs.
  • In some implementations, the debugging module saves the current first stack frame of the first function to the stack buffer by establishing, in the stack buffer, a stack linked list corresponding to the thread to which the first function belongs, where the initial node of the stack linked list is the first stack frame.
  • In this way, when no stack linked list corresponding to the thread to which the first function belongs exists yet in the stack buffer, that stack linked list can be established.
  • the first processing module is a host-side processor
  • the second processing module is a device-side processor
  • the debugging agent module and the debugging module may be set on a processor, such as a CPU or a GPU.
  • the processor may be the processor where the first processing module is located, or the processor where the second processing module is located; alternatively, the debugging agent module and the debugging module may be set on a separate processor, which is not limited in the embodiments of the present application.
  • a system for debugging a program which is used to execute the method in the first aspect or any possible implementation manner of the first aspect.
  • the system for debugging a program includes a module for performing the method in the first aspect or any possible implementation manner of the first aspect.
  • a computer-readable medium for storing a computer program, the computer program including instructions for performing the method in any possible implementation manner of the first aspect described above.
  • a computer program product comprising computer program code which, when run by a processing module or processor in the system of the debugged program, causes that system to perform the method in any possible implementation manner of the first aspect.
  • FIG. 1 shows a schematic diagram of a stack structure.
  • FIG. 2 shows a schematic diagram of a heterogeneous debugging framework provided by an embodiment of the present application.
  • FIG. 3 shows a schematic flowchart of a method for debugging a program provided by an embodiment of the present application.
  • FIG. 4 shows a schematic diagram of a stack structure provided by an embodiment of the present application.
  • FIG. 5 shows a schematic diagram of a stack buffer provided by an embodiment of the present application.
  • FIG. 6 shows a schematic diagram of a multi-thread service flow in the prior art.
  • FIG. 7 shows a schematic diagram of a multi-thread service flow provided by an embodiment of the present application.
  • FIG. 8 shows an example of a specific debugging program provided by the embodiment of the present application.
  • FIG. 9 shows a schematic flowchart of a method for displaying full stack information provided by an embodiment of the present application.
  • FIG. 10 shows an example of a specific debugging program provided by the embodiment of the present application.
  • FIG. 11 shows a schematic block diagram of a system for debugging a program provided by an embodiment of the present application.
  • FIG. 1 shows a schematic diagram of a stack structure.
  • the upper part of the stack is the stack frame of the main function
  • the lower part is the stack frame of the Func (function) 1 function, which is the stack frame of the current function (callee)
  • the bottom of the stack is at a high address, and the stack grows downward.
  • the stack frame refers to the part of the stack space allocated separately for a function call. For example, when the running program calls another function, it will enter a new stack frame. The original function's stack frame is called the caller's frame, and the new stack frame is called the current frame. After the called function finishes running, all the current frames shrink and return to the caller's frame.
  • the main function stack frame includes the pushed registers, the function's input parameters, its local variables, and the parameters of the function it calls.
  • the stack frame of the Func1 function likewise includes the pushed registers, the function's input parameters, its local variables, and the parameters of the function it calls.
  • the register is used to record some important information of the currently running function. For example, when entering a new function and starting execution, the register holds the information of the previous function.
  • the registers include, for example, program counter (PC) registers, link (LR) registers, stack pointer (SP) registers, stack frame pointer (FP) registers, and so on. Among them, FP points to the bottom of the stack of the current function stack frame, and SP points to the top of the stack of the current function stack frame.
  • when a function call occurs, the system saves the stack context: first the relevant registers are pushed onto the stack, then the function's input parameters, then the function's local variables, and finally the parameters of the called function are pushed onto the stack.
  • pushing refers to placing data onto the stack at the stack top; when data is popped, it is taken from the stack top.
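  • As a minimal illustration of the caller/callee relationship in FIG. 1, the following C sketch shows a main function calling Func1; the comments indicate which frame holds which data. This is an illustrative example only, not code taken from the application.

```c
/* Illustration of FIG. 1: main is the caller, Func1 the callee. On the call,
 * the argument for the callee is pushed from main's frame, and a new current
 * frame is created for Func1 holding its saved registers, parameter and local
 * variable; when Func1 returns, its frame is released and control goes back
 * to the caller's frame.                                                     */
int Func1(int x)            /* callee: owns the new (current) stack frame     */
{
    int local = x + 1;      /* local variable stored in Func1's frame         */
    return local;
}

int main(void)              /* caller: its stack frame is the caller's frame  */
{
    int a = 41;             /* local variable stored in main's frame          */
    return Func1(a);        /* parameter of the called function pushed last   */
}
```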
  • FIG. 2 shows a schematic diagram of a framework 200 for heterogeneous debugging provided by an embodiment of the present application.
  • the heterogeneous debugging framework mainly includes a debugging module 201, a debugging agent module 202, and a stack buffer 203.
  • Debugging module 201: the module the developer interacts with directly; it provides basic debugging capabilities, is responsible for communicating with the debugging agent module 202, and issues debugging commands.
  • Debugging agent module 202: shields the differences of the underlying hardware, provides a unified debugging interface, and implements information collection from and control of the underlying hardware devices.
  • the underlying hardware may include, for example, a central processing unit (CPU) 204, a graphics processing unit (GPU) 205, and a multi-core processor 206.
  • the CPU 204 may be a host-side processor
  • the GPU 205 and the multi-core processor 206 may be device-side processors.
  • Here, the device is, for example, an application specific integrated circuit (ASIC); this embodiment of the present application does not limit this.
  • Stack buffer 203: a specific region of memory allocated in double data rate SDRAM (DDR SDRAM). In the embodiment of the present application, it stores the global stack information of the heterogeneous processors.
  • the stack buffer 203 may include stack frame 0, stack frame 1, and stack frame 2.
  • The stack buffer 203 being located in DDR is only an example. It can be understood that the stack buffer 203 may also be located in other storage structures or memories, such as high bandwidth memory (HBM) or a hybrid memory cube (HMC), which is not limited in the embodiments of the present application. In addition, the stack buffer may also have other names, such as a global stack buffer, which is likewise not limited in the embodiments of the present application.
  • FIG. 3 shows a schematic flowchart of a method for debugging a program provided by an embodiment of the present application.
  • the method may be executed by the heterogeneous debugging framework 200 in FIG. 2, which is not limited in the embodiment of the present application.
  • 310, the first processing module executes a first function of the debugged program, where the stack of the first function includes a first identifier, and the first identifier is used to identify the thread to which the first function belongs.
  • the thread to which the first function belongs may be the first thread.
  • the first processing module is a host-side device or a device-side device. Specifically, it may be the above CPU, GPU, many-core processor, or ASIC, which is not limited in the embodiments of the present application.
  • the process to be debugged may include at least one thread, for example, including a first thread, and optionally, the process to be debugged may also include a second thread, which is not limited in this embodiment of the present application.
  • the first identifier may be the thread identifier (thread_ID) of the thread to which the first function belongs.
  • In this embodiment of the application, a thread identification field for the thread to which a function belongs can be newly added to the original stack structure of a function (for example, the main function and the functions called by the main function), such as the stack structure shown in FIG. 1.
  • Specifically, each time a stack frame is created, thread_ID may be written into the stack structure as a field, serving as the unique identifier of the stack chain structure.
  • the thread_ID field can be passed down from the host side to characterize the thread to which the stack frame of the currently running function belongs.
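  • The application does not specify how the thread_ID is carried from the host to a device; one plausible shape, shown purely as an assumption, is to place the identifier in the task descriptor handed to the scheduling framework so the device runtime can copy it into the new stack frame:

```c
#include <stdint.h>

/* Hypothetical sketch only: task_desc and schedule_on_device are illustrative
 * names, not interfaces defined by the application. The idea is simply that
 * the host thread's identifier travels with the scheduled task so that the
 * device side can write it into the thread_ID field of the frame it creates. */
typedef struct {
    uint64_t thread_id;         /* identifier of the host thread issuing the task */
    void   (*entry)(void);      /* device-side function to run, e.g. asic0_fun    */
} task_desc;

static void schedule_on_device(uint64_t thread_id, void (*entry)(void))
{
    task_desc d = { thread_id, entry };
    /* hand d to the scheduling framework; the framework also forwards the
     * scheduling information to the debugging agent module (step 330 below). */
    (void)d;
}
```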
  • FIG. 4 shows a schematic diagram of a stack structure provided by an embodiment of the present application.
  • a thread ID (thread_ID) field is newly added in the main function stack frame, indicating that the thread to which the main function's stack frame belongs is thread_ID.
  • a thread_ID field is also newly added in the Func1 function stack frame, indicating that the thread to which the Func1 function's stack frame belongs is likewise thread_ID.
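  • A minimal C sketch of a stack-frame record augmented as in FIG. 4 is shown below; only the added thread_id field comes from the description, while the remaining field names and sizes are assumptions made for illustration.

```c
#include <stdint.h>

/* Illustrative per-frame record as it might be captured for the stack buffer.
 * The application only requires that a thread identifier (thread_ID) be
 * written into the stack structure when each frame is created.              */
typedef struct stack_frame_record {
    uint64_t thread_id;      /* newly added field: thread the frame belongs to */
    uint64_t pc;             /* program counter of the running function        */
    uint64_t fp;             /* frame pointer: bottom of the current frame     */
    uint64_t sp;             /* stack pointer: top of the current frame        */
    char     location[64];   /* e.g. "int main () at xxx.cce: 10"              */
} stack_frame_record;
```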
  • 320, when the first function calls a second function, the second processing module executes the second function.
  • the stack frame of the second function includes the first identifier; here, the first identifier is used to identify the thread to which the second function belongs, that is, it indicates that the second function and the first function belong to the same thread.
  • the first identification may be transmitted from the first processing module to the second processing module.
  • In one possible implementation, the first processing module is a CPU, the first function is the main function, the second processing module is a GPU or a many-core processor, and the second function is a function called by the main function.
  • In another possible implementation, the first processing module may be a GPU or a many-core processor, the second processing module is a GPU or many-core processor different from the first processing module, and the second function is a function called by the first function.
  • 330, the debugging agent module obtains scheduling information, where the scheduling information is used to indicate that the first function calls the second function.
  • Specifically, when inter-core scheduling occurs, the scheduling framework of the debugged program sends the task scheduling to the debugging agent module 202, and the information related to this task scheduling may be included in the above scheduling information.
  • 340, the debugging agent module sends a first notification message to the debugging module, where the first notification message is used to notify the debugging module that the first function calls the second function. Specifically, after receiving the scheduling information sent by the scheduling framework in 330, the debugging agent module 202 sends a signal to the debugging module 201 to notify it of the inter-core scheduling.
  • the debugging agent module and the debugging module may be set on a processor, such as a CPU or a GPU.
  • the processor may be the processor where the first processing module is located, or the processor where the second processing module is located; alternatively, the debugging agent module and the debugging module may be set on a separate processor, which is not limited in the embodiments of the present application.
  • 350, the debugging module saves the current first stack frame of the first function to a stack buffer.
  • the debugging module 201 then saves the stack frame of the current device to the stack buffer 203 according to the original format of the stack frame.
  • In one implementation, when a stack linked list corresponding to the thread to which the first function belongs (for example, the first thread) exists in the stack buffer 203, the debugging module 201 adds the first stack frame to the first thread's stack linked list.
  • In another implementation, when no stack linked list of the first thread exists in the stack buffer 203, the debugging module 201 establishes the first thread's stack linked list in the stack buffer, and the initial node of that stack linked list is the first stack frame.
  • In other words, if the current stack buffer 203 does not contain a stack linked list corresponding to the thread_id of the stack frame to be saved, a new stack linked list is created and the current stack frame is added to it as the initial node. If a stack linked list corresponding to that thread_id already exists in the stack buffer 203, the stack frame of the current processor is added to that list.
  • the form of the stack buffer 203 may be as shown in FIG. 5.
  • the stack buffer 203 includes the stack linked lists of three threads, identified as thread 0 (thread_0), thread 1 (thread_1), and thread 2 (thread_2).
  • the stack linked list identified as thread_0 includes the stack frames of the host, ASIC 0, ASIC 4, and ASIC 8; the list identified as thread_1 includes the stack frames of the host, ASIC 1, ASIC 6, and ASIC 9; and the list identified as thread_2 includes the stack frames of the host, ASIC 3, ASIC 7, ASIC 2, and ASIC 5.
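  • The following C sketch illustrates the per-thread organisation of the stack buffer in FIG. 5 and the create-or-append rule described for step 350; the linked-list layout and all names are assumptions for illustration, not the patent's actual data structures (error handling is omitted).

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct frame_node {              /* one saved stack frame              */
    uint64_t           thread_id;        /* thread_ID carried in the frame     */
    char               location[64];     /* e.g. "int main () at xxx.cce: 10"  */
    struct frame_node *next;
} frame_node;

typedef struct stack_list {              /* one stack linked list per thread   */
    uint64_t           thread_id;        /* key of the list                    */
    frame_node        *head;             /* initial node = first saved frame   */
    struct stack_list *next_list;
} stack_list;

typedef struct { stack_list *lists; } stack_buffer;

/* Save a frame: append it to the list keyed by its thread_id, creating the
 * list (with this frame as the initial node) if it does not exist yet.       */
static void save_frame(stack_buffer *buf, frame_node *f)
{
    stack_list *l = buf->lists;
    while (l != NULL && l->thread_id != f->thread_id)
        l = l->next_list;
    if (l == NULL) {                     /* no list for this thread yet: create it */
        l = calloc(1, sizeof *l);
        l->thread_id = f->thread_id;
        l->next_list = buf->lists;
        buf->lists   = l;
    }
    f->next = NULL;                      /* append in call order (host first)  */
    frame_node **tail = &l->head;
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = f;
}
```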
  • Therefore, in this embodiment of the application, an identifier field identifying the thread to which a function belongs is added to the function's stack frame. When the first function running on the first processing module calls the second function, the current stack frame of the first function is saved to the stack buffer, so that the stack frame of the first function saved in the stack buffer is associated with the thread to which the first function belongs. This records the stack information of a single thread, so the end-to-end flow of a service can be recorded during debugging and developers can locate problems more efficiently.
  • Optionally, in this embodiment of the application, after a breakpoint on the second function is hit, the current second stack frame of the second function is obtained; the second stack frame is then parsed to obtain the first identifier included in it, and the stack frames that include the first identifier can be obtained from the stack buffer.
  • Specifically, after the preset breakpoint is hit, the user's command to display stack information can be received. The stack frame of the current processing module (for example, the second processing module) is displayed first; then, according to the thread_id in the current stack frame, the corresponding stack linked list is looked up in the stack buffer and the stack frames in that list are displayed in turn, yielding the full stack information from the start of the program to the current scene.
  • Therefore, in this embodiment of the application, when a breakpoint is hit during debugging, the stack frames recorded in the stack buffer that are associated with the current thread can be retrieved, which provides an intuitive service flow diagram, lets the developer see the end-to-end flow of the service during debugging, and improves the efficiency with which developers locate problems.
  • FIG. 6 shows a schematic diagram of a multi-threaded service flow in the prior art.
  • FIG. 6 shows three service flows in total: the flow host -> ASIC 0 -> ASIC 4 -> ASIC 8 shown by the dashed line, the flow host -> ASIC 1 -> ASIC 6 -> ASIC 9 shown by the solid line, and the flow host -> ASIC 3 -> ASIC 7 -> ASIC 2 -> ASIC 5 shown by the dash-dotted line.
  • Because the existing debugging technique does not track the inter-core call relationships while the service runs, the user cannot directly see how the service is called across processors and can only obtain the stack frames on a single core; for example, when the breakpoint on ASIC 8 is hit, only the stack frames on ASIC 8 are available, not the flow host -> ASIC 0 -> ASIC 4 -> ASIC 8.
  • FIG. 7 shows a schematic diagram of a multi-thread service flow provided by an embodiment of the present application.
  • As shown in FIG. 7, in this embodiment of the application, when a breakpoint is hit during debugging, the stack frames associated with the current thread in the stack buffer can be displayed.
  • For example, when the breakpoint on ASIC 8 is hit, not only the stack information on ASIC 8 but also the service flow host -> ASIC 0 -> ASIC 4 -> ASIC 8 can be obtained.
  • When the core is switched, the service flows host -> ASIC 1 -> ASIC 6 -> ASIC 9 and host -> ASIC 3 -> ASIC 7 -> ASIC 2 -> ASIC 5 can also be displayed. The embodiments of the present application can therefore provide an intuitive service flow diagram, so that the developer can see the end-to-end flow of the service during debugging, improving the efficiency with which developers locate problems.
  • FIG. 8 shows an example of a specific debugging program provided by the embodiment of the present application. It should be understood that FIG. 8 shows steps or operations of the method for debugging a program, but these steps or operations are merely examples, and other operations or variations of the operations in FIG. 8 may be performed in the embodiments of the present application. In addition, each step in FIG. 8 may be performed in a different order from that presented in FIG. 8, and it may not be necessary to perform all operations in FIG. 8.
  • the code of the debugged program is mainly divided into the following three parts:
  • the main function is the code on the host side, compiled and run on the host side;
  • the asic0_fun function is the code on the device side, compiled and run on the ASIC 0 side;
  • the asic1_fun function is device-side code, compiled and run on the ASIC 1 side.
  • the debugged program is as follows:
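  • The listing itself appears only as an image in the published application; the following hypothetical C-style sketch (not the original xxx.cce code) matches the structure described here, with main on the host calling asic0_fun on ASIC 0, which in turn calls asic1_fun on ASIC 1.

```c
/* Hypothetical reconstruction for illustration only. Function names and the
 * call structure follow the description; the bodies are invented, and the
 * quoted line numbers 2, 6 and 10 are only approximated by this layout.      */

void asic1_fun(void)        /* device-side code, compiled and run on ASIC 1   */
{
    /* service processing on ASIC 1; the example breakpoint is set in here    */
}

void asic0_fun(void)        /* device-side code, compiled and run on ASIC 0   */
{
    asic1_fun();            /* inter-core call: asic0_fun schedules asic1_fun */
}

int main(void)              /* host-side code, compiled and run on the host   */
{
    asic0_fun();            /* inter-core call: main schedules asic0_fun      */
    return 0;
}
```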
  • the debugging module establishes a debugging agent process to determine the process being debugged.
  • the debug module sets a breakpoint in the code on the ASIC1 side. Then, the debugging module can send a startup debugging command to the CPU through the debugging agent module.
  • the CPU executes the main function.
  • the program on the host side is executed first. Assume that the thread_id running at this time is 0. At this time, the stack frame of the main function includes the identifier thread_0.
  • the main function calls the asic0_func function on ASIC0.
  • the main function will call the asic0_fun function, which will execute the business on the ASIC0 side.
  • the scheduling framework of the debugged program will send the task scheduling to the debugging agent module.
  • the debugging agent module notifies the debugging module of inter-core scheduling.
  • After receiving this information, the debugging agent module sends a signal to the debugging module to notify it that the main function has scheduled the asic0_func function.
  • the debugging module saves the current CPU stack frame.
  • After receiving the signal sent by the debugging agent module, the debugging module confirms that it is a task scheduling signal and saves the stack information of the current device; specifically, it saves the current stack frame of the main function on the CPU.
  • Because this is the first stack frame stored in the stack buffer, no stack linked list keyed by thread_id 0 exists yet, so a new stack linked list is created and the main function's current stack frame is added to it as the initial node.
  • As an example, this stack frame may be denoted stack frame 0 (stack frame_0), and the information in the stack frame is specifically int main () at xxx.cce: 10.
  • ASIC 0 executes asic0_func function, and the service continues to execute on ASIC 0 at this time.
  • the asic0_func function calls the asic1_func function on ASIC1.
  • the debugging agent module notifies the debugging module of inter-core scheduling.
  • Similarly, the debugging agent module needs to notify the debugging module through a signal that the asic0_func function has scheduled the asic1_func function.
  • the debugging module saves the current stack frame of ASIC 0.
  • Specifically, the debugging module adds the current stack frame of ASIC 0 to the stack linked list whose thread_id is 0, that is, the linked list where stack_main is located.
  • the newly added stack frame may be represented as stack frame 1 (stack frame_1), and the information in the stack frame is specifically void asic0_fun () at xxx.cce: 6.
  • Therefore, in this embodiment of the present application, by adding an identifier field identifying the thread to which a function belongs to the function's stack frame, when the first function running on the first processing module calls the second function on the second device, the current stack frame of the first function is saved to the stack buffer, so that the stack frame of the first function saved in the stack buffer is associated with the thread to which the first function belongs, and the stack information of a single thread is recorded.
  • On this basis, this embodiment of the present application can record the end-to-end flow of services during debugging and improve the efficiency of developers in locating problems.
  • FIG. 9 shows a schematic flowchart of a method for displaying full stack information provided by an embodiment of the present application. It should be understood that FIG. 9 shows steps or operations of the method for displaying full stack information, but these steps or operations are merely examples, and other operations or variations of the operations in FIG. 9 may be performed in the embodiments of the present application. In addition, each step in FIG. 9 may be performed in a different order from that presented in FIG. 9, and it may not be necessary to perform all operations in FIG. 9.
  • 901, a breakpoint on ASIC 1 is hit.
  • 902, after 901, ASIC 1 can report the breakpoint event to the debugging module.
  • 903 the debug module calls the backtrace command.
  • 903 may include 9031, 9032, and 9033.
  • Specifically, after the breakpoint is hit, control can return to the user's debugging interface. When the user displays the stack with the backtrace command, 9031 is executed first to display the stack frame on ASIC 1.
  • As an example, the stack frame on ASIC 1 is void asic1_fun () at xxx.cce: 2.
  • Then 9032 is executed to obtain the thread_id identifier in the stack frame on ASIC 1.
  • In this example, the thread_id obtained is thread_0.
  • Then, in 9033, the stack linked list of thread_0 is looked up in the stack buffer.
  • Here, the stack linked list of thread_0 can be found in the stack buffer; in this embodiment it includes stack frame_0 and stack frame_1.
  • As an example, the list can then be output in reverse order, yielding the full stack information from ASIC 1 -> ASIC 0 -> host.
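  • A minimal C sketch of this reverse-order output is given below; the list handling and the print format are assumptions, while the three frame strings are exactly the ones quoted in this example.

```c
#include <stdio.h>

typedef struct node { const char *location; struct node *next; } node;

static int depth;

static void print_reversed(const node *n)     /* most recently saved frame first */
{
    if (n == NULL) return;
    print_reversed(n->next);
    printf("#%d  %s\n", ++depth, n->location);
}

int main(void)
{
    /* thread_0's stack linked list as saved in the stack buffer */
    node frame0 = { "int main () at xxx.cce: 10",      NULL };  /* stack frame_0 */
    node frame1 = { "void asic0_fun () at xxx.cce: 6", NULL };  /* stack frame_1 */
    frame0.next = &frame1;

    printf("#0  void asic1_fun () at xxx.cce: 2\n");  /* 9031: current frame on ASIC 1 */
    print_reversed(&frame0);                          /* 9033: buffer list in reverse  */
    /* prints the full stack ASIC 1 -> ASIC 0 -> host:
     *   #0  void asic1_fun () at xxx.cce: 2
     *   #1  void asic0_fun () at xxx.cce: 6
     *   #2  int main () at xxx.cce: 10                                        */
    return 0;
}
```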
  • Therefore, in this embodiment of the application, when a breakpoint is hit during debugging, the stack frames recorded in the stack buffer that are associated with the current thread can be retrieved, which provides an intuitive service flow diagram, lets the developer see the end-to-end flow of the service during debugging, and improves the efficiency with which developers locate problems.
  • In another embodiment of the application, the host side runs a multi-threaded program. Suppose the user creates two threads to process services separately, for example thread 0 (thread_0) and thread 1 (thread_1), and the user has set a breakpoint on ASIC 2 of thread_0.
  • Meanwhile, the service on ASIC 4 of thread_1 takes a long time, so when the breakpoint on ASIC 2 is hit, ASIC 4 is still processing its service.
  • When thread_0 is scheduled to ASIC 0, as described above, the host-side stack frame of thread_0 is added to the global stack buffer, and a new stack linked list keyed by thread_0 is created.
  • When thread_1 is scheduled to ASIC 3, the host-side stack frame of thread_1 is added to the global stack buffer, and a new stack linked list keyed by thread_1 is created.
  • When thread_0 is scheduled to ASIC 1, the stack frame of ASIC 0 needs to be pushed; since a stack linked list corresponding to thread_0 already exists in the stack buffer, the stack frame of ASIC 0 is added to that list.
  • When thread_1 is scheduled to ASIC 4, the stack frame of ASIC 3 needs to be pushed; since a stack linked list corresponding to thread_1 already exists in the stack buffer, the stack frame of ASIC 3 is added to that list.
  • When thread_0 is scheduled to ASIC 2, the stack frame of ASIC 1 is added to the stack linked list corresponding to thread_0.
  • When ASIC 2 hits the breakpoint, the current stack buffer is as shown in FIG. 10, where thread_0 contains the stack frames of host / ASIC 0 / ASIC 1, retaining their calling sequence, and thread_1 contains the stack frames of host / ASIC 3.
  • Therefore, in this embodiment of the present application, by adding an identifier field identifying the thread to which a function belongs to the function's stack frame, when the first function on the first processing module calls the second function, the current stack frame of the first function is saved to the stack buffer, so that the stack frame of the first function saved in the stack buffer is associated with the thread to which the first function belongs and the stack information of a single thread is recorded. On this basis, the end-to-end flow of services can be recorded during debugging, improving the efficiency of developers in locating problems.
  • In this embodiment, when the breakpoint on ASIC 2 is hit and the user views the current stack frame with the backtrace command, the stack frame on the ASIC 2 device is obtained first and the thread_id field in that stack frame is parsed.
  • Here, the thread_id is thread_0.
  • The stack of thread_0 is then looked up in the stack buffer; since a stack linked list of thread_0 exists in the stack buffer, the list is parsed in turn to obtain all of the stack information.
  • If the user then performs a core switching operation, for example switching to ASIC 4, and views the current stack frame through the backtrace command, the stack information on ASIC 4 is obtained first, the thread_id field in the stack frame of ASIC 4 is parsed as thread_1, and the stack whose thread_id is thread_1 is looked up in the stack buffer. Because a stack linked list of thread_1 exists in the stack buffer, that list is parsed in turn to obtain the full stack information.
  • Therefore, in this embodiment of the application, when a breakpoint is hit during debugging, the stack frames recorded in the stack buffer that are associated with the current thread can be retrieved, which provides an intuitive service flow diagram, lets the developer see the end-to-end flow of the service during debugging, and improves the efficiency with which developers locate problems.
  • FIG. 11 shows a schematic diagram of a system 1100 for debugging a program provided by an embodiment of the present application.
  • the system 1100 includes a first processing module 1110, a second processing module 1120, a debugging agent module 1130, a debugging module 1140, and a stack buffer 1150.
  • the first processing module 1110 is configured to execute a first function of the debugged program, wherein the stack frame of the first function includes a first identifier, and the first identifier is used to identify the thread to which the first function belongs.
  • when the first function running on the first processing module 1110 calls a second function, the second processing module 1120 is configured to execute the second function; the stack frame of the second function includes the first identifier, and the first identifier is used to identify the thread to which the second function belongs, the second function and the first function belonging to the same thread.
  • the debugging agent module 1130 is used to obtain scheduling information, and the scheduling information is used to instruct the first function to call the second function;
  • the debugging agent module 1130 is further configured to send a notification message to the debugging module 1140, and the notification message is used to notify the first function to call the second function;
  • the debugging module 1140 is used to save the current first stack frame of the first function to a stack buffer 1150.
  • Therefore, in this embodiment of the application, an identifier field identifying the thread to which a function belongs is added to the function's stack frame. When the first function running on the first processing module calls the second function, the current stack frame of the first function is saved to the stack buffer, so that the stack frame of the first function saved in the stack buffer is associated with the thread to which the first function belongs. This records the stack information of a single thread, so the end-to-end flow of a service can be recorded during debugging and developers can locate problems more efficiently.
  • Optionally, the system 1100 for debugging a program further includes an obtaining module, configured to: after a breakpoint on the second function is hit, obtain the current second stack frame of the second function; and obtain the first identifier included in the second stack frame, and obtain, from the stack buffer 1150, the stack frame that includes the first identifier.
  • Therefore, in this embodiment of the application, when a breakpoint is hit during debugging, the stack frames recorded in the stack buffer that are associated with the current thread can be retrieved, which provides an intuitive service flow diagram, lets the developer see the end-to-end flow of the service during debugging, and improves the efficiency with which developers locate problems.
  • Optionally, the debugging module 1140 is specifically configured to add the first stack frame to the stack linked list, in the stack buffer, corresponding to the thread to which the first function belongs.
  • Optionally, the debugging module 1140 is specifically configured to:
  • a stack linked list corresponding to the thread to which the first function belongs is established in the stack buffer, and the initial node of the stack linked list is the first stack frame.
  • the first processing module 1110 is a host-side processor
  • the second processing module 1120 is a device-side processor.
  • the system 1100 for debugging a program shown in FIG. 11 can implement various processes corresponding to the foregoing method embodiments. Specifically, for each module in the system 1100 for the debugging program, reference may be made to the description above.
  • An embodiment of the present application also provides a computer-readable medium for storing a computer program, where the computer program includes instructions for executing the method for debugging the program.
  • An embodiment of the present application also provides a computer program product, the computer program product comprising computer program code which, when run by a processing module or processor in the system of the debugged program, causes that system to perform the above method of debugging a program.
  • the processor mentioned in the embodiments of the present invention may be a CPU, or it may be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory or storage module mentioned in the embodiments of the present invention may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • the volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
  • It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated into the processor.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a division of logical functions; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
  • the function is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present application essentially or part of the contribution to the existing technology or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to enable a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

This application provides a method and system for debugging a program. In the embodiments of this application, an identifier field that identifies the thread to which a function belongs is added to the function's stack frame. When a first function running on a first processing module calls a second function, the current stack frame of the first function is saved to a stack buffer, so that the stack frame of the first function saved in the stack buffer is associated with the thread to which the first function belongs, thereby recording the stack information of a single thread. On this basis, the embodiments of this application can record the end-to-end flow of a service during debugging and improve the efficiency with which developers locate problems.

Description

调试程序的方法和系统 技术领域
本申请涉及计算机领域,并且更加具体的,涉及计算机领域中的调试程序的方法和系统。
背景技术
当前,异构计算正在成为并行计算的一种新形式。根据不同的业务场景,越来越多定制化的处理器已经问世。异构体系结构的产生,必然导致异构软件的调试需求。统一计算架构(compute unified device architecture,CUDA)-GDB(GNU debuger)是用于在Linux和Mac上运行CUDA应用程序的NVIDA的工具。CUDA-GDB是基于GNU开源组织发布的x86-64版本的GDB项目所移植的调试器,该工具为开发人员提供了一种调试在实际硬件上运行的CUDA应用程序的机制。
CUDA程序包含主机(host)和设备(device)侧代码,这两部分代码会运行在不同的设备上。作为举例,host侧代码可以是编译运行在X86侧的代码,device侧的代码可以是编译运行在图形处理器(graphic processing unit,GPU)上的代码。在进行调试的时候,当用户所设置的断点为host侧的断点时,程序在运行过程中命中host侧断点,当前现场包含host侧的堆栈信息。当用户所设置的断点为device侧的断点时,程序在运行过程中命中device侧断点时,当前现场只会包含device侧的堆栈信息。
因此,当前异构调试的工具,没有具备显示全栈的功能。特别在异构任务联动中,开发者无法感知业务整体数据流,造成调试人力增高。
发明内容
本申请提供一种调试程序的方法和系统,能够在调试过程中记录业务的端到端的流向,提高开发者定位问题的效率。
第一方面,提供了一种调试程序的方法,包括:
第一处理模块执行被调试程序的第一函数,其中,所述第一函数的栈帧包括第一标识,所述第一标识用于标识所述第一函数所属的线程;
在所述第一函数调用第二函数时,第二处理模块执行所述第二函数,所述第二函数的栈帧包括所述第一标识,所述第一标识用于标识所述第二函数所属的线程,其中,所述第二函数与所述第一函数属于相同的线程;
调试代理模块获取调度信息,所述调度信息用于指示所述第一函数调用所述第二函数;
所述调试代理模块向调试模块发送通知消息,所述通知消息用于通知所述第一函数调用所述第二函数;
所述调试模块将所述第一函数当前的第一栈帧保存到堆栈缓冲区。
因此,本申请实施例通过在函数的栈帧中增加用于标识该函数所属的线程的标识字段,当第一处理模块上运行的第一函数调用第二函数时,将该第一函数当前的栈帧保存到堆栈缓冲区中,使得堆栈缓冲区中保存的第一函数的栈帧与该第一函数所属的线程相关联,实现对同一个线程的堆栈信息的记录,基于此本申请实施例能够在调试过程中记录业务的端到端的流向,提高开发者定位问题的效率。
可选的,本申请实施例中,第一标识为第一函数和第二函数所属的线程的线程标识(thread_ID)。
结合第一方面,在第一方面的某些实现方式中,还包括:
在命中所述第二函数上的断点之后,获取所述第二函数当前的第二栈帧;
获取所述第二栈帧中包括的所述第一标识,并获取所述堆栈缓冲区中的包括所述第一标识的栈帧。
因此,本申请实施例中,在调试程序过程中当命中断点时,可以将堆栈缓冲区中与当前线程相关联的栈帧进行记录,从而能够提供直观的业务流程图,使得开发者在调试过程中能够获取业务的端到端的流向,提高开发者定位问题的效率。
结合第一方面,在第一方面的某些实现方式中,所述调试模块将所述第一函数当前的第一栈帧保存到堆栈缓冲区,包括:
所述调试模块将所述第一栈帧添加到所述堆栈缓冲区中的所述第一函数所述的线程对应的堆栈链表中。
结合第一方面,在第一方面的某些实现方式中,所述调试模块将所述第一函数当前的第一栈帧保存到堆栈缓冲区,包括:所述调试模块在所述堆栈缓冲区中建立所述第一函数所述的线程对应的堆栈链表,所述堆栈链表的初始节点为所述第一栈帧。
这样,在堆栈缓冲区中不存在该第一函数所属的线程对应的堆栈链表时,可以建立该堆栈链表。
结合第一方面,在第一方面的某些实现方式中,第一处理模块为主机侧处理器,所述第二处理模块为设备侧处理器。
本申请实施例中,调试代理模块和调试模块可以设置于处理器上,例如CPU或者GPU。具体的,该处理器可以为第一处理模块所在的处理器,或第二处理模块所在的处理器,或者调试代理模块和调试模块可以设置于单独的处理器上,本申请实施例对此不作限定。
第二方面,提供了一种调试程序的系统,用于执行上述第一方面或第一方面的任意可能的实现方式中的方法。具体地,该调试程序的系统包括用于执行上述第一方面或第一方面的任意可能的实现方式中的方法的模块。
第三方面,提供了一种计算机可读介质,用于存储计算机程序,该计算机程序包括用于执行上述第一方面的任意可能的实现方式中的方法的指令。
第四方面,提供了一种计算机程序产品,所述计算机程序产品包括:计算机程序代码,当所述计算机程序代码被所述调试程序的系统中的处理模块或处理器运行时,使得该调试程序的系统执行上述第一方面的任意可能的实现方式中的方法。
附图说明
图1示出了一种堆栈结构的示意图。
图2示出了本申请实施例提供的一种异构调试的框架的示意图。
图3示出了本申请实施例提供的一种调试程序的方法的示意性流程图。
图4示出了本申请实施例提供的一种堆栈结构的示意图。
图5示出了本申请实施例提供的一种堆栈缓冲区的示意图。
图6示出了现有技术中的一种多线程业务流的示意图。
图7示出了本申请实施例提供的一种多线程业务流的示意图。
图8示出了本申请实施例提供的一个具体的调试程序的例子。
图9示出了本申请实施例提供的一种显示全栈信息的方法的示意性流程图。
图10示出了本申请实施例提供的一个具体的调试程序的例子。
图11示出了本申请实施例提供的一种调试程序的系统的示意性框图。
具体实施方式
下面将结合附图,对本申请中的技术方案进行描述。
图1示出了一种堆栈结构的示意图。如图1所示,堆栈的上面部分的为主(main)函数的栈帧(stack frame),下面部分为Func(function)1函数的栈帧,即当前函数(被调用者)的栈帧,栈底在高地址,栈向下增长。栈帧是指为一个函数调用单独分配的那部分栈空间。比如,当运行中的程序调用另一个函数时,就要进入一个新的栈帧,原来函数的栈帧称为调用者的帧,新的栈帧称为当前帧。被调用的函数运行结束后当前帧全部收缩,回到调用者的帧。
具体的,主函数栈帧包括寄存器入栈,函数入参,局部变量,调用函数的参数。Func1函数的栈帧包括寄存器入栈,函数入参,局部变量,调用函数的参数。这里,寄存器用于记录当前正在运行的函数的一些重要信息,比如在刚进入一个新的函数开始执行的时候,寄存器保存的是上个函数的信息。寄存器例如包括程序计数(program counter,PC)寄存器,连接(link register,LR)寄存器,堆栈指针(stack pointer,SP)寄存器、栈帧指针(frame pointer,FP)寄存器等。其中,FP指向当前函数栈帧的栈底,SP则指向当前函数栈帧的栈顶。
当发生函数调用时,系统进行堆栈现场保存工作:第一步先将相关寄存器压栈,然后将函数的入参进行压栈,函数的局部变量进行压栈,最后将被调用的函数的参数压栈。这里,压栈指的是把数据从栈顶放入栈中,数据出栈的时候从栈顶取出。
图2示出了本申请实施例提供的一种异构调试的框架200的示意图。如图2所示,该异构调试的框架主要包含调试模块201,调试代理模块202,堆栈缓冲区203。
调试模块201:开发者直接交互的模块,提供基础的调试能力,负责与调试代理模块202进行通信,下发调试命令。
调试代理模块202:屏蔽底层硬件差异,提供统一的调试接口,实现对底层硬件的信息收集和设备的控制。这里,底层硬件例如可以包括中央处理器(central processing unit,CPU)204,图形处理器(graphic processing unit,GPU)205和众核处理器206。
本申请实施例中,CPU 204可以为主机(host)侧处理器,GPU 205和众核处理器206可以为设备(device)侧处理器,这里device例如为专用集成电路(application specific  integrated circuit,ASIC),本申请实施例对此不作限定。
堆栈缓冲区203:在双倍速率SDRAM(double data rate SDRAM,DDR)中分配一块特定内存,在本申请实施例中,承载存储异构处理器全局的堆栈信息。作为示例,该堆栈缓冲区203中可以包括栈帧(stack frame)0、栈帧1、栈帧2。
需要说明的是,本申请实施例中,仅以堆栈缓冲区203位于DDR中进行举例说明,可以理解,堆栈缓冲区203也可以位于其他存储结构或存储器中,例如高带宽内存(high bandwidth memory,HBM),混合记忆体立方体(hybrid memory cube,HMC),本申请实施例对此不作限定。另外,本申请实施例中,该堆栈缓冲区还可以具有其他命名,例如全局堆栈缓冲区,本申请实施例对此不作限定。
图3示出了本申请实施例提供的一种调试程序的方法的示意性流程图。作为示例,该方法可以由图2中的异构调试的框架200执行,本申请实施例对此不作限定。
310,第一处理模块执行被调试程序的第一函数,其中,所述第一函数的堆栈包括第一标识,所述第一标识用于标识所述第一函数所述的线程。作为举例,第一函数所属的线程可以为第一线程。
这里,第一处理模块为host侧设备或者device侧的设备。具体的,可以为上文中的CPU、GPU、众核处理器或者ASIC,本申请实施例对此不作限定。
具体的,被调试进程可以包括至少一个线程,例如包括第一线程,可选的,被调试进程还可以包括第二线程,本申请实施例对此不作限定。
可选的,第一标识可以为第一函数所述的线程的线程标识(thread_ID)。本申请实施例中,在原先的函数(例如主函数和被主函数调用的函数)的堆栈结构,例如图1所示的堆栈结构中,可以新增加该函数所属的线程的线程标识字段。具体而言,本申请实施例在每次建立栈帧的时候,可以将thread_ID作为一个字段,写入到堆栈结构中,作为堆栈链式结构的唯一标识符。
可选的,该thread_ID字段可以从host侧传下来,用来表征当前运行的函数的栈帧所属的线程。
图4示出了本申请实施例提供的一种堆栈结构的示意图。如图4所示,在主函数栈帧中新增加了线程ID(thread_ID)字段,表征主函数的栈帧所属的线程ID为thread_ID。在Func1函数栈帧中也新增加了thread_ID字段,表征Func1函数的栈帧所属的线程ID也为thread_ID。
320,在所述第一函数调用第二函数时,第二处理模块执行所述第二函数。
其中,所述第二函数的栈帧包括所述第一标识,此时所述第一标识用于标识所述第二函数所述的线程,即第一标识用于标识第二函数与第一函数属于相同的线程。一种实现方式中,第一标识可以由第一处理模块传给第二处理模块。
一种可能的实现方式中,第一处理模块为CPU,第一函数为主函数,第二处理模块为GPU或多核处理器,第二函数为被主函数调用的函数。
另一种可能的实现方式中,第一处理模块可以为GPU或多核处理器,第二处理模块为不同于第一处理模块的GPU或多核处理器,第二函数为被第一函数调用的函数。
330,调试代理模块获取调度信息,所述调度信息用于指示所述第一函数调用所述第二函数。
具体的,当发生核间调度的时候,被调试程序的调度框架会将任务调度发送给调试代理模块202,该任务调度相关的信息可以包含在上述调度信息中。
340,调试代理模块向调试模块发送第一通知消息,所述第一通知消息用于向所述调试模块通知所述第一函数调用所述第二函数。具体的,调试代理模块202收到330中调度框架发送的调度信息后,向调试模块201发送信号,以通知该核间调度。
本申请实施例中,调试代理模块和调试模块可以设置于处理器上,例如CPU或者GPU。具体的,该处理器可以为第一处理模块所在的处理器,或第二处理模块所在的处理器,或者调试代理模块和调试模块可以设置于单独的处理器上,本申请实施例对此不作限定。
350,所述调试模块将所述第一函数当前的第一栈帧保存到堆栈缓冲区。
本申请实施例中,然后,调试模块201将当前设备的栈帧按照该栈帧原先的格式,保存到堆栈缓冲区203。
可选的,一种实现方式,当堆栈缓冲区203存在第一函数所属的线程(例如第一线程)对应的堆栈列表时,调试模块201将所述第一栈帧添加到该第一线程的堆栈链表中。
可选的,另一种实现方式,当堆栈缓冲区203不存在第一线程的堆栈列表时,调试模块201在所该堆栈缓冲区中建立该第一线程的堆栈链表,此时该第一线程的堆栈链表的初始节点为第一栈帧。
具体而言,如果当前堆栈缓冲区203不存在需要保存的栈帧的thread_id对应的堆栈链表,则新建一个堆栈链表,将当前栈帧作为一个初始节点加入链表中。如果当前堆栈缓冲区203已经存在需要保存的栈帧的thread_id对应的堆栈链表,则将当前处理器的栈帧加入该thread_id对应的堆栈链表。
当不断发生栈帧插入后,作为示例,堆栈缓冲区203的形式可以如图5所示。具体的,此时堆栈缓冲区203中包括三个线程的堆栈链表,该三个堆栈链表的标识分别为线程0(thread_0),线程1(thread_1)和线程2(thread_2)。其中,标识为thread_0的堆栈链表分别包括主机的栈帧、ASIC0的栈帧、ASIC4的栈帧、ASIC8的栈帧,标识为thread_1的堆栈链表分别包括主机的栈帧、ASIC1的栈帧、ASIC6的栈帧、ASIC9的栈帧,标识为thread_2的堆栈链表分别包括主机的栈帧、ASIC3的栈帧、ASIC7的栈帧、ASIC2的栈帧、ASIC5的栈帧。
因此,本申请实施例通过在函数的栈帧中增加用于标识该函数所属的线程的标识字段,当第一处理模块上运行的第一函数调用第二函数时,将该第一函数当前的栈帧保存到堆栈缓冲区中,使得堆栈缓冲区中保存的第一函数的栈帧与该第一函数所属的线程相关联,实现对同一个线程的堆栈信息的记录,基于此本申请实施例能够在调试过程中记录业务的端到端的流向,提高开发者定位问题的效率。
可选的,本申请实施例中,在命中第二函数上的断点之后,获取第二函数当前的第二栈帧,然后,解析该第二栈帧,获取该第二栈帧中包括的所述第一标识,然后可以从所述堆栈缓冲区中获取包括所述第一标识的栈帧。
具体的,在命中预先设置的断点后,可以获取用户的显示堆栈信息的命令。此时,可以先将当前处理模块(例如第二处理模块)的栈帧显示出来,然后根据当前栈帧的thread_id信息,从堆栈缓冲区中查找对应的thread_id的堆栈链表,然后依次将该堆栈链表中的栈 帧显示出来,从而获取到从程序开始到当前现场的全栈信息。
同理,可以根据步骤310至350的描述对被调试程序的第二线程采用相类似的操作。当用户进行切换核操作,进行第二线程的堆栈显示时,同样地,会根据以上描述的过程,将第二线程的完整的全栈信息显示出来,从而达到全栈显示的目的。
因此,本申请实施例中,在调试程序过程中当命中断点时,可以将堆栈缓冲区中与当前线程相关联的栈帧进行记录,从而能够提供直观的业务流程图,使得开发者在调试过程中能够获取业务的端到端的流向,提高开发者定位问题的效率。
另外,本申请实施例中,可以将原先杂乱的,但是有内部关联的核间堆栈信息整理成树状结构,提供更直观的数据流图。图6示出了现有技术中的一种多线程业务流的示意图,图6中共显示了3个业务流信息,例如虚线所示的从host->ASIC 0->ASIC 4->ASIC 8的业务流信息,实线所示的从host->ASIC 1->ASIC 6->ASIC 9的业务流信息,点画线所示的从host->ASIC 3->ASIC 7->ASIC 2->ASIC 5的业务流信息。由于现有调试技术中,在业务的运行过程中没有关注核间的调用关系,对于用户而言,无法直观地获取到业务在处理器之间的调用关系,只能获取的某个核上的栈帧。比如,当命中ASIC 8上的断点时,当前只能获取到ASIC 8上的栈帧,但是却无法获取到从host->ASIC 0->ASIC 4->ASIC 8的业务流信息。
图7示出了本申请实施例提供的一种多线程业务流的示意图。如图7所示,本申请实施例在调试程序过程中当命中断点时,可以将堆栈缓冲区中与当前线程相关联的堆栈进行显示,例如当命中ASIC 8上的断点时,当前不仅可以获取到ASIC 8上的堆栈信息,还能够获取到从host->ASIC 0->ASIC 4->ASIC 8的业务流信息。当进行核切换时,还可以显示从host->ASIC 1->ASIC 6->ASIC 9的业务流信息,从host->ASIC 3->ASIC 7->ASIC 2->ASIC 5的业务流信息。从而本申请实施例能够提供直观的业务流程图,使得开发者在调试过程中能够获取业务的端到端的流向,提高开发者定位问题的效率。
图8示出了本申请实施例提供的一个具体的调试程序的例子。应理解,图8示出了调试程序的方法的步骤或操作,但这些步骤或操作仅是示例,本申请实施例还可以执行其他操作或者图8中的各个操作的变形。此外,图8中的各个步骤可以按照与图8呈现的不同的顺序来执行,并且有可能并非要执行图8中的全部操作。
本申请实施例中,被调试程序的代码主要分为以下3个部分:
main函数,为host侧的代码,编译运行在host侧;
asic0_fun函数,为device侧的代码,编译运行在ASIC 0侧;
asic1_fun函数,为device侧,编译运行在ASIC 1侧。
作为示例,被调试程序如下所示:
Figure PCTCN2018109518-appb-000001
Figure PCTCN2018109518-appb-000002
801,启动被调试程序。
具体的,调试模块建立调试代理进程,确定被调试进程。作为示例,调试模块在ASIC1侧的代码中设置断点。然后,调试模块可以通过调试代理模块向CPU发送启动调试命令。
802,CPU执行main函数。
具体的,被调试进程在运行过程中,先执行host侧的程序。假定此时运行的thread_id为0。此时,main函数的栈帧中包括标识thread_0。
803,main函数调用ASIC0上的asic0_func函数。
具体的,在执行host侧的程序的过程中,main函数会调用asic0_fun函数,将执行在ASIC0侧的业务。在此时,被调试程序的调度框架会将任务调度发送给调试代理模块。
804,调试代理模块通知调试模块核间调度。
调试代理模块收到该信息后,给调试模块发送信号,用于通知调试模块main函数对asic0_func函数进行调度。
805,调试模块保存CPU当前栈帧。
调试模块收到调试代理模块发送的该信号后,确认是任务调度信号,此时将当前设备的堆栈信息进行保存。具体而言,此时调试模块保存CPU中的main函数当前的栈帧。
此时为第一次在堆栈缓冲区存入栈帧,当前的堆栈缓冲区不存在需要保存的栈帧的thread_id为0对应的堆栈链表,因此需要新建一个堆栈链表,将当前该main函数的栈帧作为一个初始节点加入链表中,作为示例,该栈帧可以标示为栈帧0(stack frame_0),该栈帧中的信息具体为int main()at xxx.cce:10。
806,ASIC 0执行asic0_func函数,此时业务在ASIC 0上继续执行。
807,asic0_func函数调用ASIC1上的asic1_func函数。
808,调试代理模块通知调试模块核间调度。
同理地,调试代理模块需要通过信号通知给调试模块,通知asic0_func函数对asic1_func函数进行调度。
809,调试模块保存ASIC 0当前栈帧。
具体的,调试模块将当前ASIC 0的栈帧加入堆栈链表中。具体的,在将ASIC 1上的asic1_func函数当前的栈帧加入堆栈链表时,会将当前ASIC 0的栈帧加到thread_id为0的堆栈中,即stack_main所在的链表。作为示例,该新加入的栈帧可以表示为栈帧1(stack frame_1),该栈帧中的信息具体为void asic0_fun()at xxx.cce:6。
因此,本申请实施例通过在函数的栈帧中增加用于标识该函数所属的线程的标识字段,当第一处理模块上运行的第一函数调用第二设备上的第二函数时,将该第一函数当前的栈帧保存到堆栈缓冲区中,使得堆栈缓冲区中保存的第一函数的栈帧与该第一函数所属的线程相关联,实现对同一个线程的堆栈信息的记录,基于此本申请实施例能够在调试过程中记录业务的端到端的流向,提高开发者定位问题的效率。
图9示出了本申请实施例提供的一种显示全栈信息的方法的示意性流程图。应理解,图9示出了显示全栈信息的方法的步骤或操作,但这些步骤或操作仅是示例,本申请实施例还可以执行其他操作或者图9中的各个操作的变形。此外,图9中的各个步骤可以按照与图9呈现的不同的顺序来执行,并且有可能并非要执行图9中的全部操作。
图9中与图8中相同的模块或单元具有相同或相似的含义,为了简洁,这里不再描述。
901,命中ASIC 1上的断点。
902,在901之后,ASIC 1可以向调试模块上报断点事件。
903,调试模块调用backtrace命令。具体的,903可以包括9031、9032和9033三部分。
具体而言,在命中断点之后,可以返回用户的调试界面。当用户通过backtrace命令进行堆栈显示时,首先执行9031,将ASIC1上的栈帧显示出来。作为示例,ASIC 1上的栈帧为void asic1_fun()at xxx.cce:2。然后执行9032,获取ASIC 1上的栈帧上的标识thread_id,具体的,本申请实施例中可以得到thread_id为thread_0。然后,去堆栈缓冲区中查找thread_0的堆栈链表。此时,能够在堆栈缓冲区查找到thread_0的堆栈链表,本申请实施例中该堆栈链表包括stack frame_0和stack frame_1。作为示例,此时可以将该链表逆序输出,从而获取从ASIC 1->ASIC 0->host的全栈信息,如下所示:
Figure PCTCN2018109518-appb-000003
因此,本申请实施例中,在调试程序过程中当命中断点时,可以将堆栈缓冲区中与当前线程相关联的栈帧进行记录,从而能够提供直观的业务流程图,使得开发者在调试过程中能够获取业务的端到端的流向,提高开发者定位问题的效率。
本申请另一个实施例中,如果host侧为多线程程序,假设用户创建两个线程分别进行业务处理,该两个线程例如分别为线程0(thread_0)和线程1(thread_1),用户在thread_0的ASIC2上打了断点。同时,thread_1的asic4上的业务耗时较久,在ASIC 2命中断点时,asic4仍会进行业务处理。
当thread_0调度到ASIC 0时,根据上文中的描述,会将thread_0的host侧的栈帧加入到全局堆栈缓冲区中,并新建一个以thread_0作为关键字的堆栈链表。
当thread_1调度到ASIC 3时,会将thread_1的host侧的栈帧加如到全局堆栈缓冲区中,并新建一个以thread_1作为关键字的堆栈链表。
当thread_0调度到ASIC 1时,ASIC 0的栈帧需要入栈,此时,由于在堆栈缓冲区中已经存在thread_0对应的堆栈链表,因此将ASIC 0的栈帧加入到thread_0对应的堆栈链表中。
当thread_1调度到ASIC 4时,ASIC 3的栈帧需要入栈,此时,由于在堆栈缓冲区中已经存在thread_1对应的堆栈链表,因此将ASIC 3的栈帧加入到thread_1对应的堆栈链表中。
当thread_0调度到ASIC 2时,将ASIC 1的堆栈加入到thread_0对应的堆栈链表中。当ASIC 2命中断点时,当前的堆栈缓冲区如图10所示,其中thread_0包含host/ASIC 0/ASIC 1的栈帧,并保留它们的调用顺序,thread_1包含host/ASIC 3的栈帧。
因此,本申请实施例通过在函数的栈帧中增加用于标识该函数所属的线程的标识字段,当第一处理模块上的第一函数调用第二函数时,将该第一函数当前的栈帧保存到堆栈缓冲区中,使得堆栈缓冲区中保存的第一函数的栈帧与该第一函数所属的线程相关联,实现对同一个线程的堆栈信息的记录,基于此本申请实施例能够在调试过程中记录业务的端到端的流向,提高开发者定位问题的效率。
本申请实施例中,当命中ASIC 2上的断点时,用户通过backtrace命令查看当前栈帧时,先获取ASIC 2设备上的栈帧,解析ASIC 2的栈帧中的thread_id字段时,本申请实施例中可以得到该thread_id为thread_0,然后从堆栈缓冲区中查找thread_0的堆栈,由于堆栈缓冲区中存在thread_0的堆栈链表,因此会依次将该堆栈链表解析,进而获取全部的堆栈信息,该堆栈信息如下所示:
Figure PCTCN2018109518-appb-000004
如果用户这个时候进行了切换核的操作,比如切换到ASIC 4上,通过backtrace命令查看当前栈帧时,先获取ASIC 4上的堆栈信息,解析ASIC 4的栈帧中的thread_id字段是thread_1,会从堆栈缓冲区中查找thread_id为thread_1的堆栈,由于堆栈缓冲区存在thread_1的堆栈链表,因此会依次将该堆栈链表解析,进而获取到全栈信息,该全栈信息如下:
Figure PCTCN2018109518-appb-000005
因此,本申请实施例中,在调试程序过程中当命中断点时,可以将堆栈缓冲区中与当前线程相关联的栈帧进行记录,从而能够提供直观的业务流程图,使得开发者在调试过程中能够获取业务的端到端的流向,提高开发者定位问题的效率。
图11示出了本申请实施例提供的一种调试程序的系统1100的示意图。该系统1100包括第一处理模块1110、第二处理模块1120、调试代理模块1130、调试模块1140和堆栈缓冲区1150。
第一处理模块1100,用于执行被调试程序的第一函数,其中,所述第一函数的栈帧包括第一标识,所述第一标识用于标识所述第一函数所属的线程。
在所述第一处理模块1100上运行的第一函数调用第二函数时,第二处理模块1120, 用于执行所述第二函数,所述第二函数的栈帧包括所述第一标识,所述第一标识用于标识所述第二函数所属的线程,其中,所述第二函数与所述第一函数属于相同的线程。
调试代理模块1130,用于获取调度信息,所述调度信息用于指示所述第一函数调用所述第二函数;
所述调试代理模块1130还用于向调试模块1140发送通知消息,所述通知消息用于通知所述第一函数调用所述第二函数;
所述调试模块1140用于将所述第一函数当前的第一栈帧保存到堆栈缓冲区1150。
因此,本申请实施例通过在函数的栈帧中增加用于标识该函数所属的线程的标识字段,当第一处理模块上运行的第一函数调用第二函数时,将该第一函数当前的栈帧保存到堆栈缓冲区中,使得堆栈缓冲区中保存的第一函数的栈帧与该第一函数所属的线程相关联,实现对同一个线程的堆栈信息的记录,基于此本申请实施例能够在调试过程中记录业务的端到端的流向,提高开发者定位问题的效率。
可选的,该调试程序的系统1100还包括获取模块,用于:
在命中所述第二函数上的断点之后,获取所述第二函数当前的第二栈帧;
获取所述第二栈帧中包括的所述第一标识,并获取所述堆栈缓冲区中的包括所述第一标识的栈帧。
因此,本申请实施例中,在调试程序过程中当命中断点时,可以将堆栈缓冲区中与当前线程相关联的栈帧进行记录,从而能够提供直观的业务流程图,使得开发者在调试过程中能够获取业务的端到端的流向,提高开发者定位问题的效率。
可选的,所述调试模块1140具体用于:
将所述第一栈帧添加到所述堆栈缓冲区中的所述第一函数所属的线程对应的堆栈链表中。
可选的,所述调试模块1140具体用于:
在所述堆栈缓冲区中建立所述第一函数所属的线程对应的堆栈链表,所述堆栈链表的初始节点为所述第一栈帧。
可选的,第一处理模块1110为主机侧处理器,所述第二处理模块1120为设备侧处理器。
图11所示的调试程序的系统1100能够实现前述方法实施例对应的各个过程,具体的,该调试程序的系统1100中的各个模块可以参见上文中的描述,为避免重复,这里不再赘述。
本申请实施例还提供了一种计算机可读介质,用于存储计算机程序,该计算机程序包括用于执行上述调试程序的方法的指令。
本申请实施例还提供了一种计算机程序产品,所述计算机程序产品包括:计算机程序代码,当所述计算机程序代码被所述调试程序的系统中的处理模块或处理器运行时,使得该调试程序的系统执行上述调试程序的方法。
应理解,本发明实施例中提及的处理器可以是CPU,还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处 理器或者该处理器也可以是任何常规的处理器等。
还应理解,本发明实施例中提及的存储器或存储模块可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。
需要说明的是,当处理器为通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件时,存储器(存储模块)集成在处理器中。
应理解,本申请实施例中出现的第一、第二等描述,仅作示意与区分描述对象之用,没有次序之分,也不表示本申请实施例中对设备个数的特别限定,不能构成对本申请实施例的任何限制。
还应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (10)

  1. A method for debugging a program, comprising:
    a first processing module executing a first function of a debugged program, wherein a stack frame of the first function comprises a first identifier, and the first identifier is used to identify a thread to which the first function belongs;
    when the first function calls a second function, a second processing module executing the second function, wherein a stack frame of the second function comprises the first identifier, and the first identifier is used to identify a thread to which the second function belongs, the second function and the first function belonging to the same thread;
    a debugging agent module obtaining scheduling information, wherein the scheduling information is used to indicate that the first function calls the second function;
    the debugging agent module sending a notification message to a debugging module, wherein the notification message is used to give notification that the first function calls the second function; and
    the debugging module saving a current first stack frame of the first function to a stack buffer.
  2. The method according to claim 1, further comprising:
    after a breakpoint on the second function is hit, obtaining a current second stack frame of the second function; and
    obtaining the first identifier comprised in the second stack frame, and obtaining, from the stack buffer, a stack frame comprising the first identifier.
  3. The method according to claim 1 or 2, wherein the debugging module saving the current first stack frame of the first function to the stack buffer comprises:
    the debugging module adding the first stack frame to a stack linked list, in the stack buffer, corresponding to the thread to which the first function belongs.
  4. The method according to claim 1 or 2, wherein the debugging module saving the current first stack frame of the first function to the stack buffer comprises:
    the debugging module establishing, in the stack buffer, a stack linked list corresponding to the thread to which the first function belongs, wherein an initial node of the stack linked list is the first stack frame.
  5. The method according to any one of claims 1 to 4, wherein the first processing module is a host-side processor, and the second processing module is a device-side processor.
  6. A system for debugging a program, comprising:
    a first processing module, configured to execute a first function of a debugged program, wherein a stack frame of the first function comprises a first identifier, and the first identifier is used to identify a thread to which the first function belongs;
    a second processing module, configured to execute a second function when the first function running on the first processing module calls the second function, wherein a stack frame of the second function comprises the first identifier, and the first identifier is used to identify a thread to which the second function belongs, the second function and the first function belonging to the same thread;
    a debugging agent module, configured to obtain scheduling information, wherein the scheduling information is used to indicate that the first function calls the second function;
    wherein the debugging agent module is further configured to send a notification message to a debugging module, the notification message being used to give notification that the first function calls the second function; and
    the debugging module is configured to save a current first stack frame of the first function to a stack buffer.
  7. The system according to claim 6, further comprising an obtaining module configured to:
    after a breakpoint on the second function is hit, obtain a current second stack frame of the second function; and
    obtain the first identifier comprised in the second stack frame, and obtain, from the stack buffer, a stack frame comprising the first identifier.
  8. The system according to claim 6 or 7, wherein the debugging module is specifically configured to:
    add the first stack frame to a stack linked list, in the stack buffer, corresponding to the thread to which the first function belongs.
  9. The system according to claim 6 or 7, wherein the debugging module is specifically configured to:
    establish, in the stack buffer, a stack linked list corresponding to the thread to which the first function belongs, wherein an initial node of the stack linked list is the first stack frame.
  10. The system according to any one of claims 6 to 9, wherein the first processing module is a host-side processor, and the second processing module is a device-side processor.
PCT/CN2018/109518 2018-10-09 2018-10-09 调试程序的方法和系统 WO2020073200A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/109518 WO2020073200A1 (zh) 2018-10-09 2018-10-09 调试程序的方法和系统
CN201880097908.5A CN112740187A (zh) 2018-10-09 2018-10-09 调试程序的方法和系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/109518 WO2020073200A1 (zh) 2018-10-09 2018-10-09 调试程序的方法和系统

Publications (1)

Publication Number Publication Date
WO2020073200A1 true WO2020073200A1 (zh) 2020-04-16

Family

ID=70163764

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/109518 WO2020073200A1 (zh) 2018-10-09 2018-10-09 调试程序的方法和系统

Country Status (2)

Country Link
CN (1) CN112740187A (zh)
WO (1) WO2020073200A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115705294A (zh) * 2021-08-12 2023-02-17 华为技术有限公司 用于获取函数调用信息的方法、装置、电子设备和介质
CN113672458B (zh) * 2021-08-18 2022-09-09 北京基调网络股份有限公司 一种应用程序的监测方法、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446918A (zh) * 2008-12-10 2009-06-03 中兴通讯股份有限公司 一种实现用户态调试器调试单个函数的方法及系统
US8880952B1 (en) * 2012-03-14 2014-11-04 Emc Corporation Generic and extensible provider debug interface
CN104216764A (zh) * 2014-07-31 2014-12-17 昆明理工大学 一种基于多线程嵌入式系统并行程序跟踪与回放方法
CN104252402A (zh) * 2014-09-05 2014-12-31 深圳创维数字技术有限公司 一种程序调试方法及装置
CN106802785A (zh) * 2016-12-13 2017-06-06 北京华为数字技术有限公司 一种栈解析方法和装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446918A (zh) * 2008-12-10 2009-06-03 中兴通讯股份有限公司 一种实现用户态调试器调试单个函数的方法及系统
US8880952B1 (en) * 2012-03-14 2014-11-04 Emc Corporation Generic and extensible provider debug interface
CN104216764A (zh) * 2014-07-31 2014-12-17 昆明理工大学 一种基于多线程嵌入式系统并行程序跟踪与回放方法
CN104252402A (zh) * 2014-09-05 2014-12-31 深圳创维数字技术有限公司 一种程序调试方法及装置
CN106802785A (zh) * 2016-12-13 2017-06-06 北京华为数字技术有限公司 一种栈解析方法和装置

Also Published As

Publication number Publication date
CN112740187A (zh) 2021-04-30

Similar Documents

Publication Publication Date Title
US10013332B2 (en) Monitoring mobile application performance
US8051409B1 (en) Monitoring memory accesses for multiple computer processes
CN111124906B (zh) 基于动态埋点的跟踪方法、编译方法、装置和电子设备
US7987393B2 (en) Determining operating context of an executed instruction
US10545852B2 (en) Diagnostics of state transitions
US20100223446A1 (en) Contextual tracing
US9355003B2 (en) Capturing trace information using annotated trace output
US8418148B2 (en) Thread execution analyzer
US20130275951A1 (en) Race detection for web applications
US10013335B2 (en) Data flow analysis in processor trace logs using compiler-type information method and apparatus
US20120036501A1 (en) Method and System for Capturing System and User Events Using Hardware Trace Devices
US10496423B2 (en) Method for opening up data and functions of terminal application based on reconstruction technology
WO2020073200A1 (zh) 调试程序的方法和系统
CN114880159A (zh) 数据处理方法、装置、设备及存储介质
CN106294132B (zh) 一种管理日志的方法及装置
US10198784B2 (en) Capturing commands in a multi-engine graphics processing unit
US20090327995A1 (en) Annotation-aided code generation in library-based replay
CN115061837B (zh) 一种调度跟踪和获取用户空间调用栈的方法和装置
WO2019071535A1 (zh) 计算机存储介质、程序运行监测方法及装置
CN111881025B (zh) 一种自动化测试任务调度方法、装置及系统
CN108304294B (zh) Ios应用的实时帧数监测方法、存储介质、设备及系统
CN114116509A (zh) 程序分析方法、装置、电子设备和存储介质
EP3721346B1 (en) Bit-accurate-tracing analysis with applied memory region lifetimes
US8539171B2 (en) Analysis and timeline visualization of storage channels
US10061604B2 (en) Program execution recording and playback

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18936474

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18936474

Country of ref document: EP

Kind code of ref document: A1