CN115114117B - Data recording method and data recording device - Google Patents

Info

Publication number
CN115114117B
Authority
CN
China
Prior art keywords
data
tracking
embedded
data recording
trace
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210881085.3A
Other languages
Chinese (zh)
Other versions
CN115114117A (en
Inventor
张锋巍
宁振宇
张一鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology
Priority to CN202210881085.3A
Publication of CN115114117A
Application granted
Publication of CN115114117B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0721 Error or fault processing not based on redundancy, the processing taking place within a central processing unit [CPU]
    • G06F11/0724 Error or fault processing not based on redundancy, the processing taking place within a central processing unit [CPU] in a multiprocessor or a multi-core unit
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0778 Dumping, i.e. gathering error/state information after a fault for later diagnosis
    • G06F11/0787 Storage of error reports, e.g. persistent data storage, storage using memory protection
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/835 Timestamp
    • G06F2201/86 Event-based monitoring
    • G06F2201/865 Monitoring of software

Abstract

The application provides a data recording method and a data recording apparatus. The method comprises the following steps: acquiring tracking data of a kernel through an embedded tracking macro unit; acquiring time stamp data of a kernel through an embedded tracking macro unit; storing the tracking data and the timestamp data in a buffer area to obtain cache data; outputting interrupt information through a performance monitoring unit; and storing the cache data in the nonvolatile memory according to the interrupt information for data recording. The trace data and the timestamp data of the kernel are respectively obtained by using an embedded trace macro unit in the ARM processor and are stored in the buffer area, and after interrupt information is received, the cache data are timely stored in the nonvolatile memory. By the method, the data flow of the target program during running can be completely recorded, and the user can accurately recover the data flow of the target program by analyzing the data stored in the nonvolatile memory so as to complete fault diagnosis of the target program.

Description

Data recording method and data recording device
Technical Field
The present application relates to the field of processor technologies, and in particular, to a data recording method and a data recording apparatus.
Background
Troubleshooting in real systems is difficult. The main obstacle is that the information available to developers is limited and is often insufficient to help them locate or fix the relevant errors. Furthermore, because the development environment differs greatly from the production environment, failures that occur in production are not easy to reproduce in development.
Existing fault diagnosis systems, such as REPT, Snorlax, RETracer, CREDAL and POMP, typically use modern hardware features (e.g., Intel Processor Trace) to record the control flow of a program. For example, the control flow is used together with a core dump captured when the program crashes to infer the data flow. However, because the memory and registers involved in the failure may be overwritten before the core dump is captured, the data recorded in the core dump at the time of the final crash is incomplete.
Disclosure of Invention
The embodiment of the application mainly aims to provide a data recording method and a data recording device, so as to provide a data recording scheme applied to an ARM processor, and control flow data of a program during operation can be completely recorded.
To achieve the above object, a first aspect of an embodiment of the present application proposes a data recording method, where the method includes:
acquiring tracking data of a kernel through an embedded tracking macro unit;
acquiring time stamp data of a kernel through an embedded tracking macro unit;
storing the tracking data and the timestamp data in a buffer area to obtain cache data;
outputting interrupt information through a performance monitoring unit;
and storing the cache data in a nonvolatile memory according to the interrupt information for data recording.
In some embodiments, acquiring the timestamp data of the kernel through the embedded trace macrocell includes:
configuring the embedded trace macrocell into a timestamp output mode;
triggering a trace unit event of the embedded trace macrocell through a countdown counter;
outputting the timestamp data according to the trace unit event.
In some embodiments, said storing said trace data and said timestamp data in a buffer, resulting in cached data, comprises:
and storing the tracking data and the timestamp data in an embedded tracking router to obtain the cache data.
In some embodiments, the outputting, by the performance monitoring unit, interrupt information includes:
detecting the data size of the cache data in the buffer area through the performance monitoring unit;
and if the data size of the cache data is larger than a preset threshold value, outputting the interrupt information.
In some embodiments, the data recording method further comprises:
obtaining the calling type of the current system call;
and selecting a corresponding recording mode according to the calling type to record data.
In some embodiments, the call types include a read status and a write status, and selecting a corresponding recording mode according to the call type to record data includes:
if the call type is the read status, performing data recording on all changed data;
and if the call type is the write status, performing data recording only on the error code.
In some embodiments, the call types further include read content and write content, and selecting a corresponding recording mode according to the call type to record data further includes:
if the call type is the read content, performing data recording on a truncated portion of the data;
and if the call type is the write content, not performing data recording.
In some embodiments, the data recording method further comprises:
configuring the embedded trace macrocell into a mode that traces asynchronous exceptions;
acquiring the event time of the asynchronous event detected by the embedded tracking macro unit;
if a hardware breakpoint on the signal processing program of the asynchronous event is detected, performing core dump to obtain dump data;
and carrying out data recording on the event time and the dump data.
In some embodiments, the data recording method further comprises:
if the current library function is a target library function in the wrapper, performing data recording on the current library function; wherein, in the wrapper, the original function is executed and the execution information is saved.
To achieve the above object, a second aspect of the present application provides a data recording apparatus applied to an ARM processor, the apparatus including:
the first acquisition module is used for acquiring the tracking data of the kernel through the embedded tracking macro unit;
the second acquisition module is used for acquiring the timestamp data of the kernel through the embedded tracking macro unit;
the first storage module is used for storing the tracking data and the timestamp data in a buffer area to obtain cache data;
the interrupt module is used for outputting interrupt information through the performance monitoring unit;
and the second storage module is used for storing the cache data in a nonvolatile memory according to the interrupt information so as to record data.
According to the data recording method and the data recording device, the embedded trace macrocell in the ARM processor is used to acquire both the trace data and the timestamp data of the kernel, the trace data and the timestamp data are stored in the buffer area, and after the interrupt information is received, the cache data is promptly stored in the nonvolatile memory. When the target program starts or a new thread is created, the memory information in the corresponding state is transferred to the buffer, and all recorded information is saved to the nonvolatile memory. In this way, the data flow of the target program during running can be recorded completely, and the user can accurately recover the data flow of the target program by analyzing the data stored in the nonvolatile memory, so as to complete fault diagnosis of the target program.
Drawings
FIG. 1 is a flowchart of a data recording method according to an embodiment of the present application;
FIG. 2 is a flowchart of step S200 in FIG. 1;
FIG. 3 is a flowchart of step S400 in FIG. 1;
FIG. 4 is a flowchart of a data recording method according to another embodiment of the present application;
FIG. 5 is a flowchart of a data recording method according to another embodiment of the present application;
FIG. 6 is a schematic diagram of a data recording method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a data recording apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.
It should be noted that although functional blocks are partitioned in a schematic diagram of an apparatus and a logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the partitioning of blocks in the apparatus or the order in the flowchart. The terms first, second and the like in the description and in the claims, as well as in the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the embodiments of the disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
First, several terms used in the present application are explained:
Random Access Memory (RAM): also called main memory, an internal memory that exchanges data directly with the central processor, usually used as a temporary data storage medium for the operating system or other running programs. The information stored in RAM is lost when power is turned off, so RAM is a volatile memory.
Non-Volatile Memory (NVM): computer memory whose stored data does not disappear when power is turned off. Depending on whether the data in the memory can be rewritten at any time while the computer is in use, non-volatile memory falls into two main categories: Read-Only Memory (ROM) and flash memory.
Embedded Trace Macrocell (ETM): used to obtain trace data for the processor core.
Embedded Trace Buffer (ETB): a trace receiver that uses a RAM of a certain size to provide on-chip storage for trace data.
Embedded Trace Router (ETR): similar in function to the ETB, but allows the user to use any RAM in the device to store trace data.
Performance Monitoring Unit (PMU): collects various statistics about the processor and memory at runtime.
Embodiments of the present application provide a data recording method and a data recording apparatus, and specifically, a data recording method in an embodiment of the present application is described below.
Fig. 1 is an alternative flowchart of a data recording method provided in an embodiment of the present application, and is applied to an ARM processor, where the method in fig. 1 may include, but is not limited to, steps S100 to S500.
S100, acquiring tracking data of a kernel through an embedded tracking macro unit;
S200, acquiring time stamp data of a kernel through an embedded tracking macro unit;
S300, storing the tracking data and the timestamp data in a buffer area to obtain cache data;
S400, outputting interrupt information through a performance monitoring unit;
and S500, storing the cache data in a nonvolatile memory according to the interrupt information for data recording.
According to the data recording method provided by the embodiment of the application, the trace data and the timestamp data of the kernel are respectively obtained by using the embedded trace macro unit in the ARM processor and are stored in the buffer area, and after the interrupt information is received, the cache data are timely stored in the nonvolatile memory. When the target program starts or a new thread is created, the memory information in the corresponding state is transferred to the buffer, and all recorded information is saved to the non-volatile memory. By the method, the data flow of the target program during running can be completely recorded, and the user can accurately recover the data flow of the target program by analyzing the data stored in the nonvolatile memory so as to complete fault diagnosis of the target program.
In step S100, trace data of the kernel (that is, the processor core) is obtained through the embedded trace macrocell in the ARM processor. To obtain a complete and accurate control flow of the ARM processor, the embedded trace macrocell must remain enabled at all times so that it traces continuously and collects sufficient trace data. The trace data includes the information obtained by instruction tracing and data tracing.
In a multi-core system, each core has its own embedded trace macrocell, and each embedded trace macrocell can trace only the instructions executed by its own core; that is, it can obtain only the trace data of its own core. The execution order of instructions within a single core can therefore be read directly from the recorded data, but the order of instruction execution across cores cannot, so concurrency errors cannot be detected. For example, a fault caused by a race condition cannot be detected. For this reason, step S200 makes the embedded trace macrocell acquire timestamp data while it acquires trace data.
In some embodiments, referring to fig. 2, in step S200, acquiring timestamp data of a kernel by an embedded trace macrocell includes:
S210, configuring the embedded tracking macro unit into an output timestamp mode;
S220, triggering a tracking unit event of the embedded tracking macro unit through a countdown timer;
and S230, outputting the time stamp data according to the tracking unit event.
First, in step S210, the embedded trace macrocell is configured into a mode in which it additionally outputs timestamps, which contain CPU clock timing information. By default the embedded trace macrocell generates timestamps at a low frequency, emitting one timestamp only after several trace packets, which is not enough to determine ordering when denser race conditions occur. However, the embedded trace macrocell can generate a timestamp whenever a particular trace unit event occurs. Therefore, in step S220, the countdown counter built into the embedded trace macrocell is used as the source of the trace unit event, and in step S230 the corresponding timestamp data is output when the event is detected.
As a specific example, the countdown counter triggers the trace unit event when its value decreases to 0. Both the initial value and the reload value of the countdown counter can therefore be configured as 0, so that the trace unit event occurs continuously and the embedded trace macrocell generates timestamps at the maximum frequency. In some other embodiments, the timestamp generation frequency may instead be chosen flexibly according to the characteristics of the target program.
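The effect of the reload value on the timestamp frequency can be illustrated with a small software model. This is only a sketch: the function name `timestamp_ticks` and the counter behaviour (fire and reload when the value is 0, otherwise decrement once per tick) are simplifying assumptions for illustration and are not the register-level semantics of the embedded trace macrocell.

```python
def timestamp_ticks(reload_value, total_ticks):
    """Return the tick indices at which a simplified countdown counter fires.

    Assumed model (not the real hardware semantics): the counter starts at
    reload_value; on each tick, if it is 0 it fires a trace unit event and
    reloads, otherwise it decrements by one.
    """
    fired = []
    counter = reload_value
    for tick in range(total_ticks):
        if counter == 0:
            fired.append(tick)        # trace unit event, so a timestamp is emitted
            counter = reload_value    # reload for the next period
        else:
            counter -= 1
    return fired


# Reload value 0: an event (and therefore a timestamp) on every tick.
every_tick = timestamp_ticks(reload_value=0, total_ticks=5)

# A larger reload value trades ordering precision for less trace volume.
sparse = timestamp_ticks(reload_value=3, total_ticks=10)
```

In this model a reload value of 0 yields the maximum timestamp rate, while a larger value spaces the timestamps out, which corresponds to choosing the frequency according to the target program.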
Because the data recording method obtains the timestamp data together with the trace data, the order in which instructions were executed on multiple cores can be determined during subsequent fault analysis, which facilitates that analysis.
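As a rough illustration of how timestamps make cross-core ordering recoverable at analysis time, the following sketch merges per-core trace lists into one global order. The record layout (`TraceRecord`) and the sample events are hypothetical; real trace packets would first have to be decoded.

```python
import heapq
from typing import NamedTuple


class TraceRecord(NamedTuple):
    timestamp: int   # timestamp emitted alongside the trace data
    core: int        # core whose trace macrocell produced the record
    event: str       # decoded instruction or data access (hypothetical text)


def merge_core_traces(per_core):
    """Merge per-core lists, each already sorted by timestamp, into one order."""
    return list(heapq.merge(*per_core, key=lambda record: record.timestamp))


core0 = [TraceRecord(10, 0, "load x"), TraceRecord(30, 0, "store x")]
core1 = [TraceRecord(20, 1, "store x")]

merged = merge_core_traces([core0, core1])
```

Here the merged order shows that core 1 wrote `x` between the load and the store on core 0, which is the kind of interleaving a race-condition diagnosis needs and which per-core traces alone cannot provide.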
The default on-chip buffer for the embedded trace macrocell is the embedded trace buffer, which provides on-chip storage for the timestamp data and trace data. The capacity of the embedded trace buffer is usually limited (e.g., 64 KB on the ARM Juno development board). Because the CPU executes very quickly, the embedded trace buffer fills within a few seconds, and because it cannot raise an interrupt when it is full, the stored timestamp data and trace data are then overwritten by new data, which results in data loss.
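The data loss described above can be reproduced with a toy fixed-capacity buffer. This is a minimal sketch in which a `deque` with `maxlen` stands in for a buffer that silently overwrites its oldest entries; the capacity and packet numbers are arbitrary illustrative values.

```python
from collections import deque


def fill_circular_buffer(capacity, packets):
    """Push packets into a fixed-size buffer that drops the oldest on overflow."""
    buffer = deque(maxlen=capacity)   # no notification is raised when full
    for packet in packets:
        buffer.append(packet)
    return list(buffer)


# Ten packets written into a buffer that holds four: only the newest survive.
survivors = fill_circular_buffer(capacity=4, packets=range(10))
lost = 10 - len(survivors)
```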
In some embodiments, step S300 of storing the trace data and the timestamp data in a buffer to obtain the cache data includes: storing the trace data and the timestamp data in the embedded trace router to obtain the cache data. Embodiments of the present application use the embedded trace router as the buffer for the embedded trace macrocell, which allows a user to use any RAM in the device to store trace data and timestamp data. For example, the embedded trace router may be allocated a buffer of up to 4 GB of physical memory as temporary storage for trace data and timestamp data, which significantly reduces how often data must be transferred from the buffer to the nonvolatile memory.
In some embodiments, referring to fig. 3, in step S400, outputting, by the performance monitoring unit, interrupt information includes:
S410, detecting the data size of the cache data in the buffer area through a performance monitoring unit;
and S420, if the data size of the cache data is larger than a preset threshold value, outputting interrupt information.
Since no interrupt is triggered when the buffer is filled, the interrupt information is output through the performance monitoring unit in the ARM processor. The performance monitoring unit can collect various statistics about the processor and the memory at runtime, so in step S410 it is used to detect the data size of the cache data in the buffer in real time. Since the storage space of the buffer is fixed, detecting in step S420 that the data size of the cache data is greater than the preset threshold indicates that the buffer is about to be filled; at this point the performance monitoring unit is controlled to output the interrupt information, which signals that the cache data should be transferred. The preset threshold may be set according to the size of the storage space of the buffer.
In step S500, when the interrupt information is detected, the cache data is transferred to the nonvolatile memory for storage. With this data recording manner, the trace data and timestamp data produced while the target program runs can be recorded completely, which facilitates subsequent analysis of the instruction execution process.
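Steps S410 to S500 can be sketched in software as a buffer that is flushed to persistent storage once its size exceeds a threshold. This is a simplified model only: the class `TraceBuffer`, the in-process size check that stands in for the performance monitoring unit interrupt, and the file that stands in for the nonvolatile memory are all assumptions made for illustration.

```python
import os
import tempfile


class TraceBuffer:
    """Toy stand-in for the trace buffer plus the threshold interrupt."""

    def __init__(self, threshold, sink_path):
        self.threshold = threshold      # preset threshold, in bytes
        self.sink_path = sink_path      # file standing in for nonvolatile memory
        self.pending = bytearray()      # cache data not yet persisted
        self.flush_count = 0

    def append(self, packet):
        self.pending += packet
        if len(self.pending) > self.threshold:   # S420: size exceeds threshold
            self._on_interrupt()

    def _on_interrupt(self):
        # S500: move the cache data to persistent storage, then reuse the buffer.
        with open(self.sink_path, "ab") as sink:
            sink.write(self.pending)
            sink.flush()
            os.fsync(sink.fileno())
        self.pending.clear()
        self.flush_count += 1


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "trace.bin")
        buffer = TraceBuffer(threshold=8, sink_path=path)
        for i in range(10):
            buffer.append(bytes([i]) * 3)        # ten 3-byte packets
        with open(path, "rb") as stored_file:
            stored = stored_file.read()
        return buffer.flush_count, len(stored), len(buffer.pending)
```

Setting the threshold below the buffer capacity leaves headroom for data that arrives while a flush is in progress, which is why the threshold is chosen relative to the buffer size.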
To record the impact of non-deterministic events, the embedded trace macrocell could be configured to trace both user space and kernel space, but this approach introduces unacceptably large overhead (gigabytes of trace data are generated within a few seconds). The embodiments of the present application therefore handle two main categories of non-deterministic events separately: system calls and asynchronous events.
In some embodiments, referring to fig. 4, for a system call, the data recording method of the present application further includes:
S600, obtaining the calling type of the current system call;
and S700, selecting a corresponding recording mode according to the calling type to record data.
First, the call type of the current system call is determined in step S600, and then, in step S700, a recording mode corresponding to the call type is selected according to the determined call type to record data. It is understood that the data recording of the embodiment of the present application is to directly record the data in the nonvolatile memory.
In some embodiments, the call types include a read status and a write status, and selecting a corresponding recording mode according to the call type to record data includes:
if the call type is the read status, performing data recording on all changed data;
and if the call type is the write status, performing data recording only on the error code.
When the detected call type is read status (Reading Status, RS), the function is, for example, getpid. RS-type system calls read information about the state of the system, and their results are passed back through the return value. The changed memory or registers therefore need to be recorded directly; that is, data recording is performed on all changed data.
When the detected call type is write status (Writing Status, WS), the function is, for example, epoll_create. WS-type system calls change the state of the system but do not directly change the memory and registers of the program. They are therefore normally ignored unless the call fails and returns an error code, in which case only the error code is recorded.
In some embodiments, the call types further include read content and write content, and selecting a corresponding recording mode according to the call type to record data further includes:
if the call type is the read content, performing data recording on a truncated portion of the data;
and if the call type is the write content, not performing data recording.
When the detected call type is read content (Reading Content, RC), the function is, for example, read. RC-type system calls read content from an external input. Because an RC-type system call usually returns a large amount of content, for performance reasons the embodiment of the present application records only a truncated portion of the data. For example, the content is truncated when such a system call is recorded, and only the first 256 bytes are kept. The size of the truncated data may be set arbitrarily.
When the detected call type is write content (Writing Content, WC), the function is, for example, write. WC-type system calls write content to an external destination. Since this does not affect the execution of the target program, these calls are ignored and no data recording is performed.
By classifying and detecting the call types, different recording modes can be adopted for different types of system calls, so that the capture overhead is greatly reduced.
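The four recording modes can be summarized as a small dispatch routine. This is a sketch under stated assumptions: the lookup table `CALL_TYPES` contains only the four example calls named in the text, the record format is hypothetical, and treating a negative return value as the error code is an assumption about how failure is reported.

```python
from enum import Enum


class CallType(Enum):
    READ_STATUS = "RS"
    WRITE_STATUS = "WS"
    READ_CONTENT = "RC"
    WRITE_CONTENT = "WC"


# Hypothetical classification table; only the example names come from the text.
CALL_TYPES = {
    "getpid": CallType.READ_STATUS,
    "epoll_create": CallType.WRITE_STATUS,
    "read": CallType.READ_CONTENT,
    "write": CallType.WRITE_CONTENT,
}

MAX_CONTENT_BYTES = 256   # truncation size from the example; freely configurable


def record_syscall(name, changed=b"", return_value=0):
    """Return the record to log for one system call, or None to skip it."""
    call_type = CALL_TYPES[name]
    if call_type is CallType.READ_STATUS:
        # RS: record everything the call changed.
        return {"call": name, "ret": return_value, "data": changed}
    if call_type is CallType.WRITE_STATUS:
        # WS: record only a failure (assumed to be a negative return value).
        if return_value < 0:
            return {"call": name, "error": return_value}
        return None
    if call_type is CallType.READ_CONTENT:
        # RC: record only the first MAX_CONTENT_BYTES bytes.
        return {"call": name, "ret": return_value,
                "data": changed[:MAX_CONTENT_BYTES]}
    # WC: does not affect the target program, so nothing is recorded.
    return None


truncated = record_syscall("read", changed=b"x" * 1000, return_value=1000)
```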
In some embodiments, for asynchronous events, referring to fig. 5, the data logging method of the present application further comprises:
S810, configuring the embedded tracking macro unit into a tracking asynchronous abnormal mode;
S820, obtaining the event time of the asynchronous event detected by the embedded tracking macro unit;
S830, if a hardware breakpoint on a signal processing program of an asynchronous event is detected, performing core dump to obtain dump data;
and S840, recording the event time and dump data.
To handle asynchronous events such as interrupt events properly, the time of each asynchronous event must be known in the analysis phase. The embedded trace macrocell is therefore configured in step S810 into a mode that traces asynchronous exceptions (IRQ interrupts and FIQ fast interrupts). In this mode, when an asynchronous event occurs, its event time can be acquired in step S820.
To capture the effect of the target program's signal handlers (signal processing programs), the usage of the signal function is first statically analyzed in the binary to determine in advance which functions are signal handlers, and a hardware breakpoint is then set at the address of each signal handler. When the program hits a breakpoint in step S830, the stack frame of the signal handler and the registers set by the kernel are recorded as a core dump to obtain dump data. Finally, the event time and the dump data are recorded in step S840. The recorded signal handler content can be used to help recover the data flow in the subsequent analysis phase.
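The flow of steps S820 to S840 can be modelled as follows. This is purely a software sketch: the class `AsyncEventRecorder`, the handler address `0x4000`, the register names, and the way a breakpoint hit is reported are all hypothetical, and a real implementation would rely on hardware breakpoints and trace packets rather than method calls.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AsyncEventRecorder:
    handler_addresses: frozenset          # found in advance by static analysis
    event_time: Optional[int] = None      # time of the latest asynchronous event
    records: list = field(default_factory=list)

    def on_async_event(self, timestamp):
        # S820: remember when the asynchronous event was observed.
        self.event_time = timestamp

    def on_breakpoint(self, pc, stack_frame, registers):
        # S830: only breakpoints placed on signal handlers trigger a dump.
        if pc not in self.handler_addresses:
            return False
        # S840: record the event time together with the dump data.
        self.records.append({
            "event_time": self.event_time,
            "handler": pc,
            "stack_frame": bytes(stack_frame),
            "registers": dict(registers),
        })
        return True


recorder = AsyncEventRecorder(handler_addresses=frozenset({0x4000}))
recorder.on_async_event(timestamp=1234)
missed = recorder.on_breakpoint(0x1000, b"", {})
hit = recorder.on_breakpoint(0x4000, b"\x01\x02", {"x0": 7, "sp": 0x7FF0})
```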
In some embodiments, the data recording method of the present application further comprises: if the current library function is a target library function in the wrapper, performing data recording on the current library function; wherein, in the wrapper, the original function is executed and the execution information is saved.
In data recording, it is not necessary to record all library functions. Library functions can be divided into two categories: deterministic functions and non-deterministic functions. Deterministic functions (e.g., math.sqrt) do not need to be recorded because their effects can be reproduced in the analysis stage. Some non-deterministic functions (e.g., some POSIX functions) are essentially just wrappers for system calls, so they are ignored to avoid duplicate records. Other non-deterministic functions, and library functions specified by the developer, may need to be recorded. Therefore, whether data recording is required for the current library function is decided by determining whether it is a target library function in the wrapper. It can be understood that the target library functions are functions that need to be recorded and are preset by the user.
In the embodiment of the application, a tool automatically generates a wrapper for each target library function; the wrapper executes the original function and saves the execution information. During data recording, library function hooks are used to collect the effects of these functions on memory and registers. Specifically, library functions can be hooked by modifying the relocation process of the dynamic linker in the system. Because the dynamic linker looks up and relocates the addresses of the relevant library functions when the program is loaded, changing this process redirects those functions to the hook functions. Automating the hook process in this way avoids a large amount of manual engineering and makes library function hooking more general.
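The wrapper idea can be shown with a Python analogy. The sketch below does not modify a dynamic linker; it rebinds names in a dictionary that stands in for the relocation table, and the names make_wrapper, hook, and EXECUTION_LOG are hypothetical.

```python
import functools
import math

# Saved execution information, one entry per call of a hooked function.
EXECUTION_LOG = []


def make_wrapper(original):
    """Generate a wrapper that runs the original function and saves what happened."""
    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)  # execute the original function
        EXECUTION_LOG.append(
            {"name": original.__name__, "args": args, "kwargs": kwargs, "result": result}
        )
        return result
    return wrapper


def hook(symbol_table, targets):
    """Rebind each target name to its wrapper, loosely analogous to redirecting relocation."""
    for name in targets:
        symbol_table[name] = make_wrapper(symbol_table[name])


# Usage: only the target function is hooked; the other entry is left untouched.
table = {"sqrt": math.sqrt, "floor": math.floor}
hook(table, ["floor"])
table["floor"](2.7)
```

Because the wrappers are generated from a list of target names, adding another function to record only requires extending that list, which mirrors the automation described above.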
Referring to fig. 6, the data recording method of the present application is described in detail below through a specific embodiment. The recording phase of the embodiment of the application mainly comprises three units: an embedded trace macrocell manager, a non-deterministic event capturer and a library function hook. The embedded trace macrocell manager is a kernel module that controls the hardware functions (the embedded trace macrocell and the embedded trace router); it obtains cache data including trace data and timestamp data by executing steps S100, S200 and S300, and transfers the cache data to the non-volatile memory by executing step S500 when it receives the interrupt information output by the performance monitoring unit. The non-deterministic event capturer performs steps S600, S700 and S810 to S840 to record the non-deterministic events (system calls and asynchronous events) related to the target program and stores the recorded event data in the non-volatile memory. The library function hook checks the library functions called by the target program and extracts the relevant information. In addition, every time the target program starts or a new thread is created, a core dump of the target program's memory in that state is taken and saved as an initial snapshot. The information recorded by the three units is saved in the non-volatile memory and transferred to an offline server when a failure occurs; the analysis phase is subsequently performed on that server.
The analysis stage may employ existing data analysis methods. The control flow constructor reconstructs the control flow using the trace data and timestamp data of the embedded trace macrocell together with the binary of the program. The embedded trace macrocell records only the addresses of critical instructions (e.g., branch and conditional instructions). Therefore, the trace data of the embedded trace macrocell must be combined with the binary of the program to restore the control flow. Specifically, the instructions between every two consecutive addresses traced by the embedded trace macrocell are obtained from the binary file of the program, and the entire control flow is then reconstructed. For programs running on multiple processors, the instructions involved in a data race are rarely executed so close together that fine-grained timestamps cannot separate them. That is, such instructions are usually located in different blocks, and fine-grained timestamps can determine the order in which the blocks are accessed. Thus, by incorporating the timestamp data, the order of the racing accesses can be determined.
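A simplified model of this reconstruction is sketched below. It assumes fixed 4-byte instructions, represents the binary as a dictionary from address to mnemonic, and treats each traced address as the point where execution resumes after a branch; real trace packets are more involved. The names BRANCH_OPS, reconstruct_control_flow, and order_blocks are hypothetical.

```python
# Mnemonics treated as branches in this toy model (assumption).
BRANCH_OPS = {"b", "bl", "bx", "blx", "beq", "bne"}


def reconstruct_control_flow(binary, trace, step=4):
    """Fill in the instructions between consecutive traced addresses.

    binary: dict mapping address -> instruction text
    trace:  list of addresses at which execution resumes after a branch
    """
    flow = []
    for start in trace:
        pc = start
        while pc in binary:
            instruction = binary[pc]
            flow.append((pc, instruction))
            if instruction.split()[0] in BRANCH_OPS:
                break  # the next traced address says where execution continues
            pc += step
    return flow


def order_blocks(blocks):
    """Order blocks executed on different cores by their timestamps."""
    return sorted(blocks, key=lambda block: block["timestamp"])


binary = {
    0x100: "mov r0, #1",
    0x104: "cmp r0, #0",
    0x108: "bne 0x200",
    0x10C: "mov r1, #2",   # skipped: the branch at 0x108 was taken
    0x200: "add r0, r0, #1",
    0x204: "bx lr",
}
flow = reconstruct_control_flow(binary, [0x100, 0x200])
```

In this model the trace only needs one address per executed block, and every other instruction is recovered from the binary, which is why the recorded trace stays small.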
The data flow constructor integrates the data records and the control flow information. The data flow gives the state of memory and registers after each instruction, which is critical for the developer to determine the root cause of the failure. Related systems recover the data flow through forward and backward analysis. However, backward analysis introduces uncertainty into the data flow because some instructions are irreversible. For example, the instruction EOR r0, r0, r0 clears register r0, so backward analysis cannot infer the value originally stored in r0. Backward analysis is also often inadequate for long-running applications because the accuracy of data recovery decreases as the amount of trace data grows. Therefore, the information collected in the recording stage (such as non-deterministic events, library functions and core dumps) is used in forward analysis to reconstruct the data flow. Because the recording stage supports long-term tracing, the data recording method of the embodiment of the application can support long-running applications.
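The irreversibility of EOR r0, r0, r0 can be checked with a few lines of Python. The helper eor below models a 32-bit exclusive OR; the sample values are arbitrary.

```python
def eor(a, b):
    """32-bit exclusive OR, as performed by the ARM EOR instruction."""
    return (a ^ b) & 0xFFFFFFFF


# Very different values of r0 before the instruction...
before_values = [0x0, 0x1234, 0xFFFFFFFF]
# ...all collapse to the same value after EOR r0, r0, r0.
after_values = {eor(value, value) for value in before_values}
```

Since every starting value maps to zero, the state after the instruction carries no information about the state before it, so only forward analysis from a known earlier state can recover the original value.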
For common instruction types, the semantics of each instruction type are modeled, and the state of memory and registers after each instruction is executed is derived. For non-deterministic events during execution, the state of memory and registers is restored by parsing the recorded information (e.g., system calls and signal handlers). At the same time, the passed parameters are checked for consistency with the values recovered in the data flow, and the recorded register and memory changes are applied.
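A minimal forward-analysis sketch follows. It models only two ordinary instructions (mov with an immediate and add) and one non-deterministic event (svc), tracks registers but not memory, and uses a hypothetical record layout with "args" and "register_changes" keys.

```python
def forward_analysis(initial_regs, control_flow, recorded_events):
    """Replay instructions forward and return the register state after each one.

    control_flow:    list of tuples such as ("mov", "r0", 5) or ("svc",)
    recorded_events: dict mapping instruction index -> recorded system call info
    """
    regs = dict(initial_regs)
    states = []
    for index, (op, *operands) in enumerate(control_flow):
        if op == "mov":
            dst, imm = operands
            regs[dst] = imm
        elif op == "add":
            dst, lhs, rhs = operands
            regs[dst] = (regs[lhs] + regs[rhs]) & 0xFFFFFFFF
        elif op == "svc":
            event = recorded_events[index]
            # Check that the recorded parameters match the recovered values.
            for reg, expected in event["args"].items():
                if regs[reg] != expected:
                    raise ValueError(f"mismatch in {reg} at instruction {index}")
            # Apply the register changes captured in the recording phase.
            regs.update(event["register_changes"])
        else:
            raise NotImplementedError(op)
        states.append(dict(regs))
    return states


flow = [("mov", "r0", 5), ("mov", "r7", 3), ("svc",), ("add", "r1", "r0", "r0")]
events = {2: {"args": {"r0": 5}, "register_changes": {"r0": 12}}}
states = forward_analysis({"r0": 0, "r1": 0, "r7": 0}, flow, events)
```

The consistency check acts as an early warning: if the recovered state disagrees with what was recorded at a system call, the reconstruction has already diverged and later states should not be trusted.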
A program failure root cause detector is used to detect the root cause of a program failure. The root cause of a failure is defined as a set of instructions and inputs that meets the following requirements: it participates in the failing execution and is most relevant to the occurrence of the failure; and it includes the necessary instructions, such that changing those instructions can fix the failure. Specifically, the detector first performs a static alias analysis, similar to previous work, on the reconstructed control flow to obtain points-to sets, which establish a transitive relationship between memory addresses. Second, a small number of candidate instructions associated with the crashed memory are selected using the points-to sets and the exact memory addresses accessed in the reconstructed data flow. For parallel programs, the detector extracts features from all candidate instructions, eliminates useless features by prediction when additional normal executions are available, and re-executes tests of the remaining features one by one on an offline device; a feature that triggers the error is determined to be the root cause of the error. Sequential programs are handled similarly to multi-threaded programs, except that the detector does not need to extract and use features; instead, it traces back the corrupted data values and the corresponding control flow (e.g., input sources, instruction blocks causing data corruption, crash locations) from the location of the program crash and identifies them as the root cause.
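The candidate selection step can be sketched as a simple filter. This is a deliberately reduced illustration, not the alias analysis itself: the points-to sets are taken as given, and the function names and data layout are hypothetical.

```python
def related_addresses(points_to, crashed_address):
    """Collect the addresses that may alias the crashed address.

    points_to: dict mapping a pointer name -> set of addresses it may point to
    """
    related = {crashed_address}
    for targets in points_to.values():
        if crashed_address in targets:
            related |= set(targets)
    return related


def candidate_instructions(points_to, accesses, crashed_address):
    """Keep the instructions whose accessed address is related to the crash.

    accesses: list of (pc, address) pairs taken from the reconstructed data flow
    """
    related = related_addresses(points_to, crashed_address)
    return [pc for pc, address in accesses if address in related]


points_to = {"p": {0x1000, 0x1004}, "q": {0x2000}}
accesses = [(0x10, 0x1004), (0x14, 0x2000), (0x18, 0x1000)]
candidates = candidate_instructions(points_to, accesses, 0x1000)
```

Filtering on concrete addresses from the data flow narrows the static result, since an instruction that could alias the crashed memory in theory is dropped if it never touched a related address in the failing run.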
The data recording method of the embodiment of the application uses the hardware features of the ARM processor to reconstruct the accurate control flow and data flow related to a failure. The method supports unmodified binaries on the ARM platform and requires no hardware modification. Applications that run in a production environment for a long time can be traced during data recording, and developers are provided with enough information to perform root cause analysis.
In some embodiments, referring to fig. 7, the present application further provides a data recording apparatus applied to an ARM processor, where the data recording apparatus includes:
the first acquisition module is used for acquiring the tracking data of the kernel through the embedded tracking macro unit;
the second acquisition module is used for acquiring the timestamp data of the kernel through the embedded tracking macro unit;
the first storage module is used for storing the tracking data and the timestamp data in a buffer area to obtain cache data;
the interrupt module is used for outputting interrupt information through the performance monitoring unit;
and the second storage module is used for storing the cache data in the nonvolatile memory according to the interrupt information so as to record data.
According to the data recording device provided by the embodiment of the application, the embedded trace macrocell in the ARM processor is used to acquire the trace data and the timestamp data of the kernel respectively, the trace data and the timestamp data are stored in the buffer, and after the interrupt information is received, the cache data is promptly stored in the non-volatile memory. When the target program starts or a new thread is created, the memory information in the corresponding state is transferred to the buffer, and all recorded information is saved to the non-volatile memory. By analyzing the data stored in the non-volatile memory, the user can accurately recover the data flow of the target program and thereby complete the fault diagnosis of the target program. The data recording device in the embodiment of the present application is configured to execute the data recording method in the above embodiment; its specific execution steps are the same as those of the data recording method in the above embodiment and are not described again here.
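The interaction of the storage and interrupt modules can be modeled in software as follows. The sketch assumes the threshold behavior described in claim 4 (an interrupt is raised when the buffered data exceeds a preset size); the class name, method names, and the use of a Python list as stand-in for the non-volatile memory are hypothetical.

```python
class TraceRecorder:
    def __init__(self, threshold):
        self.threshold = threshold   # preset buffer size limit, in bytes
        self.buffer = bytearray()    # stands in for the trace buffer
        self.nonvolatile = []        # stands in for the non-volatile memory

    def store(self, trace_data, timestamp_data):
        """First storage module: append trace and timestamp data to the buffer."""
        self.buffer += trace_data + timestamp_data
        # Interrupt module: signal when the buffered data exceeds the threshold.
        if len(self.buffer) > self.threshold:
            self.on_interrupt()

    def on_interrupt(self):
        """Second storage module: move the cache data to non-volatile memory."""
        self.nonvolatile.append(bytes(self.buffer))
        self.buffer.clear()


recorder = TraceRecorder(threshold=4)
recorder.store(b"ab", b"c")    # 3 bytes buffered, below the threshold
recorder.store(b"de", b"f")    # 6 bytes buffered, triggers a flush
```

Flushing on a size threshold rather than on every write keeps the number of slow writes to non-volatile memory low while still bounding how much trace data can be lost if the buffer is overwritten.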
The embodiments described in the embodiments of the present application are for more clearly illustrating the technical solutions of the embodiments of the present application, and do not constitute a limitation to the technical solutions provided in the embodiments of the present application, and it is obvious to those skilled in the art that the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems with the evolution of technology and the emergence of new application scenarios.
It will be appreciated by those skilled in the art that the embodiments shown in the figures are not intended to limit the embodiments of the present application and may include more or fewer steps than those shown, or some of the steps may be combined, or different steps may be included.
One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments of the present application have been described in detail with reference to the drawings, but the present application is not limited to the embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present application. Furthermore, the embodiments and features of the embodiments of the present application may be combined with each other without conflict.

Claims (6)

1. A data recording method, characterized in that the method is applied to an ARM processor and comprises the following steps:
acquiring tracking data of a kernel through an embedded tracking macro unit;
acquiring time stamp data of a kernel through an embedded tracking macro unit;
storing the tracking data and the timestamp data in a buffer area to obtain cache data;
outputting interrupt information through a performance monitoring unit;
storing the cache data in a nonvolatile memory according to the interrupt information so as to record data;
configuring the embedded trace macrocell to trace an asynchronous exception mode;
acquiring the event time of the asynchronous event detected by the embedded tracking macro unit;
if a hardware breakpoint on the signal processing program of the asynchronous event is detected, performing core dump to obtain dump data;
carrying out data recording on the event time and the dump data;
obtaining the calling type of the current system call;
selecting a corresponding recording mode according to the calling type to record data, wherein the data recording mode comprises the following steps:
if the calling type is in a reading state, data recording is carried out on all changed data;
if the calling type is in a writing state, only carrying out data recording on the error code;
if the calling type is the read content, performing data recording on part of intercepted data;
and if the calling type is the write content, not recording the data.
2. The data recording method according to claim 1, wherein the obtaining time stamp data of the kernel by the embedded trace macro unit comprises:
configuring the embedded trace macrocell to output a timestamp mode;
triggering a tracking cell event of the embedded tracking macrocell by a countdown timer;
outputting the timestamp data according to the tracking cell event.
3. The data recording method according to claim 1, wherein said storing the trace data and the timestamp data in a buffer to obtain buffered data comprises:
and storing the tracking data and the timestamp data in an embedded tracking router to obtain the cache data.
4. The data recording method according to claim 1, wherein the outputting of the interruption information by the performance monitoring unit includes:
detecting the data size of the cache data in the buffer area through the performance monitoring unit;
and if the data size is larger than a preset threshold value, outputting the interrupt information.
5. The data recording method according to any one of claims 1 to 4, characterized in that the method further comprises:
if the current library function is a target library function in the wrapper, performing data recording on the current library function; wherein, in the wrapper, the original function is executed and the execution information is saved.
6. A data recording apparatus, characterized in that the apparatus is applied to an ARM processor and comprises:
the first acquisition module is used for acquiring the tracking data of the kernel through the embedded tracking macro unit;
the second acquisition module is used for acquiring the timestamp data of the kernel through the embedded tracking macro unit;
the first storage module is used for storing the tracking data and the timestamp data in a buffer area to obtain cache data;
the interrupt module is used for outputting interrupt information through the performance monitoring unit;
the second storage module is used for storing the cache data in a nonvolatile memory according to the interrupt information so as to record data;
and is also used to implement the following data recording method:
configuring the embedded trace macrocell to trace an asynchronous exception mode;
acquiring the event time of the asynchronous event detected by the embedded tracking macro unit;
if a hardware breakpoint on the signal processing program of the asynchronous event is detected, performing core dump to obtain dump data;
carrying out data recording on the event time and the dump data;
obtaining the calling type of the current system call;
selecting a corresponding recording mode according to the calling type to record data, wherein the data recording mode comprises the following steps:
if the calling type is in a reading state, data recording is carried out on all changed data;
if the calling type is in a writing state, only carrying out data recording on error codes;
if the calling type is the read content, performing data recording on part of intercepted data;
and if the calling type is the write content, not recording the data.
CN202210881085.3A 2022-07-26 2022-07-26 Data recording method and data recording device Active CN115114117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210881085.3A CN115114117B (en) 2022-07-26 2022-07-26 Data recording method and data recording device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210881085.3A CN115114117B (en) 2022-07-26 2022-07-26 Data recording method and data recording device

Publications (2)

Publication Number Publication Date
CN115114117A CN115114117A (en) 2022-09-27
CN115114117B true CN115114117B (en) 2022-12-27

Family

ID=83334342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210881085.3A Active CN115114117B (en) 2022-07-26 2022-07-26 Data recording method and data recording device

Country Status (1)

Country Link
CN (1) CN115114117B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6615371B2 (en) * 2002-03-11 2003-09-02 American Arium Trace reporting method and system
JP2007304972A (en) * 2006-05-12 2007-11-22 Matsushita Electric Ind Co Ltd Microprocessor system
US8341604B2 (en) * 2006-11-15 2012-12-25 Qualcomm Incorporated Embedded trace macrocell for enhanced digital signal processor debugging operations
US20120042212A1 (en) * 2010-08-10 2012-02-16 Gilbert Laurenti Mixed Mode Processor Tracing
CN103631739B (en) * 2012-08-28 2017-07-21 华为技术有限公司 The method for positioning analyzing and embedded system of embedded system
CN109684186B (en) * 2018-12-27 2022-06-10 长安大学 Non-intrusive networked embedded system evaluation device and evaluation method

Also Published As

Publication number Publication date
CN115114117A (en) 2022-09-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant