CN106919462B - Method and device for generating fault record of processor - Google Patents

Method and device for generating fault record of processor

Info

Publication number
CN106919462B
Authority
CN
China
Prior art keywords
instruction address
processing unit
type
control chip
entry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510992820.8A
Other languages
Chinese (zh)
Other versions
CN106919462A (en)
Inventor
侯承舜
樊辉
刘洪佳
刘恒
鲁冬杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201510992820.8A priority Critical patent/CN106919462B/en
Priority to PCT/CN2016/098537 priority patent/WO2017107576A1/en
Publication of CN106919462A publication Critical patent/CN106919462A/en
Application granted granted Critical
Publication of CN106919462B publication Critical patent/CN106919462B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0721 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU]
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0787 Storage of error reports, e.g. persistent data storage, storage using memory protection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present application relates to the field of computers, and in particular to a method and an apparatus for generating a processor fault record. In the method, a control chip detects that a processing unit in a CPU has stopped responding, acquires the instruction address in the current program counter (PC) of the processing unit through a JTAG channel, creates a first-type entry containing that instruction address, and records the first-type entry in an instruction address table. When the number of entries recorded in the instruction address table reaches a preset value, the control chip triggers a CPU interrupt. This scheme provides more information for processor fault analysis and thereby improves the efficiency of fault analysis.

Description

Method and device for generating fault record of processor
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for generating a processor fault record.
Background
A central processing unit (CPU) is the arithmetic core and control core of a computer, a server, or another device with a data processing function. The CPU in a device is connected to the memory in the device via a bus and processes data by interpreting and executing program instructions stored in the memory.
Existing CPUs typically include arithmetic logic units (ALUs), registers, cache memory (caches), and the data, control, and status buses that connect the three.
In a program run by the CPU, a series of instructions may be executed repeatedly in a loop until a predetermined condition in the program is met. In actual operation, because of design defects or unexpected conditions in the application environment, the predetermined condition may not be met for a long time, so the series of instructions keeps executing for an abnormal number of iterations. As a result, the CPU cannot respond to other services. This phenomenon is called an infinite loop (also referred to below as a dead loop).
To address this problem, a dedicated reset device, such as a watchdog, is usually integrated in the device to monitor the operating condition of the CPU. When the reset device determines that the CPU has stopped responding, it triggers a CPU interrupt and reset, so that faults such as a CPU dead loop do not affect service operation for a long time. For example, the operating condition of the CPU is monitored by a technique such as heartbeat detection: when the CPU has not responded to the reset device within a preset time and the heartbeat-detection timer overflows, the reset device resets the CPU so that it leaves the dead loop state.
However, current program code often contains complex nested call relationships among many functions, and the instruction corresponding to the instruction address held in the register at the moment of interruption is not necessarily an instruction in the function that caused the dead loop. It is therefore difficult to accurately locate the function in which the dead loop occurs using only the instruction address held in the register at the moment of interruption.
Existing techniques for locating a CPU dead loop depend on the experience of developers. A developer uses a debugging tool and, by association and guesswork, analyzes the functions related to the instruction whose address is held in the register at the moment of interruption. However, many dead loops that occur in practice are sporadic, do not occur on every run, and are difficult to reproduce. The low efficiency of existing CPU dead loop locating techniques is therefore a problem that urgently needs to be solved in software product design.
Disclosure of Invention
In view of this, the present application provides a method for generating a processor fault record that provides more information about the CPU when a fault occurs, so as to reduce the difficulty of locating the cause of a CPU dead loop fault.
The technical scheme provided by the embodiment of the application is as follows.
In a first aspect, a method for generating a processor fault record is provided. The method is applied to a hardware platform including a control chip and a central processing unit (CPU), and includes:
the control chip detects that a processing unit in the CPU has stopped responding;
the control chip acquires the instruction address in the current program counter (PC) of the processing unit through a Joint Test Action Group (JTAG) channel;
the control chip creates a first-type entry containing the instruction address in the current PC and records the first-type entry in an instruction address table;
the control chip determines whether the number of entries recorded in the instruction address table has reached a preset value, where the preset value is greater than or equal to 2;
if the number of entries recorded in the instruction address table has not reached the preset value, the control chip returns to the step of acquiring the instruction address in the current PC of the processing unit through the JTAG channel;
and if the number of entries recorded in the instruction address table has reached the preset value, the control chip triggers a CPU interrupt.
With this scheme, when the processing unit stops responding, the control chip acquires, over a period of time, multiple instruction addresses held in the PC of the processing unit through the JTAG channel and records them in the instruction address table. The instruction address table thus reflects what program the processing unit ran during a period of time after it stopped responding. Compared with the prior art, which triggers the interrupt immediately after the processing unit stops responding and records only the one instruction address that the processing unit was running at the moment of interruption, the method provided by the embodiments of the present application reflects the function and code range containing the dead loop more accurately and helps improve the efficiency of CPU fault analysis.
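The sampling loop of the first aspect can be sketched as follows. This is a minimal simulation under stated assumptions, not the patented implementation: `read_pc` and `trigger_interrupt` are hypothetical callables standing in for the JTAG read of the PC and for the interrupt trigger, and the hung processing unit is simulated by a fixed sequence of addresses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FirstTypeEntry:
    """One record in the instruction address table."""
    pc: int  # instruction address read from the program counter


def collect_fault_record(
    read_pc: Callable[[], int],             # hypothetical: JTAG read of the PC
    trigger_interrupt: Callable[[], None],  # hypothetical: raises the CPU interrupt
    preset_value: int = 4,
) -> List[FirstTypeEntry]:
    """Sample the PC until the table holds preset_value entries, then interrupt."""
    if preset_value < 2:
        raise ValueError("the preset value must be greater than or equal to 2")
    table: List[FirstTypeEntry] = []
    while len(table) < preset_value:
        table.append(FirstTypeEntry(pc=read_pc()))
    trigger_interrupt()
    return table


# Simulated hung processing unit that keeps cycling through three addresses.
_addresses = iter([0x8000, 0x8004, 0x8008, 0x8000, 0x8004])
interrupts: List[str] = []
table = collect_fault_record(
    read_pc=lambda: next(_addresses),
    trigger_interrupt=lambda: interrupts.append("irq"),
    preset_value=4,
)
```

Because the interrupt is raised only after the table is full, the record holds several addresses from the stuck code path rather than a single one.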
Optionally, before the recording the first type entry in the instruction address table, the method further includes:
the control chip acquires an instruction address in a current function return address register of the processing unit through the JTAG channel;
and the control chip adds the instruction address in the current function return address register into the first type table entry.
By recording the instruction address in the current function return address register of the processing unit in the instruction address table, the instruction address table can more clearly reflect the calling relationship among the functions which are running by the processing unit, and the efficiency of CPU fault analysis is further improved.
Optionally, before the recording the first type entry in the instruction address table, the method further includes:
and the control chip acquires the current time and adds the current time to the first type of table entry.
Optionally, recording the first-type entry in the instruction address table includes:
the control chip records the first-type entries in the instruction address table in the order in which the first-type entries were created.
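The optional fields described above (function return address, current time) and the ordered recording can be pictured with a small data structure. This is an illustrative sketch only: the field names, the use of seconds for the timestamp, and the `record_entry` helper are assumptions, not part of the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FirstTypeEntry:
    pc: int                                # instruction address in the PC
    return_address: Optional[int] = None   # optional: function return address register
    timestamp: Optional[float] = None      # optional: time at which the sample was taken


def record_entry(table: List[FirstTypeEntry], pc: int,
                 return_address: int, now: float) -> None:
    """Append at the end so that table order equals creation order."""
    table.append(FirstTypeEntry(pc=pc, return_address=return_address, timestamp=now))


table: List[FirstTypeEntry] = []
record_entry(table, pc=0x8004, return_address=0x7F10, now=0.0)
record_entry(table, pc=0x8008, return_address=0x7F10, now=0.5)
```

Two samples that share a return address, as here, suggest that both were taken inside the same call from the same caller.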
Optionally, the control chip returning to the step of acquiring the instruction address in the current PC of the processing unit through the JTAG channel includes:
the control chip delays for a time period T1;
after the time period T1 has elapsed, the control chip returns to the step of acquiring the instruction address in the current PC of the processing unit through the JTAG channel.
By delaying for the time period T1, the control chip can execute other tasks during T1, which prevents the cyclic steps of reading the PC of the processing unit and recording it in a first-type entry from occupying too many of the control chip's resources.
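A sketch of the delayed sampling follows. The `sleep` parameter is an assumption introduced so that the delay can be replaced in a test; on real control-chip firmware the delay would more likely yield to a scheduler than block, and the value of T1 is not given in the source.

```python
import time
from typing import Callable, List


def sample_with_delay(
    read_pc: Callable[[], int],   # hypothetical: JTAG read of the PC
    preset_value: int,
    t1_seconds: float,            # the time period T1; the value is an assumption
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Take preset_value samples, waiting T1 between consecutive samples."""
    samples: List[int] = []
    while True:
        samples.append(read_pc())
        if len(samples) >= preset_value:
            return samples
        sleep(t1_seconds)  # the control chip is free for other tasks during T1


delays: List[float] = []
_pcs = iter([0x100, 0x104, 0x100])
samples = sample_with_delay(lambda: next(_pcs), preset_value=3,
                            t1_seconds=0.01, sleep=delays.append)
```

No delay is inserted after the final sample, so three samples incur two waits of T1.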
Optionally, the detecting, by the control chip, that the processing unit stops responding includes:
and the control chip performs heartbeat detection on the processing unit and determines that the processing unit stops responding.
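Heartbeat detection is not specified further in the source. One common form, sketched here as an assumption, is a timeout on the most recent heartbeat: the processing unit is considered to have stopped responding when no heartbeat has arrived within a configured interval. The class and method names are hypothetical.

```python
class HeartbeatMonitor:
    """Timeout-based heartbeat check (illustrative; names are hypothetical)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_beat = 0.0

    def beat(self, now: float) -> None:
        """Called whenever the processing unit reports that it is alive."""
        self.last_beat = now

    def stopped_responding(self, now: float) -> bool:
        """True when no heartbeat has been seen for longer than the timeout."""
        return now - self.last_beat > self.timeout


monitor = HeartbeatMonitor(timeout=2.0)
monitor.beat(now=10.0)
alive_check = monitor.stopped_responding(now=11.0)   # 1.0 s since last beat
hung_check = monitor.stopped_responding(now=13.5)    # 3.5 s since last beat
```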
Optionally, the method further includes: the control chip reads the first-type entries from the instruction address table one at a time, in the order in which the entries were stored, and performs the following for each first-type entry read:
the control chip uses the instruction address in the PC contained in the first-type entry to look up, in a preset correspondence between instruction addresses in the processing unit and function names and code lines, the function name and code line corresponding to that instruction address;
the control chip creates a second-type entry containing the function name and code line corresponding to the instruction address in the PC contained in the first-type entry;
and the control chip records the second-type entries in a function operation table in the order in which the second-type entries were generated.
The control chip generates one second-type entry in the function operation table for each first-type entry in the instruction address table, so the function operation table records the function names and code lines of the program that ran during a period of time after the processing unit stopped responding.
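The translation from instruction addresses to function names and code lines can be sketched as a lookup in a sorted address table. The source only says that a preset correspondence exists; the table layout below (start address, function name, code line), the sample symbols, and the binary-search lookup are all assumptions made for illustration.

```python
import bisect
from typing import List, NamedTuple, Tuple


class SecondTypeEntry(NamedTuple):
    function: str
    line: int


# Hypothetical preset correspondence, sorted by start address:
# (start address, function name, code line).
SYMBOLS: List[Tuple[int, str, int]] = [
    (0x8000, "main", 10),
    (0x8010, "poll_queue", 42),
    (0x8018, "poll_queue", 43),
    (0x8030, "send_packet", 77),
]
_STARTS = [start for start, _, _ in SYMBOLS]


def lookup(pc: int) -> SecondTypeEntry:
    """Return the function name and code line whose range contains pc."""
    index = bisect.bisect_right(_STARTS, pc) - 1
    if index < 0:
        raise KeyError(f"address {pc:#x} is below the first known symbol")
    _, function, line = SYMBOLS[index]
    return SecondTypeEntry(function, line)


# First-type entries (PC values only), in the order they were stored.
instruction_address_table = [0x8010, 0x801C, 0x8014, 0x8018]
# Second-type entries, generated in the same order.
function_operation_table = [lookup(pc) for pc in instruction_address_table]
```

In this made-up data every sample resolves to `poll_queue`, lines 42 and 43, which is the kind of pattern that would point an analyst at the looping code.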
Optionally, the method further includes:
the control chip reads the first-type entries from the instruction address table one at a time, in the order in which the entries were stored, and performs the following for each first-type entry read:
the control chip uses the instruction address in the PC contained in the first-type entry to look up, in a preset correspondence between instruction addresses in the processing unit and function names and code lines, the function name and code line corresponding to that instruction address;
the control chip uses the instruction address in the function return address register contained in the first-type entry to look up, in the preset correspondence between instruction addresses in the processing unit and function names and code lines, the function name and code line corresponding to that instruction address;
the control chip creates a second-type entry containing the function name and code line corresponding to the instruction address in the PC and the function name and code line corresponding to the instruction address in the function return address register, both taken from the first-type entry;
and the control chip records the second-type entries in a function operation table in the order in which the second-type entries were generated.
In a second aspect, a method for generating a processor fault record is provided. The method is applied to a hardware platform including a control chip and a multi-core CPU, where the multi-core CPU includes a first processing unit and a second processing unit, both of which are slave cores of the multi-core CPU.
The method includes:
the control chip detects that the first processing unit has stopped responding;
if the number of entries recorded in the instruction address table corresponding to the first processing unit has not reached a first preset value, the control chip acquires the instruction address in the current program counter (PC) of the first processing unit through a Joint Test Action Group (JTAG) channel, where the first preset value is greater than or equal to 2;
the control chip creates a first type table entry, and records the first type table entry in an instruction address table corresponding to the first processing unit, wherein the first type table entry comprises an instruction address in the current PC of the first processing unit;
if the number of the recorded table entries in the instruction address table corresponding to the second processing unit does not reach a second preset value, the control chip acquires the instruction address in the current PC of the second processing unit through the JTAG channel, and the second preset value is greater than or equal to 2;
the control chip creates another first-class table entry and records the another first-class table entry in an instruction address table corresponding to the second processing unit, wherein the another first-class table entry comprises an instruction address in the current PC of the second processing unit;
the control chip judges whether the number of the table entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit reaches a corresponding preset value;
if the number of the entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit does not reach the corresponding preset value, the control chip returns to execute the step of obtaining the instruction address in the current PC of the first processing unit through the JTAG channel;
and if the numbers of entries recorded in both the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit have reached the corresponding preset values, the control chip triggers a multi-core CPU interrupt.
With this scheme, when the first processing unit stops responding, the control chip acquires, over a period of time, multiple instruction addresses held in the PC of the first processing unit through the JTAG channel and records them in the corresponding instruction address table. The instruction address table reflects what program the first processing unit ran during a period of time after it stopped responding. At the same time, the control chip also records the instruction addresses held in the PC of the second processing unit, which is in the same multi-core CPU as the first processing unit. Compared with the prior art, which records only the instruction addresses that the processing units were running at the moment of interruption, the method provided by the present application reflects the function and code range containing the dead loop more accurately and improves the efficiency of CPU fault analysis.
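The multi-core variant keeps one instruction address table per processing unit, each with its own preset value, and raises the interrupt only when every table is full. The sketch below simulates this; the core names, the `read_pc(core)` callable, and the fake program counters are hypothetical stand-ins for per-core JTAG reads.

```python
from typing import Callable, Dict, List


def collect_multicore(
    read_pc: Callable[[str], int],          # hypothetical: JTAG read of one core's PC
    presets: Dict[str, int],                # preset value per processing unit
    trigger_interrupt: Callable[[], None],  # hypothetical: raises the CPU interrupt
) -> Dict[str, List[int]]:
    """Fill one instruction address table per core, then interrupt the CPU."""
    if any(value < 2 for value in presets.values()):
        raise ValueError("each preset value must be greater than or equal to 2")
    tables: Dict[str, List[int]] = {core: [] for core in presets}
    # Keep sampling while at least one table has not reached its preset value.
    while any(len(tables[core]) < presets[core] for core in presets):
        for core, preset in presets.items():
            if len(tables[core]) < preset:
                tables[core].append(read_pc(core))
    trigger_interrupt()
    return tables


# Simulated cores whose PCs advance by 4 on every read.
_counters = {"core1": 0x1000, "core2": 0x2000}


def fake_read_pc(core: str) -> int:
    _counters[core] += 4
    return _counters[core]


fired: List[bool] = []
tables = collect_multicore(fake_read_pc, {"core1": 3, "core2": 2},
                           lambda: fired.append(True))
```

A core whose table is already full is skipped in later rounds, so `core2` stops being sampled after two entries while `core1` takes a third.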
Optionally, before the recording the first type entry in the instruction address table corresponding to the first processing unit, the method further includes:
the control chip acquires an instruction address in a current function return address register of the first processing unit through the JTAG channel;
the control chip adds the instruction address in the current function return address register of the first processing unit into the first type table entry;
before the recording the another entry of the first type in the instruction address table corresponding to the second processing unit, the method further includes:
the control chip acquires the instruction address in the current function return address register of the second processing unit through the JTAG channel;
and the control chip adds the instruction address in the current function return address register of the second processing unit into the other first-class table entry.
By recording the instruction addresses in the current function return address registers of the first processing unit and the second processing unit in the corresponding instruction address tables, the instruction address tables can more clearly reflect the calling relationship among the functions which are running by the corresponding processing units, and the efficiency of CPU fault analysis is further improved.
Optionally, the control chip sequentially reads one first type entry from the instruction address table corresponding to the first processing unit according to the sequence of entry storage, and executes, for each first type entry read:
the control chip queries and obtains a function name and a code line corresponding to the instruction address in the PC, which are included in the first type table item, from a preset corresponding relation between the instruction address in the first processing unit and the function name and the code line according to the instruction address in the PC, which is included in the first type table item;
the control chip creates a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are contained in the first type table item;
and the control chip records the second type table items in a function operation table corresponding to the first processing unit according to the sequence of the generation of the second type table items.
Optionally, the control chip sequentially reads one first type entry from the instruction address table corresponding to the second processing unit according to the sequence of entry storage, and executes, for each read first type entry:
the control chip queries and obtains a function name and a code line corresponding to the instruction address in the PC, which are included in the first type table item, from a preset corresponding relation between the instruction address in the second processing unit and the function name and the code line according to the instruction address in the PC, which is included in the first type table item;
the control chip creates a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are contained in the first type table item;
and the control chip records the second type table items in a function operating table corresponding to a second processing unit according to the sequence of the generation of the second type table items.
With this scheme, the control chip generates second-type entries in the first function operation table from the first-type entries in the first instruction address table, and generates second-type entries in the second function operation table from the first-type entries in the second instruction address table. Each function operation table records the function names and code lines of the program that the corresponding processing unit ran during a period of time after it stopped responding. Compared with raw instruction addresses, function names and code lines show more intuitively what the processing unit was running, which further improves the efficiency of CPU fault analysis.
In a third aspect, an apparatus for generating a processor fault record is provided, and is applied to a hardware platform including the apparatus and a first central processing unit CPU, where the first CPU includes at least one processing unit therein, and the apparatus communicates with the first CPU through a JTAG channel, and the apparatus includes: the second processor, the memory and the JTAG interface are connected through a bus;
the JTAG interface is used for acquiring an instruction address in a Program Counter (PC) of a processing unit in the first CPU through the JTAG channel and sending the instruction address in the PC to the second processor through the bus;
the second processor is used for reading the program codes stored in the memory and executing the following operations:
detecting that a processing unit in the first CPU has stopped responding;
acquiring an instruction address in a current program counter PC of the processing unit through a JTAG channel;
creating a first type table entry comprising the instruction address in the current PC, and recording the first type table entry in an instruction address table;
judging whether the number of the recorded table entries in the instruction address table reaches a preset value, wherein the preset value is more than or equal to 2;
if the number of entries recorded in the instruction address table has not reached the preset value, returning to the step of acquiring the instruction address in the current program counter PC of the processing unit through the JTAG channel;
and if the number of the recorded table entries in the instruction address table reaches the preset value, triggering the first CPU to interrupt.
Optionally, the JTAG interface is further configured to obtain, through the JTAG channel, an instruction address in a current function return register of the processing unit in the first CPU, and send the instruction address in the current function return register to the second processor through the bus;
the second processor is further configured to, prior to performing the recording of the first type entry in the instruction address table, perform the following:
acquiring an instruction address in a current function return address register of the processing unit through the JTAG channel;
adding the instruction address in the current function return address register in the first type table entry.
Optionally, the second processor returning to the step of acquiring the instruction address in the current program counter PC of the processing unit through the JTAG channel includes:
delaying for a time period T1;
and after the time period T1 has elapsed, returning to the step of acquiring the instruction address in the current program counter PC of the processing unit through the JTAG channel.
Optionally, the first CPU is a multi-core CPU and the processing unit is the main core of the first CPU.
The second processor detecting that the processing unit in the first CPU has stopped responding includes:
performing heartbeat detection on the main core and determining that the main core has stopped responding.
Optionally, the second processor is further configured to, according to the sequence of entry storage, sequentially read one first type entry from the instruction address table, and execute, for each read first type entry:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the processing unit and the function name and the code line;
creating a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table item;
and recording the second type of entries in a function operation table according to the sequence of the generation of the second type of entries.
Optionally, the second processor is further configured to, according to the sequence of entry storage, sequentially read one first type entry from the instruction address table, and execute, for each read first type entry:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the processing unit and the function name and the code line;
according to the instruction address in the function return register included in the first type table item, inquiring and obtaining a function name and a code line corresponding to the instruction address in the function return register included in the first type table item from the preset corresponding relation between the instruction address in the processing unit and the function name and the code line;
creating a second type table entry, wherein the second type table entry comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table entry, and a function name and a code line corresponding to an instruction address in a function return register, which are included in the first type table entry;
and recording the second type of entries in a function operation table according to the sequence of the generation of the second type of entries.
Optionally, the JTAG interface is integrated in a monitor chip, the second processor is integrated in a main control chip, the monitor chip communicates with the first CPU through the JTAG channel, and the main control chip communicates with the monitor chip through a bus.
With this scheme, when the apparatus for generating a processor fault record detects that the processing unit has stopped responding, it acquires, over a period of time, multiple instruction addresses held in the PC of the processing unit through the JTAG interface and records them in the instruction address table. The instruction address table reflects what program the processing unit ran during a period of time after it stopped responding. When the processing unit enters a dead loop, the function containing the dead loop and the corresponding code lines are executed repeatedly. Therefore, compared with the prior art, which triggers the interrupt immediately after the processing unit stops responding and records only the instruction address that the processing unit was running at the moment of interruption, this scheme reflects the function and code range containing the dead loop more accurately and helps improve the efficiency of CPU fault analysis.
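Because a dead loop executes the same code repeatedly, several PC samples tend to cluster in a narrow address range. The source does not prescribe how the table is analysed; the sketch below is one illustrative possibility that reports the lowest and highest sampled addresses and the most frequently sampled one. The sample values are made up.

```python
from collections import Counter
from typing import List, Tuple


def summarize(samples: List[int]) -> Tuple[int, int, int]:
    """Return (lowest address, highest address, most frequently sampled address)."""
    most_common_address, _ = Counter(samples).most_common(1)[0]
    return min(samples), max(samples), most_common_address


# Made-up contents of an instruction address table after a hang.
samples = [0x8010, 0x8014, 0x8018, 0x8010, 0x8014, 0x8010]
low, high, hottest = summarize(samples)
```

A single sample taken at the moment of interruption would give only one of these addresses, whereas the range between `low` and `high` bounds the code interval in which the loop was observed.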
In a fourth aspect, an apparatus for generating a processor fault record is provided. The apparatus is applied to a hardware platform including the apparatus and a first multi-core CPU, where the first multi-core CPU includes a first processing unit and a second processing unit, both of which are slave cores of the first multi-core CPU. The apparatus communicates with the first multi-core CPU through a Joint Test Action Group (JTAG) channel and includes a second processor, a memory, and a JTAG interface, which are connected through a bus;
the JTAG interface is used for acquiring an instruction address in a Program Counter (PC) of a processing unit in the first multi-core CPU through the JTAG channel and sending the instruction address in the PC to the second processor through the bus;
the second processor is used for reading the program codes stored in the memory and executing the following operations:
detecting that the first processing unit stops responding;
if the number of the recorded table entries in the instruction address table corresponding to the first processing unit does not reach a first preset value, acquiring an instruction address in a current Program Counter (PC) of the first processing unit through a JTAG interface, wherein the first preset value is more than or equal to 2;
creating a first type table entry, wherein the first type table entry is recorded in an instruction address table corresponding to the first processing unit, and the first type table entry comprises an instruction address in the current PC of the first processing unit;
if the number of the recorded table entries in the instruction address table corresponding to the second processing unit does not reach a second preset value, acquiring the current instruction address in the PC of the second processing unit through the JTAG interface, wherein the second preset value is more than or equal to 2;
creating another first-class table entry, and recording the another first-class table entry in an instruction address table corresponding to the second processing unit, wherein the another first-class table entry comprises an instruction address in the current PC of the second processing unit;
judging whether the number of the table entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit reaches a corresponding preset value;
if the number of the entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit does not reach the corresponding preset value, returning to the step of obtaining the instruction address in the current PC of the first processing unit through the JTAG channel;
and triggering the first multi-core CPU to interrupt if the number of the entries recorded in at least one instruction address table in the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit reaches the corresponding preset value.
Optionally, the JTAG interface is further configured to obtain, through a JTAG channel, an instruction address in a current function return register of a processing unit in the first multicore CPU, and send the instruction address in the current function return register to the second processor through the bus;
before the second processor executes the recording of the first-class table entry in the instruction address table corresponding to the first processing unit, the second processor is further configured to execute:
acquiring an instruction address in a current function return address register of the first processing unit through the JTAG interface;
adding the instruction address in the current function return address register of the first processing unit into the first type table entry;
before the second processor executes the recording of the other entry of the first type in the instruction address table corresponding to the second processing unit, the second processor is further configured to execute:
acquiring an instruction address in a current function return address register of the second processing unit through the JTAG interface;
adding the instruction address in the current function return address register of the second processing unit in the other entry of the first type.
Optionally, the detecting, by the second processor, that the first processing unit stops responding includes performing:
receiving indication information sent by a main core in the first multi-core CPU, wherein the indication information carries an identifier of the first processing unit;
and the second processor determines that the first processing unit stops responding according to the indication information.
Optionally, the second processor is further configured to, according to the sequence of entry storage, sequentially read one first-type entry from the instruction address table corresponding to the first processing unit, and execute, for each read first-type entry:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the first processing unit and the function name and the code line;
creating a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table item;
and recording the second type of entries in a function operation table corresponding to the first processing unit according to the sequence of the generation of the second type of entries.
Optionally, the second processor is further configured to sequentially read one first-type entry from the instruction address table corresponding to the second processing unit according to the sequence of entry storage, and execute, for each read first-type entry:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the second processing unit and the function name and the code line;
creating a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table item;
and recording the second type table items in a function operation table corresponding to a second processing unit according to the sequence of the generation of the second type table items.
Optionally, the JTAG interface is integrated in a monitor chip, the second processor is integrated in a main control chip, the monitor chip communicates with the first multicore CPU through the JTAG channel, and the main control chip communicates with the monitor chip through a bus.
With this scheme, when the first processing unit stops responding, the fault record generating device acquires, over a period of time and through the JTAG channel, a plurality of instruction addresses stored in the PC of the first processing unit and records them in the instruction address table. The instruction address table reflects how the first processing unit runs the program during a period of time after it stops responding. At the same time, the fault record generating device also records the instruction addresses stored in the PC of the second processing unit, which is in the same multi-core CPU as the first processing unit. Compared with the prior art, in which an interrupt is triggered immediately after the first processing unit is detected to have stopped responding and only the instruction addresses that the first processing unit and the second processing unit are executing at the moment of the interrupt are recorded, this scheme reflects the function and code interval containing the dead loop more accurately and improves the efficiency of CPU fault analysis.
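The two-table recording described for the fourth aspect can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the claimed implementation: `record_multicore`, `readers`, `presets` and `trigger_interrupt` are hypothetical names, the callables in `readers` stand in for JTAG reads of each processing unit's PC, and the loop is assumed to continue until every table has reached its own preset value (a table that is already full is simply no longer sampled).

```python
from typing import Callable, Dict, List


def record_multicore(
    readers: Dict[str, Callable[[], int]],
    presets: Dict[str, int],
    trigger_interrupt: Callable[[], None],
) -> Dict[str, List[int]]:
    """Fill one instruction address table per processing unit, then interrupt.

    Assumption: sampling continues until every table holds its preset
    number of entries; only then is the CPU interrupt triggered.
    """
    tables: Dict[str, List[int]] = {unit: [] for unit in readers}
    while any(len(tables[u]) < presets[u] for u in readers):
        for unit, read_pc in readers.items():
            # Sample a unit only while its table is below its preset value.
            if len(tables[unit]) < presets[unit]:
                tables[unit].append(read_pc())
    trigger_interrupt()
    return tables
```

Each processing unit keeps its own table and its own preset value, so the first and second preset values may differ.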
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a flowchart of a method for generating a processor fault record according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of another method for generating a processor fault record provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for generating a processor fault record according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of another apparatus for generating a processor fault record according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Fig. 1 shows a schematic diagram of an application scenario provided in an embodiment of the present application. The network device 100 includes a hardware platform including a CPU101 and a control chip 102. The CPU101 is configured to execute a program stored in a memory in the network device 100, and implement a service function of the network device 100. The control chip 102 is used for monitoring the operating condition of the CPU 101.
For example, the control chip 102 may include a main control chip 1021 and a monitor chip 1022. The main control chip 1021 manages the service operation status of the CPU101. For example, the main control chip 1021 monitors the operation status of the CPU101 by performing heartbeat detection on it; when the main control chip 1021 detects that the CPU101 has stopped responding, it triggers an interrupt of the CPU101 and restarts the CPU101, so that the CPU101 leaves the fault state. The monitor chip 1022 manages the operating conditions of the network device, such as voltage and temperature. The main control chip 1021, the monitor chip 1022, and the CPU101 are connected via buses to communicate with each other.
The CPU101 may be a single-core processor or a multi-core processor. When the CPU101 is a single-core processor, it includes the processing unit 1011; when the CPU101 is a multi-core processor, it further includes at least a processing unit 1012.
When the CPU101 is a multi-core processor, the main control chip 1021 can perform heartbeat detection on the processing unit 1011 and the processing unit 1012 respectively, so as to monitor the operating condition of the CPU101. In the case where the multi-core processor includes a master core (e.g., the processing unit 1011) and a slave core (e.g., the processing unit 1012), the main control chip 1021 may perform heartbeat detection only on the master core, and the master core performs detection on the slave core. When the master core detects that the slave core has stopped responding, it notifies the main control chip 1021.
The main control chip 1021 may be integrated with the CPU101 on a Printed Circuit Board (PCB), or may be integrated on a different PCB. When the main control chip 1021 and the CPU101 are integrated on the same PCB, communication can be performed using an internal bus. When the main control chip 1021 and the CPU101 are not on the same PCB, communication can be performed through an ethernet interface. The monitor chip 1022 is typically integrated with the CPU101 on the same PCB.
The program executed by the CPU101 is composed of a series of instructions, each of which is stored in a memory at a location identified by an instruction address. When a CPU executes an instruction, the instruction address is first stored in a register of the CPU, i.e., a Program Counter (PC); the instruction address is then used to fetch the instruction from the corresponding address in the memory and store it in an Instruction Register (IR) of the CPU, so that the instruction can be executed. Some types of CPUs further include a function return address register. When an instruction calls a function in a program, the address of that instruction is stored in the function return address register, and when the called function is completed, the CPU continues with the instruction following the instruction address in the function return address register.
In the prior art, after the main control chip 1021 detects that the CPU101 stops responding, the CPU101 is triggered to interrupt immediately. In response to the interrupt, the CPU101 saves the instruction addresses held at the time of the interrupt in its internal registers, such as the PC and the function return address register. Afterwards, the CPU101 can output the instruction addresses that were saved at the time of the interrupt. However, because current program code is often designed with complex nested call relationships among a plurality of functions, the instruction corresponding to the instruction address saved at the time of the interrupt is not necessarily an instruction in the function that causes the dead loop. Therefore, it is difficult to accurately locate the function where the dead loop occurs using only the instruction addresses in the registers at the time of the interrupt.
In the method for generating a processor fault record provided herein, the channels for communication between the control chip 102 and the CPU101 include a JTAG channel. The JTAG channel is a channel that communicates over the interface defined in the protocols of the Joint Test Action Group (JTAG). For example, the IEEE 1149.1 standard defines a JTAG interface with four signals: Test Data In (TDI), Test Data Out (TDO), Test Clock (TCK), and Test Mode Select (TMS). The control chip reads the information of the registers in the processing unit through the interface defined by the IEEE 1149.1 standard.
In the case where the control chip 102 includes the main control chip 1021 and the monitor chip 1022, the JTAG channel may generally be implemented by a channel for communication between the CPU101 and the monitor chip 1022. The main control chip 1021 typically includes a processor. The monitor chip 1022 obtains the instruction address in a register of the CPU101 through the JTAG channel according to the method provided in the embodiment of the present application and sends the instruction address to the main control chip 1021 through an internal bus, and the processor in the main control chip 1021 generates the fault record.
This arrangement makes full use of the chip structure that already exists in distributed network equipment: the monitor chip on the same PCB as the CPU implements the JTAG interface, and the processor in the main control chip generates the instruction address table in the method shown in FIG. 2. This helps reduce the difficulty of implementing the method for generating a processor fault record.
Fig. 2 illustrates a method for generating a processor fault record provided by an embodiment of the present application, which is applied to a hardware platform including a control chip and a central processing unit CPU. For example, the hardware platform may be the hardware platform shown in fig. 1, the control chip may be the control chip 102 shown in fig. 1, and the CPU may be the CPU101 shown in fig. 1.
A method for generating a processor fault record provided by the embodiment of the present application will be described in detail below with reference to fig. 2.
S201, the control chip detects that one processing unit in the CPU stops responding.
For example, the control chip performs heartbeat detection on the processing unit, and determines that the processing unit stops responding. For example, the control chip sends a detection message to the processing unit periodically, and if the control chip does not receive a response of the processing unit to the detection message within a preset time, it is determined that the processing unit stops responding.
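The timeout-based heartbeat check described above can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: `processing_unit_stopped`, `send_probe`, `poll_reply` and the timing parameters are hypothetical names, and the two callables stand in for whatever message channel the control chip actually uses.

```python
import time
from typing import Callable


def processing_unit_stopped(
    send_probe: Callable[[], None],
    poll_reply: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 0.001,
) -> bool:
    """Send one detection message; return True if no reply arrives in time."""
    send_probe()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if poll_reply():
            return False  # the processing unit answered within the preset time
        time.sleep(poll_interval_s)
    return True  # no response within the preset time: treat as stopped
```

In a periodic setup the control chip would call this function once per heartbeat period and start the recording steps below when it returns True.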
The processing unit may be the processing unit 1011 shown in fig. 1. In the case where the CPU is a multi-core processor, the processing unit 1011 may be a master core of the CPU or a slave core of the CPU.
S202, the control chip acquires the instruction address in the current program counter PC of the processing unit through a Joint Test Action Group (JTAG) channel.
It should be noted that, to read the information of the register in the processing unit through the JTAG channel, a JTAG module is required to be provided inside the processing unit. The JTAG module is composed of a Test Access Port (TAP) controller and a number of registers. The control chip sends instructions to the TAP controller through a JTAG channel, reads information of internal registers such as the processing unit PC and the like into a register of the JTAG module, and sends the information to the control chip through the JTAG channel. Currently, common advanced devices have modules supporting the JTAG protocol, such as MIPS processors, ARM processors, and the like.
Optionally, in S202, the control chip further obtains an instruction address in a current function return address register of the processing unit through the JTAG channel.
Specifically, in some types of CPUs, a function return address register is included. When one instruction calls one function in a program, the address of the instruction is stored in the function return address register, and when the called function is executed, the CPU continues to execute the next instruction of the instruction address in the function return address register. For example, in a CPU of MIPS architecture, the function return address register may be the R31 register, also known as the Ra register; in the CPU with ARM architecture, the function return address register may be an R14 register, which is also called a Link Register (LR). For convenience of description, in the following embodiments, only the Ra register will be described as an example.
By recording the instruction address in the current function return address register of the processing unit in the instruction address table, the instruction address table can more clearly reflect the calling relationship among the functions which are running by the processing unit, and the efficiency of CPU fault analysis is further improved.
It should be noted that, in S202, the instruction address in the current PC and the instruction address in the current Ra register may be obtained at the same time; alternatively, the instruction address in the current PC may be obtained first and the instruction address in the current Ra register immediately afterwards, or the other way round. During a function call, for example when function A calls function B, the instruction address in the Ra register is the address of the instruction in function A that calls function B, and it does not change during the whole run of function B. Thus, as will be appreciated by those skilled in the art, the instruction address in the current PC and the instruction address in the current Ra register are not strictly required to be obtained at the same moment.
S203, the control chip creates a first type table item including the instruction address in the current PC, and records the first type table item in an instruction address table.
For example, the control chip initializes the instruction address table before first acquiring an instruction address in the PC of the processing unit. For example, a certain space is allocated in the memory of the control chip for storing the entries in the instruction address table, and a preset value is set for the number of entries to be recorded. The preset value is greater than or equal to 2.
Optionally, if the control chip in S202 further obtains an instruction address in the current function return address register of the processing unit, the control chip in S203 further adds the instruction address in the current function return address register to the first-type entry.
Optionally, the control chip obtains a current time, and adds the current time to the first type of entry.
It should be noted that the current time is recorded in the first-type entries mainly to determine the approximate time at which the fault occurred and the time interval between the creation of successive first-type entries in the instruction address table. The current time may therefore be the time at which the instruction address in the PC is acquired in S202, the time at which the instruction address in the function return address register is acquired in S202, or the time at which the first-type entry is created in S203. The current time may be acquired from the processing unit through the JTAG channel or generated by the control chip. The present application does not limit the specific time point or manner of obtaining the current time.
Optionally, the recording the first type entry in an instruction address table includes: the control chip records the first-type entries in the instruction address table in the order in which the first-type entries are created.
S204, the control chip judges whether the number of the recorded table entries in the instruction address table reaches a preset value, wherein the preset value is more than or equal to 2.
If the number of the recorded table entries in the instruction address table does not reach the preset value, the control chip returns to execute S202 and S203; and if the number of the entries recorded in the instruction address table reaches the preset value, executing S205.
Optionally, if the number of entries recorded in the instruction address table does not reach the preset value, the control chip returns to execute S202 and S203, including: the control chip delays for a time period T1; after the time period T1 is reached, the control chip returns to execute S202 and S203.
By delaying for the time period T1, the control chip can execute other tasks during T1, which prevents the cyclic execution of S202 to S204 from occupying excessive resources of the control chip. For example, the time period T1 may be 1ms, 10ms, 50ms, or 100ms.
S205, the control chip triggers the CPU interrupt.
For example, the interrupt triggered by the control chip may be a non-maskable interrupt (NMI). The control chip triggers the CPU to restart by triggering interruption so as to break away from the current fault.
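The loop formed by S202 to S205 can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the patent's implementation: `record_fault`, `FirstTypeEntry`, `read_pc`, `read_ra` and `trigger_interrupt` are hypothetical names, and the callables stand in for the JTAG register reads and for the interrupt (for example an NMI) raised by the control chip.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FirstTypeEntry:
    pc: int                # instruction address read from the PC
    ra: Optional[int]      # instruction address read from Ra, if sampled
    timestamp: float       # time at which the entry was created


def record_fault(
    read_pc: Callable[[], int],
    read_ra: Optional[Callable[[], int]],
    trigger_interrupt: Callable[[], None],
    preset: int,
    delay_t1_s: float = 0.05,
) -> List[FirstTypeEntry]:
    """Sample the PC (and optionally Ra) until `preset` entries exist."""
    if preset < 2:
        raise ValueError("the preset value must be greater than or equal to 2")
    table: List[FirstTypeEntry] = []  # instruction address table
    while True:
        pc = read_pc()                                     # S202
        ra = read_ra() if read_ra is not None else None    # optional part of S202
        table.append(FirstTypeEntry(pc, ra, time.time()))  # S203
        if len(table) >= preset:                           # S204
            break
        time.sleep(delay_t1_s)                             # optional delay T1
    trigger_interrupt()                                    # S205
    return table
```

The interrupt is deliberately raised only after the table is full, so the recorded addresses cover a window of execution rather than a single instant.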
Table 1 is an example of the instruction address table.
Entry | Instruction address in PC | Instruction address in Ra | Time
1     | 0x2dc9c0                  | 0xddb2f0ae                | 6:18:01.050
2     | 0x151a4d4                 | 0x2dc3d1                  | 6:18:01.100
3     | 0x2dc9f2                  | 0xddb2f0ae                | 6:18:01.150
N     | 0x151a12c                 | 0x2dc3d1                  | 6:18:04.150
TABLE 1
As shown in table 1, at time 6:18:01.050, the control chip obtains the instruction address 0x2dc9c0 from the PC of the processing unit and the instruction address 0xddb2f0ae from the Ra register, creates a first-type entry, i.e., entry 1 in table 1, and records the two instruction addresses in it. At time 6:18:01.100, the control chip obtains the instruction address 0x151a4d4 from the PC of the processing unit and the instruction address 0x2dc3d1 from the Ra register, creates another first-type entry, i.e., entry 2 in table 1, and records the two instruction addresses in it. Similarly, the control chip creates entry 3 at time 6:18:01.150 and entry N at time 6:18:04.150, and records the corresponding instruction addresses.
With this scheme, when the processing unit stops responding, the control chip acquires, over a period of time and through the JTAG channel, a plurality of instruction addresses stored in the PC of the processing unit and records them in the instruction address table. The instruction address table reflects how the processing unit runs the program during a period of time after it stops responding. Compared with the prior art, in which an interrupt is triggered immediately after the processing unit stops responding and only the instruction address being executed at the moment of the interrupt is recorded, the method provided by the present application reflects the function and code interval containing the dead loop more accurately and improves the efficiency of CPU fault analysis.
Optionally, in S206, the control chip sequentially reads one first type entry from the instruction address table according to the sequence of entry storage, and executes, for each read first type entry: the control chip queries and obtains a function name and a code line corresponding to the instruction address in the PC, which are included in the first type table item, from a preset corresponding relation between the instruction address in the processing unit and the function name and the code line according to the instruction address in the PC, which is included in the first type table item; the control chip creates a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are contained in the first type table item; and the control chip records the second type table items in a function operation table according to the sequence of the generation of the second type table items.
For example, the preset correspondence between the instruction addresses in the processing unit and the function names and code lines may be obtained by disassembling the program in the processing unit. For example, the program is input into a disassembly tool, which generates a disassembly file. The disassembly file includes the name of each function in the program, the assembly-language code of each function, and the correspondence between each assembly-language code line and an instruction address in the memory of the processing unit. For example, the tool may be objdump, which is distributed with GNU Binutils and is commonly used together with the GNU Compiler Collection (GCC). For example, the file input to objdump may be a management-plane file or a data-plane file.
Optionally, if the control chip in S203 further adds the instruction address in the current function return address register to the first-class entry, the control chip in S206 further performs, for each read first-class entry: the control chip queries and obtains a function name and a code line corresponding to the instruction address in the function return register from a preset corresponding relation between the instruction address in the processing unit and the function name and the code line according to the instruction address in the function return register in the first type of table entry; and adding the function name and the code line corresponding to the instruction address in the function return register included in the first type of table entry into the second type of table entry.
The control chip generates a second type table item in the function operation table according to each first type table item in the instruction address table, and the function operation table records the function name and the code line of the program which runs within a period of time after the processing unit stops responding.
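The lookup performed in S206 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the listing is a made-up sample written in the general layout that objdump uses for disassembly (a `<name>:` header followed by one line per instruction), the "code line" is taken to be the ordinal of the instruction within its function, and `build_correspondence` and `to_function_run_table` are hypothetical helper names.

```python
import re
from typing import Dict, List, Tuple

HEADER = re.compile(r"^([0-9a-f]+) <([^>]+)>:$")  # e.g. "002dc9b8 <func_b>:"
INSN = re.compile(r"^\s*([0-9a-f]+):\s")          # e.g. "  2dc9c0: 0c546935 ..."

# Made-up sample listing, for illustration only.
LISTING = """\
002dc9b8 <func_b>:
  2dc9b8: 27bdffe0  addiu sp,sp,-32
  2dc9bc: afbf001c  sw ra,28(sp)
  2dc9c0: 0c546935  jal 151a4d4 <func_c>

0151a4d4 <func_c>:
 151a4d4: 03e00008  jr ra
"""


def build_correspondence(listing: str) -> Dict[int, Tuple[str, int]]:
    """Map each instruction address to (function name, code line)."""
    mapping: Dict[int, Tuple[str, int]] = {}
    func, line_no = None, 0
    for raw in listing.splitlines():
        header = HEADER.match(raw)
        if header:
            func, line_no = header.group(2), 0  # a new function starts here
            continue
        insn = INSN.match(raw)
        if insn and func is not None:
            line_no += 1
            mapping[int(insn.group(1), 16)] = (func, line_no)
    return mapping


def to_function_run_table(
    pcs: List[int], mapping: Dict[int, Tuple[str, int]]
) -> List[Tuple[str, int]]:
    """Create one second-type entry per sampled PC value, in order."""
    return [mapping.get(pc, ("<unknown>", 0)) for pc in pcs]
```

The same mapping can be applied to the Ra column of the first-type entries to fill in the optional calling-function fields.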
Table 2 shows an example of the function run table.
[Table 2 is provided as an image in the original publication and is not reproduced here.]
TABLE 2
As shown in table 2, each second-type entry of the function run table includes six fields: the entry number, the function corresponding to the instruction address in the PC, the code line corresponding to the instruction address in the PC, the function corresponding to the instruction address in Ra, the code line corresponding to the instruction address in Ra, and the time. The function corresponding to the instruction address in Ra, the code line corresponding to the instruction address in Ra, and the time are optional.
Each entry of the second type corresponds to one entry of the first type created in S203. For example, entry i in table 2 corresponds to entry 1 in table 1, entry ii in table 2 corresponds to entry 2 in table 1, entry iii in table 2 corresponds to entry 3 in table 1, and entry M in table 2 corresponds to entry N in table 1. In each second-type table entry, the function corresponding to the instruction address in the PC and the code line corresponding to the instruction address in the PC refer to the function name and the code line obtained by querying the instruction address in the PC recorded in the first-type table entry corresponding to the second-type table entry in the preset corresponding relationship between the instruction address in the processing unit and the function name and the code line, respectively. Similarly, the function corresponding to the instruction address in the Ra and the code line corresponding to the instruction address in the Ra refer to the function name and the code line obtained by querying the instruction address in the Ra recorded in the first type table entry corresponding to the second type table entry in the preset corresponding relationship between the instruction address in the processing unit and the function name and the code line, respectively.
When table 2 does not include the function and code line corresponding to the instruction address in Ra, analysis of table 2 still shows that function B and function C are executed alternately many times. It can therefore be inferred that a dead loop may occur in the mutual calls between function B and function C.
Further, an engineer analyzing the processor fault may obtain the calling relationship between function B and function C by consulting the program source code. For example, if function B calls function C, the engineer can infer that the code in which function B calls function C may be the cause of the dead loop.
In the case where the function corresponding to the instruction address in the Ra and the code line corresponding to the instruction address in the Ra are included in table 2, the call relationship between the functions can be obtained more intuitively. For example, as can be seen from table 2, the instruction corresponding to the 129 th line code of function B repeatedly calls function C. Since the function B and the function C appear alternately, the function C can run normally and return to the function B. Therefore, it is analyzed that a section of instructions before and after the 129 th line code of the function B may be the reason of the dead loop.
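The reasoning above can be sketched as a small analysis over the function run table. This is a minimal Python illustration, not part of the described method: the heuristic (count the call sites whose calling function is itself among the functions sampled in the PC column) is an assumption made here, and `SecondTypeEntry` and `suspect_call_sites` are hypothetical names.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class SecondTypeEntry(NamedTuple):
    pc_func: str                   # function for the instruction address in the PC
    pc_line: int                   # code line for the instruction address in the PC
    ra_func: Optional[str] = None  # function for the instruction address in Ra
    ra_line: Optional[int] = None  # code line for the instruction address in Ra


def suspect_call_sites(
    run_table: List[SecondTypeEntry],
) -> List[Tuple[Tuple[str, int], int]]:
    """Rank call sites located in functions that were themselves sampled.

    A call site that recurs and lies inside one of the running functions
    is a candidate for the code interval containing the dead loop.
    """
    running = {entry.pc_func for entry in run_table}
    sites = Counter(
        (entry.ra_func, entry.ra_line)
        for entry in run_table
        if entry.ra_func in running and entry.ra_line is not None
    )
    return sites.most_common()
```

For a run table in which function C is repeatedly sampled with a return address at line 129 of function B, the top-ranked call site is line 129 of function B, which matches the manual analysis described above.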
By contrast, if an interrupt is triggered immediately when the control chip detects that the processing unit has stopped responding, the instruction address in the PC at the time of the interrupt may be an instruction address in function C, which makes it difficult to analyze the cause of the fault efficiently.
Fig. 3 illustrates another method for generating a processor fault record, which is provided in an embodiment of the present application and is applied to a hardware platform including a control chip and a multicore CPU, where the multicore CPU includes a first processing unit and a second processing unit, and the first processing unit and the second processing unit are slave cores in the multicore CPU. For example, the hardware platform may be the hardware platform shown in fig. 1, the control chip may be the control chip 102 shown in fig. 1, the CPU may be the CPU101 shown in fig. 1, the first processing unit may be the first processing unit 1011 shown in fig. 1, and the second processing unit may be the second processing unit 1012 shown in fig. 1.
Another method for generating a processor fault record provided by the embodiment of the present application will be described in detail below with reference to fig. 3.
S301, the control chip detects that the first processing unit stops responding.
For example, the control chip may directly monitor the first processing unit and determine that the first processing unit stops responding. Alternatively, a master core in the multicore CPU detects whether the first processing unit stops responding; when the master core detects that the first processing unit stops responding, it sends indication information carrying an identifier of the first processing unit to the control chip, and the control chip determines, according to the indication information, that the first processing unit stops responding.
S302, if the number of entries recorded in the instruction address table corresponding to the first processing unit has not reached a first preset value, the control chip obtains the instruction address in the current program counter (PC) of the first processing unit through a Joint Test Action Group (JTAG) channel and executes S303, where the first preset value is greater than or equal to 2. If the number of entries recorded in the instruction address table corresponding to the first processing unit has reached the first preset value, S303 is skipped.
Optionally, in S302, the control chip further obtains an instruction address in a current function return address register of the first processing unit through the JTAG channel.
S303, the control chip creates a first type table entry, and records the first type table entry in an instruction address table corresponding to the first processing unit, wherein the first type table entry comprises an instruction address in the current PC of the first processing unit.
Optionally, if the control chip in S302 further obtains the instruction address in the current function return address register of the first processing unit through the JTAG channel, the control chip in S303 adds the instruction address in the current function return address register of the first processing unit to the first type entry.
S304, if the number of the recorded entries in the instruction address table corresponding to the second processing unit does not reach a second preset value, the control chip obtains the instruction address in the current PC of the second processing unit through the JTAG channel, and executes S305, wherein the second preset value is greater than or equal to 2.
Optionally, the control chip further obtains an instruction address in a current function return address register of the second processing unit through the JTAG channel.
And skipping S305 if the number of recorded entries in the instruction address table corresponding to the second processing unit reaches the second preset value.
S305, the control chip creates another first-class table entry, and records the another first-class table entry in the instruction address table corresponding to the second processing unit, where the another first-class table entry includes the instruction address in the current PC of the second processing unit.
Optionally, if the control chip in S304 further obtains the instruction address in the current function return address register of the second processing unit through the JTAG channel, the control chip in S305 adds the instruction address in the current function return address register of the second processing unit in the another entry of the first type.
In the present application, S302 and S303 may be performed first, and then S304 and S305 may be performed, or S304 and S305 may be performed first, and then S302 and S303 may be performed.
S306, the control chip determines whether the numbers of entries recorded in the instruction address table corresponding to the first processing unit and in the instruction address table corresponding to the second processing unit have both reached their corresponding preset values. If the number of entries recorded in at least one of the two instruction address tables has not reached the corresponding preset value, the control chip returns to S302. If the numbers of entries recorded in both instruction address tables have reached the corresponding preset values, the control chip executes S307.
S307, the control chip triggers an interrupt of the multi-core CPU. The specific implementation is similar to S205.
For example, the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit are similar to those shown in table 1.
With the above scheme, when the first processing unit stops responding, the control chip obtains, through the JTAG channel and over a period of time, a plurality of instruction addresses stored in the PC of the first processing unit and records them in the instruction address table. The instruction address table reflects how the first processing unit runs the program during a period of time after it stops responding. At the same time, the control chip also records the instruction addresses stored in the PC of the second processing unit, which is in the same multi-core CPU as the first processing unit. Compared with the prior art, which records only the instruction addresses being executed by the first processing unit and the second processing unit at the moment of the interrupt, the method provided by the present application reflects the function and the code range in which the dead loop occurs more accurately, and improves the efficiency of CPU fault analysis.
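The flow of S302 to S307 can be sketched as follows. This is a minimal Python simulation under stated assumptions: `read_pc` is a hypothetical callable that stands in for a JTAG read of a core's PC, the core identifiers 0 and 1 and the preset values are illustrative, and the interrupt is represented only by a returned flag.

```python
from itertools import count

def capture_multicore(read_pc, presets):
    """Sample each core's PC until every instruction address table is full.

    read_pc: hypothetical callable, core_id -> current PC value.
    presets: dict core_id -> preset number of entries (each >= 2).
    Returns (tables, interrupt_triggered).
    """
    if any(n < 2 for n in presets.values()):
        raise ValueError("each preset value must be >= 2")
    tables = {core: [] for core in presets}
    # S306: keep looping while at least one table is below its preset value.
    while any(len(tables[c]) < presets[c] for c in presets):
        for core, limit in presets.items():
            if len(tables[core]) < limit:            # S302 / S304: table not full
                tables[core].append(read_pc(core))   # S303 / S305: record an entry
            # A table that is already full is skipped for this round.
    return tables, True  # S307: all tables full, the interrupt would be triggered

# Usage with a fake PC source (addresses are made up for the example).
ticks = count()
fake_pc = lambda core: 0x1000 * (core + 1) + 4 * next(ticks)
tables, fired = capture_multicore(fake_pc, {0: 3, 1: 2})
```

In this run the first table receives three entries and the second receives two, and the second core is skipped in the last round because its table has already reached its preset value.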
Optionally, the method further comprises S308 and S309.
S308, the control chip reads the first-type entries one by one from the instruction address table corresponding to the first processing unit, in the order in which the entries were stored, and performs the following for each first-type entry read: according to the instruction address in the PC included in the first-type entry, the control chip looks up, in a preset correspondence between instruction addresses in the first processing unit and function names and code lines, the function name and the code line corresponding to that instruction address. The control chip creates a second-type entry, which includes the function name and the code line corresponding to the instruction address in the PC included in the first-type entry. The control chip records the second-type entries in a function operation table corresponding to the first processing unit, in the order in which the second-type entries are generated.
Optionally, if the control chip adds the instruction address in the current function return address register of the first processing unit to the first-class entry in S303, the control chip further performs, for each read first-class entry in S308: the control chip queries and obtains a function name and a code line corresponding to the instruction address in the function return register from a preset corresponding relation between the instruction address in the first processing unit and the function name and the code line according to the instruction address in the function return register included in the first type table item; and adding the function name and the code line corresponding to the instruction address in the function return register included in the first type of table entry into the second type of table entry.
S309, the control chip reads the first-type entries one by one from the instruction address table corresponding to the second processing unit, in the order in which the entries were stored, and performs the following for each first-type entry read: according to the instruction address in the PC included in the first-type entry, the control chip looks up, in a preset correspondence between instruction addresses in the second processing unit and function names and code lines, the function name and the code line corresponding to that instruction address. The control chip creates a second-type entry, which includes the function name and the code line corresponding to the instruction address in the PC included in the first-type entry. The control chip records the second-type entries in a function operation table corresponding to the second processing unit, in the order in which the second-type entries are generated.
Optionally, if the control chip adds the instruction address in the current function return address register of the second processing unit to the first-class entry in S305, the control chip further performs, for each read first-class entry in S309, the following steps: the control chip queries and obtains a function name and a code line corresponding to the instruction address in the function return register from a preset corresponding relation between the instruction address in the second processing unit and the function name and the code line according to the instruction address in the function return register included in the first type of table item; and adding the function name and the code line corresponding to the instruction address in the function return register included in the first type of table entry into the second type of table entry.
It should be noted that, the execution sequence of S308 and S309 is not limited in this application.
By recording the instruction addresses in the current function return address registers of the first processing unit and the second processing unit in the corresponding instruction address tables, the instruction address tables can more clearly reflect the calling relationship among the functions which are running by the corresponding processing units, and the efficiency of CPU fault analysis is further improved.
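The conversion of first-type entries into second-type entries in S308 and S309 amounts to an address lookup. The sketch below is one possible way to do it, assuming that the preset correspondence is available as a list sorted by start address; `LINE_TABLE`, its addresses, the function names, and the helper names are all hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import Optional

# Hypothetical preset correspondence: sorted (start address, function name, code line).
# Each record covers addresses from its start up to the next record's start.
LINE_TABLE = [
    (0x1000, "func_A", 10),
    (0x1010, "func_A", 11),
    (0x2000, "func_B", 128),
    (0x2008, "func_B", 129),
    (0x3000, "func_C", 7),
]
STARTS = [rec[0] for rec in LINE_TABLE]

@dataclass
class SecondTypeEntry:
    pc_function: str
    pc_line: int
    ra_function: Optional[str] = None   # optional, only when Ra was recorded
    ra_line: Optional[int] = None

def lookup(addr):
    """Map an instruction address to (function name, code line)."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        raise KeyError(hex(addr))
    _, name, line = LINE_TABLE[i]
    return name, line

def build_function_run_table(instruction_address_table):
    """Convert first-type entries (pc, ra or None), in stored order,
    into second-type entries in the same order."""
    run_table = []
    for pc, ra in instruction_address_table:
        entry = SecondTypeEntry(*lookup(pc))
        if ra is not None:
            entry.ra_function, entry.ra_line = lookup(ra)
        run_table.append(entry)
    return run_table

run_table = build_function_run_table([(0x3004, 0x200C), (0x2008, None)])
```

The first sample resolves to func_C with a return address inside line 129 of func_B, which is the kind of caller information that the Ra columns add to the table.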
Table 3 is an example of a function run table corresponding to the first processing unit and a function run table corresponding to the second processing unit.
TABLE 3 (presented as images in the original publication: Figure BDA0000890758460000151 and Figure BDA0000890758460000161)
As shown in table 3, each second-type entry of the function run table includes six fields: entry number, function corresponding to the instruction address in the PC, code line corresponding to the instruction address in the PC, function corresponding to the instruction address in Ra, code line corresponding to the instruction address in Ra, and time. The function corresponding to the instruction address in Ra, the code line corresponding to the instruction address in Ra, and the time are optional.
When table 3 does not include the function and the code line corresponding to the instruction address in Ra, analyzing the CPU fault with table 3 shows that the first processing unit stops running after function B reaches code line 300, and that function C in the second processing unit is executed repeatedly.
Further, by querying the program source code, an engineer analyzing the CPU fault may find that function C of the second processing unit is called by function B and keeps running in a loop, and may conclude that the first processing unit possibly stopped responding because of a dead loop in function C of the second processing unit. When table 3 includes the function and the code line corresponding to the instruction address in Ra, the calling relationship between the functions can be obtained more intuitively.
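The distinction drawn above, between a processing unit that is stuck at one code line and one that keeps executing the same function, can be approximated with a simple heuristic. The following Python sketch is illustrative only: the category names and the sample rows are assumptions, apart from function B at code line 300 and the repeated function C, which follow the description of table 3.

```python
def classify(run_table):
    """Classify a core from its function run table rows (function, code line).

    'stalled' : every sample shows the same function and code line.
    'looping' : samples stay within one function but the code line changes.
    'mixed'   : samples span several functions.
    """
    if not run_table:
        raise ValueError("function run table is empty")
    functions = {func for func, _ in run_table}
    locations = set(run_table)
    if len(locations) == 1:
        return "stalled"
    if len(functions) == 1:
        return "looping"
    return "mixed"

# Illustrative rows; the code lines of function C are made up.
first_unit = [("B", 300), ("B", 300), ("B", 300)]
second_unit = [("C", 12), ("C", 15), ("C", 12)]
```

A result of "stalled" for the first unit together with "looping" for the second unit matches the situation described for table 3. It is only a hint for the engineer, not a diagnosis.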
Fig. 4 is a schematic structural diagram of an apparatus for generating a processor fault record according to an embodiment of the present application. As shown in FIG. 4, the apparatus 400 for generating a processor fault record includes a second processor 410, a memory 420, a JTAG interface 430, and a bus 440, wherein the second processor 410, the memory 420, and the JTAG interface 430 are interconnected by the bus 440.
The apparatus 400 for generating a processor fault record is applied to a hardware platform including the apparatus 400 and a first central processing unit CPU including at least one processing unit, and the apparatus communicates with the first CPU through a JTAG channel.
Memory 420 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), or compact disc read-only memory (CD-ROM).
The second processor 410 may be one or more Central Processing Units (CPUs), and in the case that the second processor 410 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
JTAG interface 430 may be an interface defined in a protocol that uses JTAG. For example, the IEEE 1149.1 standard defines that a JTAG interface requires four signals: Test Data In (TDI), Test Data Out (TDO), Test Clock (TCK), and Test Mode Select (TMS).
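As background on how these signals are used, IEEE 1149.1 specifies a 16-state TAP controller whose next state is selected by the TMS level sampled on each rising edge of TCK. The Python sketch below models only that state machine. The state names follow the standard, while the table name `TAP` and the helper `step` are hypothetical, and the sketch says nothing about how a particular CPU exposes its PC over JTAG.

```python
# TAP controller transitions: state -> (next state if TMS=0, next state if TMS=1).
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def step(state, tms_bits):
    """Advance the TAP state once per TCK cycle using the given TMS bits."""
    for bit in tms_bits:
        state = TAP[state][bit]
    return state

# TMS sequence 0, 1, 0, 0 moves from reset to the state in which
# data is shifted in on TDI and out on TDO.
shift_state = step("Test-Logic-Reset", [0, 1, 0, 0])
```

Holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state, which is why the four listed signals are sufficient without a dedicated reset line.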
The JTAG interface 430 is configured to obtain an instruction address in a program counter PC of a processing unit in the first CPU through the JTAG channel, and send the instruction address in the PC to the second processor 410 through the bus 440.
Optionally, the JTAG interface 430 is further configured to obtain, through the JTAG channel, an instruction address in a current function return register of the processing unit in the first CPU, and send the instruction address in the current function return register to the second processor 410 through the bus 440.
The memory 420 is also used to store an instruction address table shown in table 1, a function execution table shown in table 2, and the like.
The second processor 410 is configured to read the program code stored in the memory 420 and perform the following operations:
detecting that a processing unit in the first CPU stops responding;
acquiring an instruction address in a current program counter PC of the processing unit through a JTAG channel;
creating a first-type table entry that includes the instruction address in the current PC, and recording the first-type table entry in an instruction address table;
determining whether the number of entries recorded in the instruction address table has reached a preset value, where the preset value is greater than or equal to 2;
if the number of entries recorded in the instruction address table has not reached the preset value, returning to the step of acquiring the instruction address in the current program counter PC of the processing unit through the JTAG channel;
and if the number of entries recorded in the instruction address table has reached the preset value, triggering an interrupt of the first CPU.
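The operations listed above form a simple sample-then-interrupt loop, which can be sketched as follows. This is a minimal Python sketch under stated assumptions: `read_pc` and `trigger_interrupt` are hypothetical callables standing in for the JTAG read and the interrupt signal, and the dictionary layout of a first-type entry is illustrative.

```python
def record_fault(read_pc, trigger_interrupt, preset=4):
    """Fill an instruction address table, then interrupt the monitored CPU.

    read_pc and trigger_interrupt are hypothetical callables;
    preset is the preset value and must be >= 2.
    """
    if preset < 2:
        raise ValueError("preset value must be >= 2")
    table = []
    while True:
        table.append({"pc": read_pc()})   # create and record a first-type entry
        if len(table) >= preset:          # preset value reached: stop sampling
            trigger_interrupt()
            return table
        # Otherwise return to the acquiring step.

# Usage with made-up addresses that alternate between two locations.
pcs = iter([0x2008, 0x3004, 0x2008, 0x3004])
events = []
table = record_fault(lambda: next(pcs), lambda: events.append("irq"), preset=4)
```

Because the preset value is at least 2, the table always holds more than one sample before the interrupt is triggered, which is what allows an alternating pattern such as the one above to be seen.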
Optionally, the second processor 410 is further configured to, before performing the recording of the first type entry in the instruction address table, perform the following operations:
acquiring an instruction address in a current function return address register of the processing unit through the JTAG channel;
adding the instruction address in the current function return address register in the first type table entry.
Optionally, when returning to the step of acquiring the instruction address in the current program counter PC of the processing unit through the JTAG channel, the second processor 410 performs:
delaying for a time period T1;
and, after the time period T1 has elapsed, returning to the step of acquiring the instruction address in the current program counter PC of the processing unit through the JTAG channel.
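The effect of the delay T1 can be illustrated with a small Python sketch. The function name, the injectable `sleep` parameter, and the sample values are assumptions made so that the example can run without real hardware or real waiting.

```python
import time

def sample_with_delay(read_pc, preset, t1_seconds, sleep=time.sleep):
    """Record `preset` PC samples, waiting T1 between consecutive reads."""
    table = []
    while True:
        table.append(read_pc())
        if len(table) >= preset:
            return table
        sleep(t1_seconds)  # delay T1 before returning to the acquiring step

# Usage: replace the real sleep with a recorder to observe the delays.
delays = []
pcs = iter(range(0x100, 0x100 + 5 * 4, 4))
table = sample_with_delay(lambda: next(pcs), preset=5, t1_seconds=0.01,
                          sleep=delays.append)
```

With a preset value of N, the delay is applied N - 1 times, so the samples span roughly (N - 1) × T1 plus the time spent on the JTAG reads themselves. A longer T1 therefore widens the observed window without enlarging the table.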
Optionally, the first CPU is a multicore CPU, and the processing unit is a main core of the multicore CPU.
Detecting, by the second processor 410, that the processing unit in the first CPU stops responding includes performing:
performing heartbeat detection on the main core, and determining that the main core stops responding.
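The specification does not detail how heartbeat detection is performed. One common approach, shown here purely as an assumption, is a timeout on the most recent heartbeat. The class name, the timeout value, and the explicit `now` arguments are hypothetical and are chosen so that the sketch is deterministic.

```python
class HeartbeatMonitor:
    """Treat the monitored core as unresponsive when no heartbeat
    has arrived within `timeout` seconds."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.last_beat = now

    def beat(self, now):
        """Record a heartbeat received at time `now`."""
        self.last_beat = now

    def stopped_responding(self, now):
        """Return True if the last heartbeat is older than the timeout."""
        return now - self.last_beat > self.timeout

mon = HeartbeatMonitor(timeout=3.0)
mon.beat(now=1.0)
alive_at_3_5 = not mon.stopped_responding(now=3.5)
dead_at_4_5 = mon.stopped_responding(now=4.5)
```

Once `stopped_responding` returns True, the second processor would begin sampling the PC of the main core as described above.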
Further, the second processor 410 is further configured to, according to the sequence of storing the table entries, sequentially read one first-type table entry from the instruction address table, and for each read first-type table entry, perform:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the processing unit and the function name and the code line;
creating a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table item;
and recording the second type of entries in a function operation table according to the sequence of the generation of the second type of entries.
Further, the second processor 410 is further configured to, according to the sequence of storing the table entries, sequentially read one first-type table entry from the instruction address table, and for each read first-type table entry, perform:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the processing unit and the function name and the code line;
according to the instruction address in the function return register included in the first type table item, inquiring and obtaining a function name and a code line corresponding to the instruction address in the function return register included in the first type table item from the preset corresponding relation between the instruction address in the processing unit and the function name and the code line;
creating a second type table entry, wherein the second type table entry comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table entry, and a function name and a code line corresponding to an instruction address in a function return register, which are included in the first type table entry;
and recording the second type of entries in a function operation table according to the sequence of the generation of the second type of entries.
The apparatus 400 provided in this embodiment may be integrated in the control chip 102 shown in fig. 1, and a hardware platform including the apparatus 400 and the first CPU is integrated in the network device 100 shown in fig. 1, where the first CPU may be the CPU101 shown in fig. 1.
Optionally, the JTAG interface 430 is integrated in a monitoring chip, the second processor 410 is integrated in a main control chip, the monitoring chip communicates with the first CPU through the JTAG channel, and the main control chip communicates with the monitoring chip through a bus 440.
For example, the apparatus 400 is integrated in the control chip 102 shown in fig. 1, the JTAG interface 430 is integrated in the monitor chip 1022 shown in fig. 1, the second processor 410 is integrated in the main control chip 1021 shown in fig. 1, and the first CPU may be the CPU 101 shown in fig. 1. The bus 440 between the second processor 410 and the JTAG interface 430 may be implemented by the bus between the main control chip 1021 and the monitor chip 1022 shown in fig. 1, and the JTAG channel may be implemented by the bus between the monitor chip 1022 and the CPU 101 shown in fig. 1. For details of the connection structure between the main control chip 1021, the monitor chip 1022, and the CPU 101, refer to the description of fig. 1.
The apparatus 400 for generating a processor fault record provided in this embodiment may be applied to the method in the embodiment of fig. 2, and implement the function of the control chip. For other additional functions that the apparatus 400 can implement and the interaction process with the first CPU, please refer to the description of the control chip in the method embodiment, which is not described herein again.
With the above scheme, when the apparatus for generating a processor fault record detects that the processing unit stops responding, it obtains, through the JTAG interface and over a period of time, a plurality of instruction addresses stored in the PC of the processing unit and records them in the instruction address table. The instruction address table reflects how the processing unit runs the program during a period of time after it stops responding. Compared with the prior art, in which an interrupt is triggered immediately after the processing unit stops responding and only the instruction address being executed by the processing unit at the moment of the interrupt is recorded, the scheme provided by the present application reflects the function and the code range in which the dead loop occurs more accurately, and improves the efficiency of CPU fault analysis.
Fig. 5 is a schematic structural diagram of an apparatus for generating a processor fault record according to an embodiment of the present application. As shown in FIG. 5, the apparatus 500 for generating a processor fault record includes a second processor 510, a memory 520, a JTAG interface 530, and a bus 540, wherein the second processor 510, the memory 520, and the JTAG interface 530 are coupled to each other via the bus 540.
The apparatus 500 for generating a processor fault record is applied to a hardware platform including the apparatus 500 and a first multi-core Central Processing Unit (CPU), the first CPU includes a first processing unit and a second processing unit, the first processing unit and the second processing unit are slave cores of the first multi-core CPU, and the apparatus communicates with the first multi-core CPU through a Joint Test Action Group (JTAG) channel.
Memory 520 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), or compact disc read-only memory (CD-ROM).
The second processor 510 may be one or more central processing units (CPUs). When the second processor 510 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
JTAG interface 530 may be an interface defined in a protocol that uses JTAG. For example, the IEEE 1149.1 standard defines that a JTAG interface requires four signals: Test Data In (TDI), Test Data Out (TDO), Test Clock (TCK), and Test Mode Select (TMS).
The JTAG interface 530 is configured to obtain an instruction address in a program counter PC of a processing unit in the first multicore CPU through the JTAG channel, and send the instruction address in the PC to the second processor 510 through the bus 540.
Optionally, the JTAG interface 530 is further configured to obtain, through the JTAG channel, an instruction address in a current function return register of a processing unit in the first multi-core CPU, and send the instruction address in the current function return register to the second processor 510 through the bus 540.
The memory 520 is also used to store an instruction address table shown in table 1, a function operation table shown in table 3, and the like.
The second processor 510 is configured to read the program code stored in the memory 520, and perform the following operations.
Detecting that the first processing unit stops responding;
if the number of the recorded table entries in the instruction address table corresponding to the first processing unit does not reach a first preset value, acquiring an instruction address in a current Program Counter (PC) of the first processing unit through a JTAG interface, wherein the first preset value is more than or equal to 2;
creating a first-type table entry and recording the first-type table entry in the instruction address table corresponding to the first processing unit, wherein the first-type table entry comprises the instruction address in the current PC of the first processing unit;
if the number of the recorded table entries in the instruction address table corresponding to the second processing unit does not reach a second preset value, acquiring the current instruction address in the PC of the second processing unit through the JTAG interface, wherein the second preset value is more than or equal to 2;
creating another first-class table entry, and recording the another first-class table entry in an instruction address table corresponding to the second processing unit, wherein the another first-class table entry comprises an instruction address in the current PC of the second processing unit;
judging whether the number of the table entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit reaches a corresponding preset value;
if the number of the entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit does not reach the corresponding preset value, returning to the step of obtaining the instruction address in the current PC of the first processing unit through the JTAG channel;
and triggering an interrupt of the first multi-core CPU if the numbers of entries recorded in the instruction address table corresponding to the first processing unit and in the instruction address table corresponding to the second processing unit have both reached the corresponding preset values.
Optionally, before the second processor 510 performs the recording of the first-type table entry in the instruction address table corresponding to the first processing unit, the second processor is further configured to perform:
acquiring an instruction address in a current function return address register of the first processing unit through the JTAG interface 530;
adding the instruction address in the current function return address register of the first processing unit into the first type table entry;
before the second processor 510 performs the recording of the another entry of the first type in the instruction address table corresponding to the second processing unit, the second processor is further configured to perform:
acquiring an instruction address in a current function return address register of the second processing unit through the JTAG interface 530;
adding the instruction address in the current function return address register of the second processing unit in the other entry of the first type.
Optionally, detecting, by the second processor 510, that the first processing unit stops responding includes performing:
receiving indication information sent by a main core in the first multi-core CPU, wherein the indication information carries an identifier of the first processing unit;
and determining, according to the indication information, that the first processing unit stops responding.
Further, the second processor 510 is further configured to, according to the sequence of storing the table entries, sequentially read one first-type table entry from the instruction address table corresponding to the first processing unit, and execute, for each read first-type table entry:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the first processing unit and the function name and the code line;
creating a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table item;
and recording the second type of entries in a function operation table corresponding to the first processing unit according to the sequence of the generation of the second type of entries.
Further, the second processor 510 is further configured to, according to the sequence of storing the table entries, sequentially read one first-type table entry from the instruction address table corresponding to the second processing unit, and execute, for each read first-type table entry:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the second processing unit and the function name and the code line;
creating a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table item;
and recording the second type table items in a function operation table corresponding to a second processing unit according to the sequence of the generation of the second type table items.
The apparatus 500 provided in this embodiment may be integrated in the control chip 102 shown in fig. 1, and a hardware platform including the apparatus 500 and the first multi-core CPU is integrated in the network device 100 shown in fig. 1, where the first CPU may be the CPU101 shown in fig. 1, the first processing unit may be the first processing unit 1011 shown in fig. 1, and the second processing unit may be the second processing unit 1012 shown in fig. 1.
Optionally, the JTAG interface 530 is integrated in a monitor chip, the second processor 510 is integrated in a main control chip, the monitor chip communicates with the first multi-core CPU through the JTAG channel, and the main control chip communicates with the monitor chip through a bus 540.
For example, the apparatus 500 is integrated in the control chip 102 shown in fig. 1, the JTAG interface 530 is integrated in the monitor chip 1022 shown in fig. 1, the second processor 510 is integrated in the main control chip 1021 shown in fig. 1, and the first multi-core CPU may be the CPU 101 shown in fig. 1. The bus 540 between the second processor 510 and the JTAG interface 530 may be implemented by the bus between the main control chip 1021 and the monitor chip 1022 shown in fig. 1, and the JTAG channel may be implemented by the bus between the monitor chip 1022 and the CPU 101 shown in fig. 1. For details of the connection structure between the main control chip 1021, the monitor chip 1022, and the CPU 101, refer to the description of fig. 1.
The apparatus 500 for generating a processor fault record provided in this embodiment may be applied to the method in the embodiment of fig. 3, and implement the function of the control chip. For other additional functions that the apparatus 500 may implement and an interaction process with the first multi-core CPU, please refer to the description of the control chip in the method embodiment, which is not described herein again.
Through the scheme, when the first processing unit stops responding, the fault record generating device acquires a plurality of instruction addresses stored in the PC of the first processing unit through the JTAG channel in a period of time and records the instruction addresses in the instruction address table. The instruction address table reflects a state that the first processing unit runs the program for a period of time after the first processing unit stops responding. And simultaneously, the fault record generating device also records the instruction address stored in the PC of the second processing unit in the same multi-core CPU with the first processing unit. Compared with the prior art, the method provided by the application has the advantages that the function and the code interval with the dead loop are reflected more accurately by only recording the running instruction addresses of the first processing unit and the second processing unit at the moment of interruption, and the efficiency of CPU fault analysis is improved.
The embodiments in the present specification are described in a progressive manner; for parts that are the same or similar among the embodiments, the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (25)

1. A method for generating a fault record of a processor, applied to a hardware platform comprising a control chip and a central processing unit (CPU),
the method comprises the following steps:
the control chip detects that one processing unit in the CPU stops responding;
the control chip acquires an instruction address in a current program counter (PC) of the processing unit through a Joint Test Action Group (JTAG) channel;
the control chip creates a first type table item comprising the instruction address in the current PC and records the first type table item in an instruction address table;
the control chip judges whether the number of the recorded table entries in the instruction address table reaches a preset value, wherein the preset value is more than or equal to 2;
if the number of the recorded table entries in the instruction address table does not reach the preset value, the control chip returns to execute the step of obtaining the instruction address in the current PC of the processing unit through the JTAG channel;
and if the number of the recorded table entries in the instruction address table reaches the preset value, the control chip triggers the CPU to interrupt.
2. The method of claim 1, wherein before recording the first type entries in an instruction address table, further comprising:
the control chip acquires an instruction address in a current function return address register of the processing unit through the JTAG channel;
and the control chip adds the instruction address in the current function return address register into the first type table entry.
3. The method of claim 1, further comprising, before recording the first type table entry in the instruction address table: the control chip acquires the current time and adds the current time to the first type table entry.
4. The method of claim 1, wherein said recording the entries of the first type in an instruction address table comprises:
and the control chip records the first type of table items in an instruction address table according to the sequence of the first type of table items.
5. The method according to any of claims 1 to 4, wherein said step of said control chip returning to execute said step of obtaining the instruction address in the current PC of said processing unit through JTAG channel comprises:
the control chip delays for a time period T1;
after the time period T1 is reached, the control chip returns to execute the step of acquiring the instruction address in the current PC of the processing unit through the JTAG channel.
6. The method according to any one of claims 1 to 4, wherein the detecting, by the control chip, that the processing unit stops responding comprises: the control chip performs heartbeat detection on the processing unit and determines that the processing unit stops responding.
7. The method of any of claims 1 to 4, further comprising:
the control chip reads a first type of table entry from the instruction address table in sequence according to the sequence of table entry storage, and executes the following steps aiming at each read first type of table entry:
the control chip queries and obtains a function name and a code line corresponding to the instruction address in the PC, which are included in the first type table item, from a preset corresponding relation between the instruction address in the processing unit and the function name and the code line according to the instruction address in the PC, which is included in the first type table item;
the control chip creates a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are contained in the first type table item;
and the control chip records the second type table items in a function operation table according to the sequence of the generation of the second type table items.
8. The method of claim 2, further comprising:
the control chip reads a first type of table entry from the instruction address table in sequence according to the sequence of table entry storage, and executes the following steps aiming at each read first type of table entry:
the control chip queries and obtains a function name and a code line corresponding to the instruction address in the PC, which are included in the first type table item, from a preset corresponding relation between the instruction address in the processing unit and the function name and the code line according to the instruction address in the PC, which is included in the first type table item;
the control chip queries and obtains a function name and a code line corresponding to the instruction address in the function return address register from a preset corresponding relation between the instruction address in the processing unit and the function name and the code line according to the instruction address in the function return address register in the first type of table entry;
the control chip creates a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are contained in the first type table item, and a function name and a code line corresponding to an instruction address in the function return address register, which are contained in the first type table item;
and the control chip records the second type table items in a function operation table according to the sequence of the generation of the second type table items.
9. A method of generating a processor fault record,
the method is applied to a hardware platform comprising a control chip and a multi-core CPU, wherein the multi-core CPU comprises a first processing unit and a second processing unit, the first processing unit and the second processing unit are slave cores in the multi-core CPU,
the method comprises the following steps:
the control chip detects that the first processing unit stops responding;
if the number of the recorded items in the instruction address table corresponding to the first processing unit does not reach a first preset value, the control chip acquires the instruction address in the current program counter (PC) of the first processing unit through a Joint Test Action Group (JTAG) channel, wherein the first preset value is greater than or equal to 2, the control chip creates a first type of item and records the first type of item in the instruction address table corresponding to the first processing unit, and the first type of item comprises the instruction address in the current PC of the first processing unit;
if the number of the recorded entries in the instruction address table corresponding to the second processing unit does not reach a second preset value, the control chip acquires the instruction address in the current PC of the second processing unit through the JTAG channel, the second preset value is greater than or equal to 2, the control chip creates another first type of entry and records the another first type of entry in the instruction address table corresponding to the second processing unit, and the another first type of entry comprises the instruction address in the current PC of the second processing unit;
the control chip judges whether the number of the table entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit reaches a corresponding preset value;
if the number of the entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit does not reach the corresponding preset value, the control chip returns to execute the step of obtaining the instruction address in the current PC of the first processing unit through the JTAG channel;
and if the number of the table entries recorded in the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit reaches the corresponding preset value, triggering the multi-core CPU to interrupt by the control chip.
10. The method of claim 9, wherein before recording the first type entries in the instruction address table corresponding to the first processing unit, further comprising:
the control chip acquires an instruction address in a current function return address register of the first processing unit through the JTAG channel;
the control chip adds the instruction address in the current function return address register of the first processing unit into the first type table entry;
before the recording the another entry of the first type in the instruction address table corresponding to the second processing unit, the method further includes:
the control chip acquires the instruction address in the current function return address register of the second processing unit through the JTAG channel;
and the control chip adds the instruction address in the current function return address register of the second processing unit into the other first-class table entry.
11. The method of claim 9, further comprising:
the control chip reads a first type of table entry from the instruction address table corresponding to the first processing unit in sequence according to the sequence of table entry storage, and executes for each read first type of table entry:
the control chip queries and obtains a function name and a code line corresponding to the instruction address in the PC, which are included in the first type table item, from a preset corresponding relation between the instruction address in the first processing unit and the function name and the code line according to the instruction address in the PC, which is included in the first type table item;
the control chip creates a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are contained in the first type table item;
and the control chip records the second type table items in a function operation table corresponding to the first processing unit according to the sequence of the generation of the second type table items.
12. The method of any of claims 9 to 11, further comprising:
the control chip reads a first type of table entry from the instruction address table corresponding to the second processing unit in sequence according to the sequence of table entry storage, and executes for each read first type of table entry:
the control chip queries and obtains a function name and a code line corresponding to the instruction address in the PC, which are included in the first type table item, from a preset corresponding relation between the instruction address in the second processing unit and the function name and the code line according to the instruction address in the PC, which is included in the first type table item;
the control chip creates a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are contained in the first type table item;
and the control chip records the second type table items in a function operation table corresponding to the second processing unit according to the sequence of the generation of the second type table items.
13. An apparatus for generating a processor fault record, for use in a hardware platform including the apparatus and a first Central Processing Unit (CPU), the first CPU including at least one processing unit therein, the apparatus communicating with the first CPU via a JTAG channel, the apparatus comprising: the second processor, the memory and the JTAG interface are connected through a bus;
the JTAG interface is used for acquiring an instruction address in a Program Counter (PC) of a processing unit in the first CPU through the JTAG channel and sending the instruction address in the PC to the second processor through the bus;
the second processor is used for reading the program codes stored in the memory and executing the following operations:
detecting that a processing unit in the first CPU stops responding;
acquiring the current instruction address in the PC of the processing unit through a JTAG channel;
creating a first type table entry comprising the instruction address in the current PC, and recording the first type table entry in an instruction address table;
judging whether the number of the recorded table entries in the instruction address table reaches a preset value, wherein the preset value is more than or equal to 2;
if the number of the recorded table entries in the instruction address table does not reach the preset value, returning to execute the step of obtaining the instruction address in the current PC of the processing unit through the JTAG channel;
and if the number of the recorded table entries in the instruction address table reaches the preset value, triggering the first CPU to interrupt.
14. The apparatus of claim 13,
the JTAG interface is further used for acquiring an instruction address in a current function return address register of a processing unit in the first CPU through the JTAG channel and sending the instruction address in the current function return address register to the second processor through the bus;
the second processor is further configured to, prior to performing the recording of the first type entry in the instruction address table, perform the following:
acquiring an instruction address in a current function return address register of the processing unit through the JTAG channel;
adding the instruction address in the current function return address register in the first type table entry.
15. The apparatus of claim 13, wherein said second processor returns to performing said step of obtaining the instruction address in the current PC of the processing unit via the JTAG channel, including performing:
a delay period T1;
and returning to execute the step of acquiring the instruction address in the current PC of the processing unit through the JTAG channel after the time period T1 is reached.
16. The apparatus of any of claims 13 to 15, wherein the first CPU is a multi-core CPU, the processing unit is a main core of the first CPU,
the second processor detecting that the processing unit in the first CPU stops responding, including performing:
and performing heartbeat detection on the main core, and determining that the main core stops responding.
17. The apparatus according to any one of claims 13 to 15, wherein the second processor is further configured to, in order of the order of storing the entries, sequentially read one entry of the first type from the instruction address table, and for each read entry of the first type, perform:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the processing unit and the function name and the code line;
creating a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table item;
and recording the second type of entries in a function operation table according to the sequence of the generation of the second type of entries.
18. The apparatus of claim 14, wherein the second processor is further configured to, in order of entry storage order, sequentially read one entry of the first type from the instruction address table, and for each read entry of the first type, perform:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the processing unit and the function name and the code line;
according to the instruction address in the function return address register included in the first type table item, inquiring and obtaining a function name and a code line corresponding to the instruction address in the function return address register included in the first type table item from the preset corresponding relation between the instruction address in the processing unit and the function name and the code line;
creating a second type table entry, wherein the second type table entry comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table entry, and a function name and a code line corresponding to an instruction address in the function return address register, which are included in the first type table entry;
and recording the second type of entries in a function operation table according to the sequence of the generation of the second type of entries.
19. The apparatus of any of claims 13 to 15, wherein the JTAG interface is integrated in a monitor chip, the second processor is integrated in a main control chip, the monitor chip communicates with the first CPU through the JTAG channel, and the main control chip communicates with the monitor chip through a bus.
20. An apparatus for generating a processor fault record, applied to a hardware platform including the apparatus and a first multi-core CPU, where the first multi-core CPU includes a first processing unit and a second processing unit, the first processing unit and the second processing unit are slave cores of the first multi-core CPU, and the apparatus communicates with the first multi-core CPU through a Joint Test Action Group (JTAG) channel, the apparatus comprising: the second processor, the memory and the JTAG interface are connected through a bus;
the JTAG interface is used for acquiring an instruction address in a Program Counter (PC) of a processing unit in the first multi-core CPU through the JTAG channel and sending the instruction address in the PC to the second processor through the bus;
the second processor is used for reading the program codes stored in the memory and executing the following operations:
detecting that the first processing unit stops responding;
if the number of the recorded table entries in the instruction address table corresponding to the first processing unit does not reach a first preset value, acquiring the current instruction address in the PC of the first processing unit through a JTAG interface, wherein the first preset value is more than or equal to 2;
creating a first type table entry, wherein the first type table entry is recorded in an instruction address table corresponding to the first processing unit, and the first type table entry comprises an instruction address in the current PC of the first processing unit;
if the number of the recorded table entries in the instruction address table corresponding to the second processing unit does not reach a second preset value, acquiring the current instruction address in the PC of the second processing unit through the JTAG interface, wherein the second preset value is more than or equal to 2;
creating another first-class table entry, and recording the another first-class table entry in an instruction address table corresponding to the second processing unit, wherein the another first-class table entry comprises an instruction address in the current PC of the second processing unit;
judging whether the number of the table entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit reaches a corresponding preset value;
if the number of the entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit does not reach the corresponding preset value, returning to the step of obtaining the instruction address in the current PC of the first processing unit through the JTAG channel;
and triggering the first multi-core CPU to interrupt if the number of the entries recorded in at least one instruction address table in the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit reaches the corresponding preset value.
21. The apparatus of claim 20,
the JTAG interface is further used for acquiring an instruction address in a current function return address register of a processing unit in the first multi-core CPU through a JTAG channel and sending the instruction address in the current function return address register to the second processor through the bus;
before the second processor executes the recording of the first-class table entry in the instruction address table corresponding to the first processing unit, the second processor is further configured to execute:
acquiring an instruction address in a current function return address register of the first processing unit through the JTAG interface;
adding the instruction address in the current function return address register of the first processing unit into the first type table entry;
before the second processor executes the recording of the other entry of the first type in the instruction address table corresponding to the second processing unit, the second processor is further configured to execute:
acquiring an instruction address in a current function return address register of the second processing unit through the JTAG interface;
adding the instruction address in the current function return address register of the second processing unit in the other entry of the first type.
22. The apparatus of claim 20, wherein the second processor detecting that the first processing unit stops responding comprises performing:
receiving indication information sent by a main core in the first multi-core CPU, wherein the indication information carries an identifier of the first processing unit;
and the second processor determines that the first processing unit stops responding according to the indication information.
23. The apparatus according to any one of claims 20 to 22, wherein the second processor is further configured to, according to an order of entry storage, sequentially read one entry of the first class from the instruction address table corresponding to the first processing unit, and for each read entry of the first class, perform:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the first processing unit and the function name and the code line;
creating a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table item;
and recording the second type of entries in a function operation table corresponding to the first processing unit according to the sequence of the generation of the second type of entries.
24. The apparatus according to any one of claims 20 to 22, wherein the second processor is further configured to, according to an order of entry storage, sequentially read one entry of the first type from the instruction address table corresponding to the second processing unit, and for each read entry of the first type, perform:
according to the instruction address in the PC included in the first type table item, inquiring to obtain a function name and a code line corresponding to the instruction address in the PC included in the first type table item from the preset corresponding relation between the instruction address in the second processing unit and the function name and the code line;
creating a second type table item, wherein the second type table item comprises a function name and a code line corresponding to an instruction address in the PC, which are included in the first type table item;
and recording the second type table items in a function operation table corresponding to the second processing unit according to the sequence of the generation of the second type table items.
25. The apparatus of any of claims 20-22, wherein the JTAG interface is integrated into a monitor chip, wherein the second processor is integrated into a main control chip, wherein the monitor chip communicates with the first multicore CPU via the JTAG channel, and wherein the main control chip communicates with the monitor chip via a bus.
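The conversion of first type table entries into second type table entries recited in the claims above (for example claims 7, 11, 17 and 23) can be sketched as follows. This is a minimal illustration in Python, not the claimed implementation. It assumes that the preset correspondence is a list of records sorted by start address, and that an instruction address belongs to the nearest record starting at or below it; the names `Symbol`, `resolve` and `build_function_table` are hypothetical.

```python
import bisect
from typing import List, NamedTuple, Optional, Tuple


class Symbol(NamedTuple):
    start: int      # first instruction address covered by this code line (assumption)
    function: str   # function name
    line: int       # code line


def resolve(symbols: List[Symbol], address: int) -> Optional[Tuple[str, int]]:
    """Return (function name, code line) for `address`; `symbols` must be sorted by start."""
    starts = [s.start for s in symbols]
    # Index of the last record whose start address is <= address.
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None  # address lies below every known record
    return symbols[i].function, symbols[i].line


def build_function_table(symbols: List[Symbol], address_table: List[dict]) -> List[dict]:
    """Turn first-type entries into second-type entries, preserving storage order."""
    table = []
    for entry in address_table:
        hit = resolve(symbols, entry["pc"])
        if hit is None:
            table.append({"function": None, "line": None})
        else:
            table.append({"function": hit[0], "line": hit[1]})
    return table
```

Reading the instruction address table in storage order and appending each result keeps the function operation table in the same order, so the sequence of function names and code lines mirrors the sequence in which the addresses were sampled.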
CN201510992820.8A 2015-12-25 2015-12-25 Method and device for generating fault record of processor Active CN106919462B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510992820.8A CN106919462B (en) 2015-12-25 2015-12-25 Method and device for generating fault record of processor
PCT/CN2016/098537 WO2017107576A1 (en) 2015-12-25 2016-09-09 Method and device for generating fault record of processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510992820.8A CN106919462B (en) 2015-12-25 2015-12-25 Method and device for generating fault record of processor

Publications (2)

Publication Number Publication Date
CN106919462A CN106919462A (en) 2017-07-04
CN106919462B true CN106919462B (en) 2020-04-21

Family

ID=59088920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510992820.8A Active CN106919462B (en) 2015-12-25 2015-12-25 Method and device for generating fault record of processor

Country Status (2)

Country Link
CN (1) CN106919462B (en)
WO (1) WO2017107576A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109491856B (en) 2017-09-12 2022-08-02 中兴通讯股份有限公司 Bus monitoring system, method and device
CN108873010A (en) * 2018-06-20 2018-11-23 北京亿信华辰软件有限责任公司 A kind of silo stock amount detection terminal
CN112084050A (en) * 2019-06-14 2020-12-15 北京北方华创微电子装备有限公司 Information recording method and system
CN112232027A (en) * 2020-10-19 2021-01-15 腾讯科技(深圳)有限公司 Symbol translation method, device, equipment and computer readable storage medium
CN113220334B (en) * 2021-05-25 2024-04-16 百富计算机技术(深圳)有限公司 Program fault positioning method, terminal equipment and computer readable storage medium
CN113832663B (en) * 2021-09-18 2022-08-16 珠海格力电器股份有限公司 Control chip fault recording method and device and control chip fault reading method
CN114416408A (en) * 2021-12-13 2022-04-29 飞腾信息技术有限公司 Interrupt processing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1484159A (en) * 2002-09-19 2004-03-24 华为技术有限公司 Method for making centralized control processing by utilizing CPU on system board
CN101131657A (en) * 2006-08-25 2008-02-27 华为技术有限公司 System and method for assisting CPU to drive chips

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060095624A1 (en) * 2004-11-03 2006-05-04 Ashok Raj Retargeting device interrupt destinations
US7594144B2 (en) * 2006-08-14 2009-09-22 International Business Machines Corporation Handling fatal computer hardware errors
CN101149636B (en) * 2007-10-23 2010-07-07 华为技术有限公司 Repositioning system and method
US8195867B2 (en) * 2008-06-06 2012-06-05 International Business Machines Corporation Controlled shut-down of partitions within a shared memory partition data processing system
CN101556551B (en) * 2009-04-15 2011-12-21 杭州华三通信技术有限公司 Hardware acquisition system and method for equipment failure log
CN102214137B (en) * 2010-04-06 2014-01-22 华为技术有限公司 Debugging method and debugging equipment
CN102662889B (en) * 2012-04-24 2016-12-14 华为技术有限公司 Interruption processing method, interrupt control unit and processor

Also Published As

Publication number Publication date
WO2017107576A1 (en) 2017-06-29
CN106919462A (en) 2017-07-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant