CN106610878B - Fault debugging method of double-controller system - Google Patents

Fault debugging method of double-controller system

Info

Publication number
CN106610878B
Authority
CN
China
Prior art keywords
controller
memory
block device
debug
damon
Prior art date
Legal status
Active
Application number
CN201611176450.1A
Other languages
Chinese (zh)
Other versions
CN106610878A (en)
Inventor
金振成
Current Assignee
Beihai Shengyun Technology Co Ltd
Original Assignee
Beihai Shengyun Technology Co Ltd
Priority date
2016-12-19
Filing date
2016-12-19
Publication date
2020-02-07
Application filed by Beihai Shengyun Technology Co Ltd
Priority to CN201611176450.1A
Publication of CN106610878A
Application granted
Publication of CN106610878B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2236 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test CPU or processors
    • G06F11/2242 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test CPU or processors in multi-processor systems, e.g. one processor becoming the test master

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a fault debugging method for a dual-controller system, comprising the following steps: when the dual-controller system starts, each of the two controllers allocates a memory region as its own memory block device; each controller also allocates a segment of memory address space as its memory mapping block device for the memory region of the opposite-end controller, establishes a mapping between that address space and the memory region of the opposite-end controller through the non-transparent bridge (NTB), formats its memory mapping block device with a file system, and starts its debug_damon daemon process; when the system of either controller, denoted controller A, fails, the opposite-end controller B uses its own memory mapping block device to trigger the debug_damon daemon process of controller A to execute system debugging operations, and controller A feeds the corresponding execution results back to controller B. The invention thereby enables debugging of faults in a dual-controller system.

Description

Fault debugging method of double-controller system
Technical Field
The invention relates to computer application technology, and in particular to a fault debugging method for a dual-controller system.
Background
At present, products built on a dual-controller system generally have no VGA display interface and cannot be connected directly to a keyboard and display. For system security, some products also disable network back-end services such as sshd. Therefore, when the system or the network becomes abnormal, it is impossible to log in to the system back end or to debug through remote login, so the cause of the fault cannot be debugged and located. In practice, even when the network is abnormal, the programs inside the system are in many cases still running normally.
At present, no method exists for debugging faults of a dual-controller system.
Disclosure of Invention
In view of this, the main objective of the present invention is to provide a method for debugging a failure of a dual-controller system, which can implement debugging of a failure of a dual-controller system.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
A fault debugging method for a dual-controller system comprises the following steps:
a. when the dual-controller system starts, each of the two controllers allocates a memory region as its own memory block device; each controller also allocates a segment of memory address space as its memory mapping block device for the memory region of the opposite-end controller, establishes a mapping between that address space and the memory region of the opposite-end controller through the non-transparent bridge (NTB), formats its memory mapping block device with a file system, and starts its debug_damon daemon process;
b. when the system of either controller, denoted controller A, fails, the opposite-end controller B uses its own memory mapping block device to trigger the debug_damon daemon process of controller A to execute system debugging operations, and controller A feeds the corresponding execution results back to controller B.
In summary, the method for debugging the fault of the dual-controller system provided by the invention can be used for debugging the fault of the dual-controller system.
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an implementation of step 102 in FIG. 1.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
The core idea of the invention is as follows: products that adopt a dual-controller system usually provide a Non-Transparent Bridge (NTB) function, so the invention uses the NTB to let the normal controller at one end debug the failed controller system at the other end, thereby achieving fault debugging of the dual-controller system.
To facilitate a clear understanding of the invention, non-transparent bridge technology is briefly described before the specific embodiments:
A non-transparent bridge functions similarly to a transparent bridge. The main difference is that there are intelligent devices or processors on both sides of a non-transparent bridge, each with an independent address space. Furthermore, a host on one side of a non-transparent bridge cannot see the full address or I/O space on the other side. Each processor treats the other side of the bridge as an endpoint and maps it into its own address space.
In a non-transparent bridge environment, the hosts on both sides of the bridge can exchange state information through scratchpad registers, doorbell registers, and heartbeat messages. Heartbeat messages may be transmitted via the doorbell registers. From whether heartbeat messages are still being received, the host at one end can tell that the host at the other end has failed.
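The heartbeat-based fault detection described above can be sketched as follows. This is a minimal illustration only: the class `HeartbeatMonitor`, its method names, and the 3-second timeout are assumptions made for the example and do not come from the patent, which specifies neither a timeout nor a detection algorithm.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks heartbeats received from the peer controller (hypothetical helper)."""
    timeout_s: float          # assumed threshold; the patent gives no value
    last_seen_s: float = 0.0  # time of the most recent heartbeat

    def on_doorbell(self, now_s: float) -> None:
        # Called when a heartbeat doorbell interrupt arrives from the peer.
        self.last_seen_s = now_s

    def peer_failed(self, now_s: float) -> bool:
        # The peer is presumed faulty once heartbeats have been absent
        # for longer than the timeout.
        return now_s - self.last_seen_s > self.timeout_s


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.on_doorbell(now_s=10.0)
print(monitor.peer_failed(now_s=12.0))  # False: last heartbeat was 2 s ago
print(monitor.peer_failed(now_s=14.5))  # True: 4.5 s without a heartbeat
```

Passing the current time in explicitly keeps the sketch deterministic; a real monitor would presumably read a monotonic clock instead.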
Fig. 1 is a schematic flowchart of a method according to an embodiment of the present invention, and as shown in fig. 1, a fault debugging method for a dual controller system implemented by the embodiment mainly includes:
Step 101, when the dual-controller system starts, each of the two controllers allocates a memory region as its own memory block device; each controller also allocates a segment of memory address space as its memory mapping block device for the memory region of the opposite-end controller, establishes a mapping between that address space and the memory region of the opposite-end controller through the non-transparent bridge (NTB), formats its memory mapping block device with a file system, and starts its debug_damon daemon process.
In this step, after the dual-controller system starts, each controller first configures a memory region for itself as its memory block device. Each controller then configures a memory address space for itself as its memory mapping block device and maps it onto the memory block device of the opposite-end controller, so that the memory block device of the opposite-end controller can be accessed by mounting the local memory mapping block device.
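The effect of this mapping can be imitated on a single machine. In the sketch below a temporary file stands in for the memory region owned by controller A, and two `mmap` views of that file stand in for A's memory block device and B's memory mapping block device. The region size and variable names are assumptions for illustration; a real system would map an NTB memory window rather than a file.

```python
import mmap
import os
import tempfile

REGION_SIZE = 4096  # assumed size; the patent does not specify one

# A temporary file stands in for the physical memory region owned by controller A.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, REGION_SIZE)

# Controller A's view: its own memory block device.
block_dev_a = mmap.mmap(fd, REGION_SIZE)
# Controller B's view: its memory mapping block device, which the NTB would
# translate onto A's region. Here it is simply a second shared mapping.
mapped_dev_b = mmap.mmap(fd, REGION_SIZE)

mapped_dev_b[0:5] = b"hello"          # B writes through its window
seen_by_a = bytes(block_dev_a[0:5])   # A reads the same bytes locally
print(seen_by_a)

block_dev_a.close()
mapped_dev_b.close()
os.close(fd)
os.remove(path)
```

Both views refer to the same underlying bytes, which is the property the method relies on when controller B leaves a file for controller A to read.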
In this step, the memory space corresponding to the memory mapping block device must be formatted with a file system so that files can be read and written in that memory space.
Once running, the debug_damon daemon of each controller is mainly responsible for capturing the interrupt sent by the opposite-end controller through the NTB, executing the commands requested by the opposite-end controller, and storing the execution result in a specified location. After execution finishes, it sends an interrupt to the opposite-end controller to notify it to start fetching the execution result data.
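The daemon's work after it captures an interrupt can be sketched as below. This is an assumption-laden illustration, not the patented implementation: the function name `handle_debug_interrupt`, the file name `cmd`, the one-command-per-line layout, and the result format are all hypothetical (only the name `cmd_result` appears in the patent), and a temporary directory stands in for the mounted memory block device.

```python
import subprocess
import tempfile
from pathlib import Path


def handle_debug_interrupt(mount_point: Path) -> None:
    """Run each command listed in the cmd file and record the output in cmd_result."""
    commands = (mount_point / "cmd").read_text().splitlines()
    results = []
    for command in commands:
        if not command.strip():
            continue  # skip blank lines
        done = subprocess.run(command, shell=True, capture_output=True, text=True)
        results.append(
            f"$ {command}\n{done.stdout}{done.stderr}[exit {done.returncode}]\n"
        )
    (mount_point / "cmd_result").write_text("".join(results))


# Usage: a temporary directory stands in for the mounted memory block device.
with tempfile.TemporaryDirectory() as tmp:
    device = Path(tmp)
    (device / "cmd").write_text("echo hello\n")
    handle_debug_interrupt(device)
    print((device / "cmd_result").read_text())
```

Recording standard error and the exit status alongside standard output is a design choice made here so that a failed debugging command still produces a useful result for the peer.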
Step 102, when the system of either controller, denoted controller A, fails, the opposite-end controller B uses its own memory mapping block device to trigger the debug_damon daemon of controller A to execute system debugging operations, and controller A feeds the corresponding execution results back to controller B.
Preferably, this step can be implemented by the following method, as shown in FIG. 2:
Step 1021, when the system of controller A fails, its opposite-end controller B mounts the memory mapping block device of controller B.
In this step, when controller A fails, its opposite-end controller B mounts the memory mapping block device of controller B that is mapped to the memory block device of controller A, so that the debugging operation commands to be executed can be stored on it.
Step 1022, controller B writes the debugging operation commands to be executed by the debug_damon daemon of controller A into a cmd format file, and unmounts the memory mapping block device after the write completes.
In this step, the normally running controller B stores the debugging operation commands to be executed by the debug_damon daemon of controller A in a cmd format file, and that file is saved on the currently mounted memory mapping block device. Because the memory mapping block device of controller B is mapped to the memory block device of controller A, controller A can read the cmd format file through its own memory block device and execute the debugging operation commands in it.
Here, the memory mapping block device is unmounted after the write completes to ensure the reliability and consistency of the information on the mapped memory block device.
Step 1023, controller B sends an interrupt instruction to notify the debug_damon daemon of controller A to execute the debugging operation commands in the cmd format file.
Step 1024, after capturing the interrupt instruction, the debug_damon daemon of controller A mounts the memory block device of controller A, executes the debugging operation commands in the cmd format file, writes the execution result to a cmd_result file, and stores the cmd_result file on the memory block device of controller A.
Step 1025, the debug_damon daemon of controller A unmounts the memory block device of controller A and notifies controller B through an interrupt instruction to fetch the execution result.
Step 1026, after capturing the interrupt instruction, the debug_damon daemon of controller B mounts the memory mapping block device of controller B and accesses the cmd_result file to obtain the execution result.
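Steps 1021 to 1026 amount to a strict hand-off in which only one controller has the shared device mounted at any time. The following single-process simulation sketches that ordering. Every name in it (`SharedDevice`, `daemon_a`, `debug_from_b`, `fake_execute`) is hypothetical, queues stand in for NTB doorbell interrupts, a dictionary stands in for the file system, and command execution is replaced by a placeholder.

```python
import queue
import threading


class SharedDevice:
    """Stand-in for the NTB-backed device; allows only one mounter at a time."""

    def __init__(self):
        self.files = {}
        self.mounted_by = None

    def mount(self, who):
        if self.mounted_by is not None:
            raise RuntimeError("device is already mounted")
        self.mounted_by = who

    def unmount(self, who):
        if self.mounted_by != who:
            raise RuntimeError("device is not mounted by caller")
        self.mounted_by = None


def daemon_a(dev, doorbell_to_a, doorbell_to_b, execute):
    doorbell_to_a.get()                    # step 1024: capture B's interrupt
    dev.mount("A")
    dev.files["cmd_result"] = execute(dev.files["cmd"])
    dev.unmount("A")                       # step 1025: unmount, then notify B
    doorbell_to_b.put("result ready")


def debug_from_b(dev, doorbell_to_a, doorbell_to_b, command):
    dev.mount("B")                         # step 1021: mount the mapped device
    dev.files["cmd"] = command             # step 1022: write, then unmount
    dev.unmount("B")
    doorbell_to_a.put("run cmd")           # step 1023: interrupt A
    doorbell_to_b.get()                    # step 1026: wait for A's interrupt
    dev.mount("B")
    result = dev.files["cmd_result"]
    dev.unmount("B")
    return result


def fake_execute(command):
    # Placeholder for real command execution on controller A.
    return f"ran: {command}"


dev = SharedDevice()
to_a, to_b = queue.Queue(), queue.Queue()
worker = threading.Thread(target=daemon_a, args=(dev, to_a, to_b, fake_execute))
worker.start()
print(debug_from_b(dev, to_a, to_b, "dmesg"))  # ran: dmesg
worker.join()
```

Because each side unmounts before it signals the other, `SharedDevice.mount` never observes a second mounter, which mirrors the consistency argument given for unmounting after each write.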
With the above technical scheme, when the system of either controller in a dual-controller product fails, the failed controller system can be debugged from the normal controller at the other end by means of NTB technology, so that fault debugging of the dual-controller system is achieved.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A fault debugging method for a dual-controller system, characterized by comprising the following steps:
a. when the dual-controller system starts, each of the two controllers allocates a memory region as its own memory block device; each controller also allocates a segment of memory address space as its memory mapping block device for the memory region of the opposite-end controller, establishes a mapping between that address space and the memory region of the opposite-end controller through the non-transparent bridge (NTB), formats its memory mapping block device with a file system, and starts its debug_damon daemon process;
b. when the system of either controller, denoted controller A, fails, the opposite-end controller B uses its own memory mapping block device to trigger the debug_damon daemon process of controller A to execute system debugging operations, and controller A feeds the corresponding execution results back to controller B;
wherein step b comprises the following steps:
when the system of controller A fails, its opposite-end controller B mounts the memory mapping block device of controller B;
controller B writes the debugging operation commands to be executed by the debug_damon daemon of controller A into a cmd format file, and unmounts the memory mapping block device after the write completes;
controller B sends an interrupt instruction to notify the debug_damon daemon of controller A to execute the debugging operation commands in the cmd format file;
after capturing the interrupt instruction, the debug_damon daemon of controller A mounts the memory block device of controller A, executes the debugging operation commands in the cmd format file, writes the execution result to a cmd_result file, and stores the cmd_result file on the memory block device of controller A;
the debug_damon daemon of controller A unmounts the memory block device of controller A and notifies controller B through an interrupt instruction to fetch the execution result;
after capturing the interrupt instruction, the debug_damon daemon of controller B mounts the memory mapping block device of controller B and accesses the cmd_result file to obtain the execution result.
CN201611176450.1A 2016-12-19 2016-12-19 Fault debugging method of double-controller system Active CN106610878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611176450.1A CN106610878B (en) 2016-12-19 2016-12-19 Fault debugging method of double-controller system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611176450.1A CN106610878B (en) 2016-12-19 2016-12-19 Fault debugging method of double-controller system

Publications (2)

Publication Number Publication Date
CN106610878A (en) 2017-05-03
CN106610878B (en) 2020-02-07

Family

ID=58636076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611176450.1A Active CN106610878B (en) 2016-12-19 2016-12-19 Fault debugging method of double-controller system

Country Status (1)

Country Link
CN (1) CN106610878B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107861417B (en) * 2017-10-23 2020-08-04 天津市英贝特航天科技有限公司 Rail transit output signal control system
CN112415907B (en) * 2020-11-26 2022-03-29 珠海格力电器股份有限公司 Building equipment remote debugging control method and device and computer equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100565472C (en) * 2007-12-11 2009-12-02 浙江大学 A debugging method suitable for multiprocessor-core system chips
CN101639811B (en) * 2009-08-21 2012-09-05 成都市华为赛门铁克科技有限公司 Data writing method, controller and multi-controller system
CN102117241A (en) * 2009-12-30 2011-07-06 华为技术有限公司 Multi-core system debugging method and multi-core system
CN103530241B (en) * 2013-09-24 2016-04-13 创新科存储技术(深圳)有限公司 A kind of dual control memory mirror implementation method of User space

Also Published As

Publication number Publication date
CN106610878A (en) 2017-05-03

Similar Documents

Publication Publication Date Title
JP6333965B2 (en) Technology to track wake clock usage
TWI568217B (en) Method and system of server link state detection and notification
CN111831588A (en) Storage device access method, device and system
US20160342437A1 (en) Data path failover method for sr-iov capable ethernet controller
CN106919494B (en) Method and device for realizing android application log
US9026687B1 (en) Host based enumeration and configuration for computer expansion bus controllers
US20080016405A1 (en) Computer system which controls closing of bus
CN109977061A (en) A kind of interruption processing method and interrupt processing device
JP2014203106A (en) Central processing unit, information processor, and register value acquisition method in virtual core
WO2021072880A1 (en) Method for asynchronously creating internal snapshot of virtual machine, apparatus, system and storage medium
TW201411358A (en) Storage apparatus connected to a host system via a PCIe interface and the method thereof
TWI546660B (en) Debugging system and method
WO2016127600A1 (en) Exception handling method and apparatus
US20190129873A1 (en) I/o request processing method in virtual machine, device and computer readable medium
US10514972B2 (en) Embedding forensic and triage data in memory dumps
CN106610878B (en) Fault debugging method of double-controller system
CN114765051A (en) Memory test method and device, readable storage medium and electronic equipment
US20240086339A1 (en) Systems, methods, and devices for accessing a device operating system over an interconnect
WO2016101177A1 (en) Random access memory detection method of computer device and computer device
CN104750537A (en) Test case execution method and device
TW201441807A (en) SAS expander and its fault detection system
US20060265523A1 (en) Data transfer circuit and data transfer method
WO2022242665A1 (en) Data storage method and related device
CN115964093A (en) System, method and apparatus for accessing device programs on a storage device
CN112015600A (en) Log information processing system, log information processing method and device and switch

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant