WO2019052275A1 - Bus monitoring system, method and device

Publication number
WO2019052275A1
Authority
WO
WIPO (PCT)
Prior art keywords
bus
monitoring
module
information
node
Prior art date
Application number
PCT/CN2018/095429
Other languages
English (en)
French (fr)
Inventor
王锦榕
余中云
包晓瑜
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to US16/627,720 (US11093361B2)
Priority to EP18856083.3A (EP3683682B1)
Priority to JP2019572374A (JP2020525944A)
Publication of WO2019052275A1

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/3027 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a bus
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0772 Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
    • G06F11/0787 Storage of error reports, e.g. persistent data storage, storage using memory protection
    • G06F11/0793 Remedial or corrective actions
    • G06F11/1016 Error in accessing a memory location, i.e. addressing error
    • G06F11/141 Saving, restoring, recovering or retrying at machine instruction level for bus or memory accesses
    • G06F11/221 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test buses, lines or interfaces, e.g. stuck-at or open line faults
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F11/349 Performance evaluation by tracing or monitoring for interfaces, buses
    • G06F11/364 Software debugging by tracing the execution of the program tracing values on a bus
    • G06F21/85 Protecting input, output or interconnection devices interconnection devices, e.g. bus-connected or in-line devices
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F2201/86 Event-based monitoring

Definitions

  • the present disclosure relates to the field of processors, and in particular to a bus monitoring system, method and apparatus.
  • FIG. 1 is a diagram of a bus monitoring device in the related art of the present disclosure. In this way, if a bus exception occurs, the processor or peripheral cannot be accessed through the bus, and the bus monitoring information cannot be obtained, making it difficult to locate the fault.
  • the embodiments of the present disclosure provide a bus monitoring system, method and device to solve at least the technical problem that the bus monitoring information cannot be acquired when the bus is abnormal in the related art.
  • a bus monitoring system including: a bus node; a bus monitoring module, configured to monitor a first bus on which the bus monitoring module is located and to generate monitoring information; an information storage module, configured to acquire the monitoring information from the bus monitoring module through a second bus; the first bus, configured to connect a master device and a slave device of the bus node; and the second bus, configured to connect the information storage module and the bus monitoring module; wherein the second bus is independent of the first bus.
  • the second bus is further used for at least one of: backing up data of the chip where the bus node is located, extending a path, and extension for a specified purpose.
  • the system further includes: a chip reset module, configured to perform a reset operation on the chip where the bus node is located upon detecting that the monitoring information has been saved; and a fault analysis module, configured to perform fault analysis according to the monitoring information.
  • the system further includes: a storage module, configured to save the monitoring information.
  • the second bus is further configured to connect the information storage module and the storage module.
  • a bus monitoring method including: monitoring a first bus on which a bus monitoring module is located, and generating monitoring information; and acquiring the monitoring information from the bus monitoring module through a second bus; wherein the second bus is independent of the first bus.
  • acquiring the monitoring information from the bus monitoring module through the second bus includes one of the following: acquiring the monitoring information through the second bus according to an event trigger; acquiring the monitoring information through the second bus according to a query status; or acquiring the monitoring information through the second bus according to continuous storage.
  • the method further includes: saving the monitoring information.
  • the method further includes: performing a reset operation on the chip where the bus node is located; and performing fault analysis according to the monitoring information after the reset operation is completed.
  • performing a reset operation on the chip where the bus node is located includes: performing the reset operation upon detecting that saving of the monitoring information is complete.
  • performing fault analysis according to the monitoring information includes: inferring an access model of the chip by using the monitoring information; and locating a fault type according to the access model.
  • performing fault analysis according to the monitoring information includes: traversing all the master devices and, for the current master device, reading the number of the bus node directly connected to it; reading the historical access information of the corresponding bus node according to the bus node number; analyzing the historical access information to detect whether the current master device initiated an access; when an access was initiated, determining whether the accessed address is within a legal range; when the accessed address is not within the legal range, recording at least one of the following items of node information: the master device number, the access address, and the access attribute; and locating the master device that performed the illegal access according to the node information.
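  • The traversal above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`AccessRecord`), the mapping names (`master_to_node`, `node_history`, `legal_ranges`) and the half-open address ranges are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    """One historical access captured at a bus node (hypothetical layout)."""
    master_id: int
    address: int
    attribute: str  # e.g. "read" or "write"


def locate_illegal_masters(master_to_node, node_history, legal_ranges):
    """Return the recorded accesses whose address is outside the legal range.

    master_to_node: master number -> number of the directly connected bus node
    node_history:   bus node number -> list of AccessRecord
    legal_ranges:   master number -> (low, high), treated as low <= addr < high
    """
    findings = []
    for master_id, node_id in master_to_node.items():   # traverse all masters
        for record in node_history.get(node_id, []):    # history of that node
            if record.master_id != master_id:
                continue                                 # not initiated by this master
            low, high = legal_ranges[master_id]
            if not (low <= record.address < high):
                findings.append(record)                  # number, address, attribute
    return findings
```

  • Each returned record carries the master number, address and attribute, so the offending master can be read directly from the result.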
  • the method further includes: determining a faulty bus node; when the current bus node initiates an access, detecting whether the access address of the faulty bus node is legal; and prohibiting access to the faulty bus node when that access address is illegal.
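  • One possible reading of this protection step is a filter placed in front of the node identified as faulty. The sketch below assumes that reading; the function name `make_access_guard`, the node numbers and the half-open address range are hypothetical.

```python
def make_access_guard(faulty_node, legal_range):
    """Build a check that blocks illegal accesses involving the faulty node.

    faulty_node: number of the bus node found faulty by the analysis
    legal_range: (low, high), treated as low <= address < high
    """
    low, high = legal_range

    def allow(node_id, address):
        if node_id != faulty_node:
            return True                  # other nodes are not restricted here
        return low <= address < high     # faulty node: only legal addresses pass

    return allow
```

  • A caller would consult `allow(node_id, address)` before forwarding an access and drop the access when it returns `False`.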
  • monitoring the first bus on which the bus monitoring module is located and generating monitoring information includes: monitoring access information of the bus node through the first bus, and saving the access information to the bus monitoring module to obtain the monitoring information of the bus node.
  • acquiring the monitoring information from the bus monitoring module through the second bus includes: acquiring the monitoring information from the bus monitoring module through the second bus when the current access information indicates that the bus node is abnormal; and acquiring the monitoring information from the bus monitoring module through the second bus when the current access information indicates that the bus node is normal.
  • a bus monitoring apparatus including: a monitoring module, configured to monitor a first bus on which a bus monitoring module is located and to generate monitoring information; and an acquiring module, configured to acquire the monitoring information from the bus monitoring module through a second bus; wherein the second bus is independent of the first bus.
  • the acquiring module includes one of the following: a first acquiring unit, configured to acquire the monitoring information from the bus monitoring module through the second bus according to an event trigger; a second acquiring unit, configured to acquire the monitoring information from the bus monitoring module through the second bus according to a query status; and a third acquiring unit, configured to acquire the monitoring information from the bus monitoring module through the second bus according to continuous storage.
  • the device further includes: a reset module, configured to perform a reset operation on the chip where the bus node is located after the storage module saves the monitoring information; and an analysis module, configured to perform fault analysis according to the monitoring information.
  • the device further includes: a determining module, configured to determine a faulty bus node after the analysis module performs fault analysis according to the monitoring information; a detecting module, configured to detect, when the current bus node initiates an access, whether the access address of the faulty bus node is legal; and a processing module, configured to prohibit access to the faulty bus node when the access address of the faulty bus node is illegal.
  • the device further includes: a storage module, configured to save the monitoring information.
  • a storage medium is also provided.
  • the storage medium is arranged to store program code for performing the following steps:
  • the monitoring information is obtained from the bus monitoring module via a second bus.
  • in the present disclosure, by adding a second bus on the basis of the first bus and using the second bus to acquire the monitoring information, the bus monitoring information can still be obtained when the bus is abnormal. This solves the technical problem in the related art that the bus monitoring information cannot be obtained when the bus is abnormal, and improves the monitoring efficiency of the bus node.
  • FIG. 1 is a diagram of a bus monitoring device in the related art of the present disclosure
  • FIG. 2 is a flow chart of a bus monitoring method in accordance with an embodiment of the present disclosure
  • FIG. 3 is a structural block diagram of a bus monitoring device according to an embodiment of the present disclosure.
  • FIG. 4 is a structural block diagram of a bus monitoring system in accordance with an embodiment of the present disclosure.
  • FIG. 5 is a block diagram of a bus monitoring device in accordance with an embodiment of the present disclosure.
  • FIG. 6 is a structural diagram of an information storage module according to an embodiment of the present disclosure.
  • FIG. 8 is a flowchart of locating a bus abnormality by using an event method according to an embodiment of the present disclosure.
  • FIG. 9 is a flowchart of locating a bus abnormality by using a query method according to an embodiment of the present disclosure.
  • FIG. 10 is a flowchart of locating a bus abnormality by using a RAM to store information according to an embodiment of the present disclosure.
  • FIG. 11 is a flowchart of locating a bus abnormality in a continuous storage manner according to an embodiment of the present disclosure.
  • FIG. 2 is a flowchart of a bus monitoring method according to an embodiment of the present disclosure. As shown in FIG. 2, the process includes the following steps:
  • Step S202 monitoring the first bus where the bus monitoring module is located, and generating monitoring information
  • Step S204 Obtain monitoring information from the bus monitoring module through the second bus.
  • the execution body of the foregoing steps may be a bus node, specifically: a processor, a chip, a bus management device, etc., but is not limited thereto.
  • the solution of the embodiment further includes: Step S206, saving the monitoring information.
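  • acquiring the monitoring information from the bus monitoring module through the second bus includes one of the following: acquiring the monitoring information through the second bus according to an event trigger; acquiring the monitoring information through the second bus according to a query status; or acquiring the monitoring information through the second bus according to continuous storage.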
  • the saving of the monitoring information may be, but is not limited to, saving the monitoring information in the local ROM space; storing the monitoring information in the local RAM space; storing the monitoring information in the remote ROM space; and storing the monitoring information in the remote RAM space.
  • the ROM may be Flash memory, and the RAM may be DDR (Double Data Rate SDRAM).
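  • after the monitoring information is saved, the method further includes: performing a reset operation on the chip where the bus node is located; and performing fault analysis according to the monitoring information after the reset operation is completed.
  • performing a reset operation on the chip where the bus node is located includes: performing the reset operation upon detecting that saving of the monitoring information is complete.
  • in one option, the fault analysis based on the monitoring information includes: inferring an access model of the chip from the monitoring information; and locating the fault type according to the access model.
  • in another option, the fault analysis based on the monitoring information includes: traversing all the master devices and reading the number of the bus node directly connected to the current master device; reading the historical access information of that bus node; detecting from the historical access information whether the current master device initiated an access; when it did, determining whether the accessed address is within a legal range; when it is not, recording at least one of the master device number, the access address, and the access attribute; and locating the master device that performed the illegal access according to the recorded node information.
  • after the fault analysis, the method further includes: determining a faulty bus node; when the current bus node initiates an access, detecting whether the access address of the faulty bus node is legal; and prohibiting access to the faulty bus node when that access address is illegal.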
  • acquiring the monitoring information from the bus monitoring module through the second bus includes: acquiring the monitoring information from the bus monitoring module through the second bus when the current access information indicates that the bus node is abnormal; and acquiring the monitoring information from the bus monitoring module through the second bus when the current access information indicates that the bus node is normal.
  • monitoring the first bus on which the bus monitoring module is located and generating monitoring information includes: monitoring access information of the bus node through the first bus, and saving the access information to the bus monitoring module to obtain the monitoring information of the bus node.
  • a bus monitoring device and a bus monitoring system are also provided to implement the above embodiments and preferred embodiments; what has already been described is not repeated.
  • as used below, the term “module” may be a combination of software and/or hardware that implements a predetermined function.
  • although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 3 is a structural block diagram of a bus monitoring apparatus according to an embodiment of the present disclosure.
  • the apparatus may be applied to a chip or the like, particularly a multi-processor chip. As shown in FIG. 3, the apparatus includes:
  • the monitoring module 30 is configured to monitor the first bus where the bus monitoring module is located, and generate monitoring information
  • the obtaining module 32 is configured to obtain monitoring information from the bus monitoring module by using the second bus.
  • the device further includes: a storage module, configured to save the monitoring information.
  • the obtaining module includes at least one of the following: a first obtaining unit, configured to acquire the monitoring information from the bus monitoring module through the second bus according to an event trigger; a second obtaining unit, configured to acquire the monitoring information from the bus monitoring module through the second bus according to a query status; and a third obtaining unit, configured to acquire the monitoring information from the bus monitoring module through the second bus according to continuous storage.
  • the device further includes: a reset module, configured to perform a reset operation on the chip where the bus node is located after the storage module saves the monitoring information; and an analysis module, configured to perform fault analysis according to the monitoring information.
  • the device further includes: a determining module, configured to determine a faulty bus node after the analysis module performs fault analysis according to the monitoring information; a detecting module, configured to detect, when the current bus node initiates an access, whether the access address of the faulty bus node is legal; and a processing module, configured to prohibit access to the faulty bus node when the access address of the faulty bus node is illegal.
  • FIG. 4 is a structural block diagram of a bus monitoring system according to an embodiment of the present disclosure, as shown in FIG. 4, including:
  • the bus monitoring module 42 is configured to monitor the first bus where the bus monitoring module is located, and generate monitoring information
  • the information storage module 44 is configured to obtain monitoring information from the bus monitoring module by using the second bus;
  • the first bus 48 is used to connect the master device and the slave device of the bus node.
  • a second bus 50 configured to connect the information storage module and the bus monitoring module, and connect the information storage module and the storage module;
  • the second bus is independent of the first bus.
  • the system further includes: a storage module 46, configured to save the monitoring information; optionally, the second bus is further configured to connect the information storage module and the storage module.
  • the second bus is further used for at least one of: backing up data of the chip where the bus node is located, expanding the path, and expanding the use for the specified purpose.
  • the system further includes: a chip reset module, configured to perform a reset operation on the chip where the bus node is located upon detecting that the monitoring information has been saved; and a fault analysis module, configured to perform fault analysis according to the monitoring information.
  • each of the above modules may be implemented by software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
  • This embodiment is an additional embodiment according to the present disclosure, which is used to supplement and elaborate the present application in combination with specific examples and scenarios:
  • the embodiment provides a method for storing bus node information: a second bus and an information storage module are added on the basis of the bus monitoring module in a multiprocessor chip, ensuring that the bus monitoring information can still be acquired for anomaly analysis when a bus abnormality occurs.
  • the solution of this embodiment ensures that the monitoring information of the bus node can be acquired under the abnormality of the bus, and the fault is quickly and accurately located.
  • a multiprocessor chip bus monitoring device includes the following modules: a bus monitoring module, an information storage module, a memory, a chip reset module, and a fault analysis module.
  • FIG. 5 is a block diagram of a bus monitoring device according to an embodiment of the present disclosure.
  • the information storage module of the disclosed device reads the monitoring information from the bus monitoring module (which monitors the first bus) and stores it into the memory through the second bus (shown in FIG. 4); after the storage is completed, it notifies the chip reset module to reset the chip.
  • the fault analysis module then obtains the monitoring information from the memory for abnormality analysis.
  • the first bus (bus 0 to bus N in FIG. 4 all belong to the first bus) is the path connecting the master device (master) and the slave device (slave).
  • the master device such as a processor or a peripheral device accesses a slave device such as a DDR or a bus monitoring unit through the first bus.
  • the second bus is used for the information storage module to access the bus monitoring module and the memory, and the monitoring information of the bus monitoring module is stored into the memory through the second bus.
  • the second bus is independent of the first bus; that is, when the first bus is abnormal, the bus monitoring module can still be accessed through the second bus.
  • the second bus can also serve as a backup path and an extension path for the chip.
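  • The value of an independent second bus can be shown with a small software model. The sketch below is an illustration only: the class names, the rule that an unmapped address hangs the first bus, and the list used as register space are all assumptions made for the example.

```python
class BusHangError(Exception):
    """Raised when an access is attempted over a bus that no longer responds."""


class BusMonitor:
    """Holds captured access records in its register space (modelled as a list)."""

    def __init__(self):
        self.registers = []

    def capture(self, master, address):
        self.registers.append((master, address))


class FirstBus:
    """Functional bus between masters and slaves; it may hang on a bad access."""

    def __init__(self, monitor, valid_addresses):
        self.monitor = monitor
        self.valid_addresses = valid_addresses
        self.hung = False

    def access(self, master, address):
        if self.hung:
            raise BusHangError("first bus is not responding")
        self.monitor.capture(master, address)   # the monitor records every access
        if address not in self.valid_addresses:
            self.hung = True                    # assumption: unmapped access hangs the bus

    def read_monitor(self):
        if self.hung:
            raise BusHangError("first bus is not responding")
        return list(self.monitor.registers)


class SecondBus:
    """Independent path to the monitor; unaffected by the state of the first bus."""

    def __init__(self, monitor):
        self.monitor = monitor

    def read_monitor(self):
        return list(self.monitor.registers)
```

  • After an access hangs the first bus, `FirstBus.read_monitor` raises `BusHangError`, while `SecondBus.read_monitor` still returns the captured records, including the access that caused the hang.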
  • the bus nodes are the ports of the buses (bus 0 to bus N), and the bus monitoring module can monitor one or more bus nodes.
  • the bus monitoring module is responsible for capturing bus (bus 0 to bus N) access information of the bus node; the bus monitoring module includes a configuration part, a monitoring part and an event generating part.
  • the configuration part receives the configuration information, and the monitoring part performs bus monitoring according to the configuration, and writes the captured monitoring information to the register space of the bus monitoring module; the event generating part generates an event and sets an abnormal state when the abnormality occurs;
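  • The three parts of the bus monitoring module can be modelled roughly as follows. This is a sketch under assumptions: the disclosure does not define what counts as an abnormality, so an address outside a configured range is used here as a stand-in, and all names are hypothetical.

```python
class BusMonitorModule:
    """Toy model with a configuration part, a monitoring part and an event part."""

    def __init__(self):
        self.watch_ranges = {}   # configuration: node number -> (low, high)
        self.registers = []      # register space holding captured records
        self.abnormal = False    # abnormal state set by the event generating part
        self.events = []         # events raised towards the information storage module

    def configure(self, node_id, low, high):
        """Configuration part: enable monitoring of a node with a legal range."""
        self.watch_ranges[node_id] = (low, high)

    def observe(self, node_id, address, attribute):
        """Monitoring part: capture the access; event part: flag abnormalities."""
        if node_id not in self.watch_ranges:
            return                                   # node is not monitored
        self.registers.append((node_id, address, attribute))
        low, high = self.watch_ranges[node_id]
        if not (low <= address < high):              # assumed abnormality criterion
            self.abnormal = True
            self.events.append(("abnormal", node_id))
```

  • Keeping the captured records separate from the event list mirrors the text: the register space is what the information storage module later reads, while the event only signals that something should be read.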
  • the information storage module is responsible for bus monitoring information storage and memory space management; the information storage module includes a configuration part, a read/write operation part, and an event processing part.
  • FIG. 6 is a structural diagram of an information storage module according to an embodiment of the present disclosure.
  • the information storage module uses a second bus, which is independent of the first bus where the bus monitoring module is located, and ensures that the information storage module can still perform operations when the first bus is abnormal;
  • the configuration part mainly configures an operation mode, including query, event triggering, and continuous storage.
  • the query mode refers to querying the status of the bus monitoring module; the event triggering mode refers to responding to an abnormal event from the bus monitoring module; and the continuous storage mode refers to the information storage module continuously storing the access information captured by the bus monitoring module.
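  • A minimal sketch of how the three modes could gate a storage step is given below. The names `Mode`, `MonitorStub` and `storage_step`, and the decision to store only when a status bit or event is present, are assumptions for illustration rather than details from the disclosure.

```python
from enum import Enum


class Mode(Enum):
    QUERY = "query"
    EVENT = "event"
    CONTINUOUS = "continuous"


class MonitorStub:
    """Hypothetical stand-in for the state the bus monitoring module exposes."""

    def __init__(self):
        self.abnormal_status = False   # status read in query mode
        self.pending_event = False     # abnormal event raised in event mode
        self.records = []              # captured access information


def storage_step(mode, monitor, memory):
    """Run one step of the information storage module; return True if it stored."""
    if mode is Mode.QUERY:
        triggered = monitor.abnormal_status
    elif mode is Mode.EVENT:
        triggered = monitor.pending_event
        monitor.pending_event = False  # the event is consumed once examined
    else:
        triggered = True               # continuous storage needs no trigger
    if triggered and monitor.records:
        memory.extend(monitor.records)
        monitor.records.clear()
        return True
    return False
```

  • In this model the only difference between the modes is the trigger condition; the copy from the monitor to memory is shared.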
  • the read and write operation part writes the monitoring information of the bus monitoring module to the memory, and the operation mode is controlled by the configuration.
  • the read and write operation part manages the memory storage space according to the bus node, and ensures the correctness of the information of each bus node.
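  • One way to manage the storage space per bus node is to give each node a fixed region that wraps around when full, so that one busy node cannot overwrite another node's records. The sketch below assumes such a layout; the class name, the slot counts and the wrap-around policy are hypothetical.

```python
class NodePartitionedMemory:
    """Memory split into equal regions, one per bus node, each used as a ring."""

    def __init__(self, node_ids, slots_per_node):
        self.slots_per_node = slots_per_node
        self.base = {n: i * slots_per_node for i, n in enumerate(node_ids)}
        self.written = {n: 0 for n in node_ids}        # total records per node
        self.cells = [None] * (slots_per_node * len(node_ids))

    def store(self, node_id, record):
        """Write the record into the node's own region, overwriting the oldest."""
        slot = self.written[node_id] % self.slots_per_node
        self.cells[self.base[node_id] + slot] = record
        self.written[node_id] += 1

    def records(self, node_id):
        """Return the node's retained records, oldest first."""
        start = self.base[node_id]
        region = self.cells[start:start + self.slots_per_node]
        count = self.written[node_id]
        if count <= self.slots_per_node:
            return region[:count]
        split = count % self.slots_per_node
        return region[split:] + region[:split]
```

  • With two slots per node, storing three records for one node keeps only the latest two, and the other node's region is untouched.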
  • the event processing part mainly receives the abnormal events sent by the bus monitoring module and, after the storage operation is completed, generates a read/write completion event and sets a completion flag.
  • the information storage module can complete information storage of one or more bus monitoring modules
  • the memory may be a ROM/RAM space inside the chip whose bus is monitored, or an off-chip RAM/ROM space, and is used for storing the monitoring information.
  • the chip reset module performs the chip reset operation. It is mainly responsible for querying the execution status of the information storage module, or receiving the completion event sent by the information storage module, and performing a chip reset operation according to that status or event.
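  • The two ways of triggering the reset, polling a status and reacting to a completion event, can be sketched as below. The class names, the callback list and the choice to clear the flag on reset are assumptions made for the example.

```python
class InformationStorageStatus:
    """Hypothetical view of the completion flag and event of the storage module."""

    def __init__(self):
        self.completion_flag = False
        self.listeners = []

    def finish(self):
        """Called when storage is done: set the flag and emit the completion event."""
        self.completion_flag = True
        for listener in self.listeners:
            listener()


class ChipResetModule:
    """Resets the chip once storage has completed, by polling or by event."""

    def __init__(self, status):
        self.status = status
        self.reset_count = 0

    def _reset(self):
        self.reset_count += 1
        self.status.completion_flag = False   # assumption: the flag is cleared on reset

    def poll(self):
        """Query path: reset only if the completion flag is set."""
        if self.status.completion_flag:
            self._reset()
            return True
        return False

    def subscribe(self):
        """Event path: reset as soon as the completion event arrives."""
        self.status.listeners.append(self._reset)
```

  • Either path guarantees that the reset happens only after the monitoring information has been stored, which is the ordering the text requires.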
  • the fault analysis module is responsible for problem location analysis. According to the access information of the bus monitoring node in the memory, the chip access model when the bus is abnormal is inferred to find the abnormal access.
  • the first step: the bus monitoring module captures the access information of the first bus.
  • the second step: the information storage module acquires the monitoring information from the bus monitoring module through the second bus according to the configured operation mode, and stores the monitoring information in the memory.
  • the operation modes include event triggering, query status, and continuous storage.
  • the memory may be an on-chip or off-chip ROM, such as Flash, or an on-chip or off-chip RAM, such as DDR; if it is a RAM, it is necessary to ensure that the information is not lost during the reset process.
  • after the storage is completed, a completion event is generated and the completion flag is set.
  • after the chip reset module detects the storage completion indication, it resets the chip.
  • the third step: the fault analysis module exports the bus monitoring information from the memory, analyzes the cause of the abnormality, and obtains an analysis result; targeted protection and repair are then performed according to the analysis result.
  • FIG. 7 is a diagram of a bus topology structure of an embodiment of the present disclosure.
  • Each master (master device) is connected to a bus node; the bus nodes are interconnected and finally connected to the slave devices. For example, for bus node 4 the connected master is master7, the connected upstream nodes are bus node 0 and bus node 1, and the connected downstream nodes are bus node 5 and bus node 6.
  • The chip bus subsystem numbers the masters and the monitoring nodes uniformly. While the system is running, the bus nodes are monitored and the bus node information is stored; after a bus abnormality, the bus node information is exported and analyzed, the access numbers are linked together, and an access model is built.
  • This implementation locates a bus abnormality by acquiring information in event mode.
  • FIG. 8 is a flowchart of locating a bus abnormality in event mode according to an embodiment of the present disclosure; the processing steps of the flow are as follows:
  • Step 1: Initialize the bus monitoring module and configure the monitoring information of each bus node.
  • The monitoring mode is configured as continuous monitoring.
  • Step 2: Initialize the information storage module.
  • The node range is all bus nodes.
  • Step 3: Capture access information. After steps 1 and 2 are complete, the bus monitoring module begins to capture information, and each bus node saves the information to its capture register.
  • Step 4: Store the monitoring information.
  • When the bus is abnormal, the bus monitoring module generates an event and sends it to the information storage module;
  • After receiving the event, the information storage module reads the monitoring information from the capture registers of each bus node;
  • The monitoring information of each bus node is stored in the designated space; after storage is complete, the information storage module sets the completion flag and generates a completion event;
  • Step 5: Reset the chip.
  • After the chip reset module finds that the completion flag is set, or receives the completion event, it resets the chip;
  • After the chip is reset, the processor reads the monitoring information from the ROM and saves it as a file.
  • Step 6: The fault analysis module exports and analyzes the monitoring information to locate the cause of the bus abnormality.
  • The master that accessed an illegal address is identified.
  • Step 7: Apply address protection to the master that accessed an illegal address.
  • Before the master initiates an access, the access address is checked for legality, and an access to an illegal address is prohibited.
  • This implementation locates a bus abnormality by acquiring information in query mode.
  • FIG. 9 is a flowchart of locating a bus abnormality in query mode according to an embodiment of the present disclosure; the processing steps of the flow are as follows:
  • Step 2: Initialize the information storage module.
  • The operation mode is configured as query mode.
  • Step 4: Store the monitoring information.
  • The information storage module periodically queries the status register of the bus monitoring module; if the bus-abnormal status flag is found, the bus is abnormal;
  • The information storage module reads the monitoring information from the capture registers of each bus node;
  • The monitoring information of each bus node is stored in the designated space; after storage is complete, the information storage module sets the completion flag and generates a completion event;
  • Steps 5, 6 and 7 are the same as in Implementation 1.
  • Because ROM access in a multi-processor chip is inefficient, this implementation locates a bus abnormality using information stored in RAM.
  • Compared with the implementations that use ROM, RAM is generally faster.
  • FIG. 10 is a flowchart of locating a bus abnormality using information stored in RAM according to an embodiment of the present disclosure; the processing steps of the flow are as follows:
  • Step 1 is the same as in Implementation 1.
  • Step 2: Initialize the information storage module.
  • The operation mode is configured as event triggering.
  • The node range is all bus nodes.
  • Step 4: Store the monitoring information.
  • When the bus is abnormal, the bus monitoring module generates an event and sends it to the information storage module;
  • After receiving the event, the information storage module reads the monitoring information from the capture registers of each bus node;
  • Relying on the property that RAM contents are not lost across a reset that does not remove power, the monitoring information of each bus node is stored in the RAM; after storage is complete, the information storage module sets the completion flag and generates a completion event.
  • Step 5: Reset the chip.
  • After the chip reset module finds that the completion flag is set, or receives the completion event, it resets the chip;
  • If the RAM is on-chip space, the chip reset must be of a type that does not remove power. If the RAM is off-chip space, the chip in which that memory is located must likewise be reset without losing power.
  • Steps 6 and 7 are the same as in Implementation 1.
  • In this implementation, the information storage module stores bus monitoring information continuously after startup and stops when it receives a bus abnormal event.
  • FIG. 11 is a flowchart of locating a bus abnormality in continuous storage mode according to an embodiment of the present disclosure. This implementation can locate bus abnormalities in some more complex scenarios.
  • Step 1: Initialize the bus monitoring module and configure the monitoring information of each bus node.
  • The monitoring mode is configured as continuous monitoring.
  • Step 2: Initialize the information storage module.
  • The operation mode is configured as continuous storage.
  • The number of records stored continuously is configurable.
  • The node range is all bus nodes.
  • Step 4: Store the monitoring information.
  • When the information storage module detects that the bus monitoring module has new information written, it writes that monitoring information into the corresponding RAM space;
  • When a bus abnormality is triggered, the bus monitoring module generates an event number and notifies the information storage module;
  • After the information storage module saves the last piece of information, it stops storing;
  • After storage is complete, the information storage module sets the completion flag and generates a completion event.
  • Step 5: Reset the chip.
  • After the chip reset module checks the status or receives the event, it performs a chip reset that does not remove power.
  • Step 6: The fault analysis module exports and analyzes the monitoring information to locate the cause of the bus abnormality.
  • The monitoring information is read in sequence up to the last record.
  • Step 7: Apply address protection to the master that accessed an illegal address.
  • Before the master initiates an access, the access address is checked for legality, and an access to an illegal address is prohibited.
  • In this scheme, the information storage module, the second bus and the memory are added on the basis of the bus monitoring module to ensure that the bus monitoring information can be obtained when the bus is abnormal.
  • Using the acquired bus monitoring information, the fault analysis module quickly locates the cause of the bus abnormality; protection or error correction based on the analysis result prevents the bus abnormality from being triggered again and improves system stability.
  • Embodiments of the present disclosure also provide a storage medium.
  • The foregoing storage medium may be configured to store program code for performing the following steps: monitoring the first bus on which the bus monitoring module is located to generate monitoring information; and acquiring the monitoring information from the bus monitoring module through the second bus.
  • The foregoing storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
  • The processor, according to the program code stored in the storage medium, monitors the first bus on which the bus monitoring module is located and generates monitoring information.
  • The processor, according to the program code stored in the storage medium, acquires the monitoring information from the bus monitoring module through the second bus.
  • The modules or steps of the present disclosure described above can be implemented by general-purpose computing devices; they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described may be performed in an order different from the one given here, or the modules or steps may be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. As such, the disclosure is not limited to any specific combination of hardware and software.
  • the present disclosure is applicable to the field of processors to ensure that bus monitoring information can be acquired even in the case of bus abnormalities, and solves the technical problem that the bus monitoring information cannot be acquired when the bus is abnormal in the related art, and improves the monitoring efficiency of the bus node.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Debugging And Monitoring (AREA)
  • Multi Processors (AREA)
  • Bus Control (AREA)

Abstract

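The present disclosure provides a bus monitoring system, method and apparatus. The system includes: a bus node; a bus monitoring module, configured to monitor a first bus on which the bus monitoring module is located and generate monitoring information; an information storage module, configured to acquire the monitoring information from the bus monitoring module through a second bus; the first bus, configured to connect a master device and a slave device of the bus node; and the second bus, configured to connect the information storage module and the bus monitoring module, wherein the second bus is independent of the first bus. The present disclosure solves the technical problem in the related art that bus monitoring information cannot be acquired when the bus is abnormal.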

Description

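Bus Monitoring System, Method and Apparatus
Technical Field
The present disclosure relates to the field of processors, and in particular to a bus monitoring system, method and apparatus.
Background
As the performance of mobile communication system products improves, the chips they use integrate more and more processors and peripherals, and the bus becomes increasingly complex. To facilitate fault location, chips in the related art are designed with a bus monitoring module that monitors the access information of each bus node, as shown in FIG. 1, which is a diagram of a bus monitoring apparatus in the related art of the present disclosure. With this approach, if a bus abnormality occurs, the processor or peripheral cannot access the module over the bus, so the bus monitoring information cannot be acquired and the fault is hard to locate.
No effective solution to the above problem in the related art has yet been found.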
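Summary
Embodiments of the present disclosure provide a bus monitoring system, method and apparatus, to at least solve the technical problem in the related art that bus monitoring information cannot be acquired when the bus is abnormal.
According to an embodiment of the present disclosure, a bus monitoring system is provided, including: a bus node; a bus monitoring module, used to monitor a first bus on which the bus monitoring module is located and generate monitoring information; an information storage module, used to acquire the monitoring information from the bus monitoring module through a second bus; the first bus, used to connect a master device and a slave device of the bus node; and the second bus, used to connect the information storage module and the bus monitoring module; wherein the second bus is independent of the first bus.
Optionally, the second bus is further used for at least one of the following: backing up data of the chip where the bus node is located, expanding the traffic capacity of a path, and being extended for a designated purpose.
Optionally, the system further includes: a chip reset module, used to perform a reset operation on the chip where the bus node is located upon detecting an indication that saving of the monitoring information is complete; and a fault analysis module, used to perform fault analysis according to the monitoring information.
Optionally, the system further includes: a storage module, used to save the monitoring information.
Optionally, the second bus is further used to connect the information storage module and the storage module.
According to another embodiment of the present disclosure, a bus monitoring method is provided, including: monitoring a first bus on which a bus monitoring module is located, and generating monitoring information; and acquiring the monitoring information from the bus monitoring module through a second bus; wherein the second bus is independent of the first bus.
Optionally, acquiring the monitoring information from the bus monitoring module through the second bus includes one of the following: acquiring the monitoring information from the bus monitoring module through the second bus by event triggering; acquiring the monitoring information from the bus monitoring module through the second bus by status query; and acquiring the monitoring information from the bus monitoring module through the second bus by continuous storage.
Optionally, after the monitoring information is acquired from the bus monitoring module through the second bus, the method further includes: saving the monitoring information.
Optionally, after the monitoring information is saved, the method further includes: performing a reset operation on the chip where the bus node is located; and after the reset operation is complete, performing fault analysis according to the monitoring information.
Optionally, performing a reset operation on the chip where the bus node is located includes:
upon detecting an indication that saving of the monitoring information is complete, performing the reset operation on the chip where the bus node is located.
Optionally, performing fault analysis according to the monitoring information includes: inferring an access model of the chip from the monitoring information; and locating the fault type according to the access model.
Optionally, performing fault analysis according to the monitoring information includes: traversing all master devices and reading the number of the bus node directly connected to the current master device; reading the historical access information of the corresponding bus node according to the bus node number; analyzing the historical access information to detect whether the current master device initiated an access; when an access was initiated, determining whether the accessed address is within the legal range; when the accessed address is not within the legal range, recording at least one of the following items of node information: master device number, access address, access attribute; and locating the illegally accessing master device according to the node information.
Optionally, after fault analysis is performed according to the monitoring information, the method further includes: determining a faulty bus node; when a current bus node initiates an access, detecting whether the access address of the faulty bus node is legal; and when the access address of the faulty bus node is illegal, prohibiting access to the faulty bus node.
Optionally, monitoring the first bus on which the bus monitoring module is located and generating monitoring information includes: monitoring the access information of a bus node through the first bus, and saving the access information to the bus monitoring module to obtain the monitoring information of the bus node.
Optionally, acquiring the monitoring information from the bus monitoring module through the second bus includes: when the current access information indicates that the bus node is abnormal, acquiring the monitoring information from the bus monitoring module through the second bus; and when the current access information indicates that the bus node is normal, acquiring the monitoring information from the bus monitoring module through the second bus.
According to yet another embodiment of the present disclosure, a bus monitoring apparatus is provided, including: a monitoring module, used to monitor a first bus on which a bus monitoring module is located and generate monitoring information; and an acquisition module, used to acquire the monitoring information from the bus monitoring module through a second bus; wherein the second bus is independent of the first bus.
Optionally, the acquisition module includes one of the following: a first acquisition unit, used to acquire the monitoring information from the bus monitoring module through the second bus by event triggering; a second acquisition unit, used to acquire the monitoring information from the bus monitoring module through the second bus by status query; and a third acquisition unit, used to acquire the monitoring information from the bus monitoring module through the second bus by continuous storage.
Optionally, the apparatus further includes: a reset module, used to perform a reset operation on the chip where the bus node is located after the storage module saves the monitoring information; and an analysis module, used to perform fault analysis according to the monitoring information.
Optionally, the apparatus further includes: a determination module, used to determine a faulty bus node after the analysis module performs fault analysis according to the monitoring information; a detection module, used to detect, when a current bus node initiates an access, whether the access address of the faulty bus node is legal; and a processing module, used to prohibit access to the faulty bus node when the access address of the faulty bus node is illegal.
Optionally, the apparatus further includes: a storage module, used to save the monitoring information.
According to yet another embodiment of the present disclosure, a storage medium is further provided. The storage medium is configured to store program code for performing the following steps:
monitoring a first bus on which a bus monitoring module is located, and generating monitoring information;
acquiring the monitoring information from the bus monitoring module through a second bus.
With the present disclosure, a second bus is added on the basis of the first bus and is used to acquire the monitoring information, which ensures that the bus monitoring information can be acquired even when the bus is abnormal. This solves the technical problem in the related art that bus monitoring information cannot be acquired when the bus is abnormal, and improves the monitoring efficiency of the bus node.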
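Brief Description of the Drawings
The drawings described here are provided for further understanding of the present disclosure and form a part of the present application. The exemplary embodiments of the present disclosure and their description are used to explain the present disclosure and do not unduly limit it. In the drawings:
FIG. 1 is a diagram of a bus monitoring apparatus in the related art of the present disclosure;
FIG. 2 is a flowchart of a bus monitoring method according to an embodiment of the present disclosure;
FIG. 3 is a structural block diagram of a bus monitoring apparatus according to an embodiment of the present disclosure;
FIG. 4 is a structural block diagram of a bus monitoring system according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of a bus monitoring apparatus of an embodiment of the present disclosure;
FIG. 6 is a structural diagram of an information storage module of an embodiment of the present disclosure;
FIG. 7 is a bus topology diagram of an embodiment of the present disclosure;
FIG. 8 is a flowchart of locating a bus abnormality in event mode according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of locating a bus abnormality in query mode according to an embodiment of the present disclosure;
FIG. 10 is a flowchart of locating a bus abnormality using information stored in RAM according to an embodiment of the present disclosure;
FIG. 11 is a flowchart of locating a bus abnormality in continuous storage mode according to an embodiment of the present disclosure.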
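Detailed Description
The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.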
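Embodiment 1
This embodiment provides a bus monitoring method. FIG. 2 is a flowchart of the bus monitoring method according to an embodiment of the present disclosure. As shown in FIG. 2, the flow includes the following steps:
Step S202: monitor the first bus on which the bus monitoring module is located, and generate monitoring information;
Step S204: acquire the monitoring information from the bus monitoring module through the second bus.
Through the above steps, a second bus is added on the basis of the first bus and is used to acquire the monitoring information, which ensures that the bus monitoring information can be acquired even when the bus is abnormal. This solves the technical problem in the related art that bus monitoring information cannot be acquired when the bus is abnormal, and improves the monitoring efficiency of the bus node.
Optionally, the above steps may be executed by a bus node, specifically a processor, a chip, a bus management device or the like, but are not limited thereto.
Optionally, after step S204, the scheme of this embodiment further includes: step S206, saving the monitoring information.
Optionally, acquiring the monitoring information from the bus monitoring module through the second bus includes one of the following:
acquiring the monitoring information from the bus monitoring module through the second bus by event triggering;
acquiring the monitoring information from the bus monitoring module through the second bus by status query;
acquiring the monitoring information from the bus monitoring module through the second bus by continuous storage.
Optionally, saving the monitoring information may be, but is not limited to: saving the monitoring information in local ROM space; saving the monitoring information in local RAM space; saving the monitoring information in remote ROM space; or saving the monitoring information in remote RAM space. Specifically, the ROM may be Flash (flash memory) and the RAM may be DDR (Double Data Rate SDRAM).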
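Optionally, after the monitoring information is saved, the method further includes:
S11: performing a reset operation on the chip where the bus node is located;
S12: after the reset operation is complete, performing fault analysis according to the monitoring information.
In this embodiment, performing a reset operation on the chip where the bus node is located includes: upon detecting an indication that saving of the monitoring information is complete, performing the reset operation on the chip where the bus node is located.
Optionally, performing fault analysis according to the monitoring information includes:
S21: inferring the access model of the chip from the monitoring information, where the access model is built from the connection and access relationships of the bus nodes in the chip;
S22: locating the fault type according to the access model.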
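Optionally, performing fault analysis according to the monitoring information includes:
S31: traversing all master devices (masters) and reading the number of the bus node directly connected to the current master device;
S32: reading the historical access information of the corresponding bus node according to the bus node number;
S33: analyzing the historical access information to detect whether the current master device initiated an access;
S34: when an access was initiated, determining whether the accessed address is within the legal range;
S35: when the accessed address is not within the legal range, recording at least one of the following items of node information: bus node number, access address, access attribute;
S36: locating the illegally accessing master device according to the node information.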
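The traversal in steps S31 to S36 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Access` record layout, the dictionary shapes and the name `find_illegal_accesses` are assumptions made for illustration, and the legal ranges are treated as half-open `(start, end)` pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One captured access record (hypothetical layout)."""
    master_id: int   # number of the master that issued the access
    address: int     # accessed address
    attribute: str   # access attribute, e.g. "read" or "write"


def find_illegal_accesses(master_to_node, node_history, legal_ranges):
    """Return every recorded access whose address is outside the legal ranges.

    master_to_node: {master number: number of the directly connected bus node}
    node_history:   {bus node number: [Access, ...]} exported from memory
    legal_ranges:   {master number: [(start, end), ...]}, end exclusive
    """
    illegal = []
    for master_id, node_id in master_to_node.items():        # S31
        for access in node_history.get(node_id, []):          # S32
            if access.master_id != master_id:                 # S33
                continue
            ranges = legal_ranges.get(master_id, [])
            legal = any(lo <= access.address < hi for lo, hi in ranges)  # S34
            if not legal:
                illegal.append(access)                        # S35
    return illegal                                            # S36
```

Each returned record carries the master number, so the caller can identify which master issued the illegal access.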
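Optionally, after fault analysis is performed according to the monitoring information, the method further includes:
S41: determining a faulty bus node;
S42: when a current bus node initiates an access, detecting whether the access address of the faulty bus node is legal;
S43: when the access address of the faulty bus node is illegal, prohibiting access to the faulty bus node.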
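As a hedged illustration of the protection in S41 to S43, the sketch below checks an address before an access is forwarded and refuses it when the address is illegal. The class `AddressGuard`, the exception `IllegalAccessError` and the per-master range table are hypothetical names; the text does not specify how the check is implemented.

```python
class IllegalAccessError(Exception):
    """Raised when an access targets an address outside the legal ranges."""


class AddressGuard:
    """Checks an address before the access is allowed onto the bus."""

    def __init__(self, legal_ranges):
        # {master number: [(start, end), ...]}, end exclusive (assumed format)
        self.legal_ranges = legal_ranges

    def is_legal(self, master_id, address):
        ranges = self.legal_ranges.get(master_id, [])
        return any(lo <= address < hi for lo, hi in ranges)

    def access(self, master_id, address, do_access):
        """Run do_access(address) only if the address is legal for the master."""
        if not self.is_legal(master_id, address):
            raise IllegalAccessError(
                f"master {master_id} may not access {address:#x}")
        return do_access(address)
```

A master with no entry in the table is denied everything, which is a deliberately conservative default chosen for the sketch.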
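Optionally, acquiring the monitoring information from the bus monitoring module through the second bus includes: when the current access information indicates that the bus node is abnormal, acquiring the monitoring information from the bus monitoring module through the second bus; and when the current access information indicates that the bus node is normal, acquiring the monitoring information from the bus monitoring module through the second bus.
Optionally, monitoring the first bus on which the bus monitoring module is located and generating monitoring information includes: monitoring the access information of a bus node through the first bus, and saving the access information to the bus monitoring module to obtain the monitoring information of the bus node.
From the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to perform the methods described in the embodiments of the present disclosure.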
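Embodiment 2
This embodiment further provides a bus monitoring apparatus and system, which are used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 3 is a structural block diagram of the bus monitoring apparatus according to an embodiment of the present disclosure. The apparatus can be applied in chips and the like, especially multi-processor chips. As shown in FIG. 3, the apparatus includes:
a monitoring module 30, used to monitor the first bus on which the bus monitoring module is located and generate monitoring information;
an acquisition module 32, used to acquire the monitoring information from the bus monitoring module through the second bus.
Optionally, the apparatus further includes: a storage module, used to save the monitoring information.
Optionally, the acquisition module includes at least one of the following: a first acquisition unit, used to acquire the monitoring information from the bus monitoring module through the second bus by event triggering; a second acquisition unit, used to acquire the monitoring information from the bus monitoring module through the second bus by status query; and a third acquisition unit, used to acquire the monitoring information from the bus monitoring module through the second bus by continuous storage.
Optionally, the apparatus further includes: a reset module, used to perform a reset operation on the chip where the bus node is located after the storage module saves the monitoring information; and an analysis module, used to perform fault analysis according to the monitoring information.
Optionally, the apparatus further includes: a determination module, used to determine a faulty bus node after the analysis module performs fault analysis according to the monitoring information; a detection module, used to detect, when a current bus node initiates an access, whether the access address of the faulty bus node is legal; and a processing module, used to prohibit access to the faulty bus node when the access address of the faulty bus node is illegal.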
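FIG. 4 is a structural block diagram of the bus monitoring system according to an embodiment of the present disclosure. As shown in FIG. 4, the system includes:
a bus node 40;
a bus monitoring module 42, used to monitor the first bus on which the bus monitoring module is located and generate monitoring information;
an information storage module 44, used to acquire the monitoring information from the bus monitoring module through the second bus;
a first bus 48, used to connect the master device and the slave device of the bus node;
a second bus 50, used to connect the information storage module and the bus monitoring module, and to connect the information storage module and the storage module;
wherein the second bus is independent of the first bus.
Optionally, the system further includes: a storage module 46, used to save the monitoring information; optionally, the second bus is further used to connect the information storage module and the storage module.
Optionally, the second bus is further used for at least one of the following: backing up data of the chip where the bus node is located, expanding the traffic capacity of a path, and being extended for a designated purpose.
Optionally, the system further includes: a chip reset module, used to perform a reset operation on the chip where the bus node is located upon detecting an indication that saving of the monitoring information is complete; and a fault analysis module, used to perform fault analysis according to the monitoring information.
It should be noted that each of the above modules can be implemented in software or hardware. For the latter, this can be achieved in the following ways, without being limited to them: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.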
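Embodiment 3
This embodiment is an optional embodiment of the present disclosure and supplements and details the present application with specific examples and scenarios:
This embodiment provides a scheme for a multi-processor chip in which, on the basis of the bus monitoring module, a second bus and an information storage module are added to store bus node information, ensuring that when a bus abnormality occurs the bus monitoring information can be acquired and analyzed. The scheme of this embodiment ensures that the monitoring information of the bus nodes can be obtained under a bus abnormality, so that the fault can be located quickly and accurately.
The multi-processor chip bus monitoring apparatus described in the present disclosure includes the following modules: a bus monitoring module, an information storage module, a memory, a chip reset module and a fault analysis module. FIG. 5 is a block diagram of the bus monitoring apparatus of an embodiment of the present disclosure.
The information storage module of the apparatus reads the monitoring information from the bus monitoring module (which monitors the first bus) through the second bus (shown in FIG. 4) and stores the information in the memory; after storage is complete, it notifies the chip reset module to reset the chip; the fault analysis module obtains the monitoring information from the memory and performs abnormality analysis.
The first bus (bus 0 to bus N in FIG. 4 all belong to the first bus) is the path connecting master devices and slave devices. Master devices such as processors and peripherals access slave devices such as DDR and the bus monitoring unit through the first bus.
The second bus is used by the information storage module to access the bus monitoring module and the memory; the monitoring information of the bus monitoring module is stored in the memory through the second bus. The second bus is independent of the first bus, that is, when the first bus is abnormal, the second bus can still access the bus monitoring module;
The second bus can also be used for chip backup and for expanding traffic capacity.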
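The bus node is the entry and exit point of multiple buses (bus 0 to bus N); a bus monitoring module can monitor one or more bus nodes.
The bus monitoring module is responsible for capturing the access information on the buses (bus 0 to bus N) of the bus nodes. It includes a configuration part, a monitoring part and an event generation part. The configuration part receives configuration information; the monitoring part monitors the bus according to the configuration and writes the captured monitoring information to the register space of the bus monitoring module; the event generation part generates an event and sets an abnormal status when an abnormality occurs;
The information storage module is responsible for storing bus monitoring information and managing the memory space. It includes a configuration part, a read/write operation part and an event processing part. FIG. 6 is a structural diagram of the information storage module of an embodiment of the present disclosure.
The information storage module uses the second bus, which is independent of the first bus on which the bus monitoring module is located, ensuring that the information storage module can still operate when the first bus is abnormal;
The configuration part mainly configures the operation mode: query, event triggering, or continuous storage. Query mode means querying the status of the bus monitoring module; event-triggered mode means responding to an abnormal event from the bus monitoring module; and continuous storage mode means that the information storage module continuously stores the access information captured by the bus monitoring module.
The read/write operation part writes the monitoring information of the bus monitoring module to the memory; its operation mode is controlled by the configuration. The read/write operation part manages the memory storage space per bus node to keep the information of each bus node correct.
The event processing part receives abnormal events sent by the bus monitoring module and, after the storage operation is complete, generates a read/write completion event and sets a completion flag.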
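The three operation modes handled by the configuration part can be pictured as a small configuration record. The following Python sketch is only an illustration under stated assumptions: the names `OperationMode` and `StorageConfig`, the field layout and the validation rules are hypothetical and are not defined by the source text.

```python
from dataclasses import dataclass
from enum import Enum


class OperationMode(Enum):
    QUERY = "query"                            # poll the monitor's status
    EVENT_TRIGGER = "event_trigger"            # react to an abnormal event
    CONTINUOUS_STORAGE = "continuous_storage"  # store every captured access


@dataclass(frozen=True)
class StorageConfig:
    """Configuration of the information storage module (assumed fields)."""
    mode: OperationMode
    node_ids: tuple          # bus nodes whose information is stored
    base_address: int = 0    # start of the storage region
    size: int = 0            # size of the storage region in bytes

    def validate(self):
        if not self.node_ids:
            raise ValueError("at least one bus node must be selected")
        if self.mode is OperationMode.CONTINUOUS_STORAGE and self.size <= 0:
            raise ValueError("continuous storage needs a region size")
        return self
```

Requiring a region size only for continuous storage reflects the fact that only that mode is described as configuring a start address and size; the other modes may well need one too in a real design.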
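The information storage module can store information for one or more bus monitoring modules;
The memory may be ROM/RAM space inside the chip whose bus is monitored, or RAM/ROM space outside the chip, and is used to store the monitoring information.
The chip reset module performs the chip reset operation. It queries the execution status of the information storage module or receives the completion event sent by the information storage module, and resets the chip according to that status or event;
The fault analysis module is responsible for locating and analyzing problems. From the access information of the monitored bus nodes held in the memory, it infers the chip access model at the time of the bus abnormality and identifies the abnormal access.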
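The multi-processor chip bus monitoring method of this embodiment includes the following steps:
Step one: the bus monitoring module captures the access information of the first bus.
Step two: the information storage module acquires the monitoring information from the bus monitoring module through the second bus according to the configured operation mode, and stores the monitoring information in the memory.
The operation modes are event triggering, status query and continuous storage;
The memory may be on-chip or off-chip ROM, such as Flash, or on-chip or off-chip RAM, such as DDR; if it is RAM, the information must not be lost during the reset process.
After storage is complete, the information storage module generates a completion event and sets a completion flag to indicate that storage has finished.
After the chip reset module detects the storage-complete indication, it resets the chip;
Step three: the fault analysis module exports the bus monitoring information from the memory, analyzes the cause of the abnormality to obtain an analysis result, and targeted protection and repair are carried out according to that result.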
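FIG. 7 is a bus topology diagram of an embodiment of the present disclosure.
Each master (master device) is connected to a bus node; the bus nodes are interconnected and finally connected to the slave devices. For example, for bus node 4 the connected master is master7, the connected upstream nodes are bus node 0 and bus node 1, and the connected downstream nodes are bus node 5 and bus node 6.
The chip bus subsystem numbers the masters and the monitoring nodes uniformly. While the system is running, the bus nodes are monitored and the bus node information is stored. After a bus abnormality, the monitored bus node information is exported and analyzed, the access numbers are linked together, and an access model is built.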
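The numbered topology can be held in a simple table, which is the kind of structure an analysis tool would need in order to link access numbers into an access model. The sketch below encodes only what the text states about bus node 4 (master7, upstream nodes 0 and 1, downstream nodes 5 and 6); the `BusNode` class, the helper `downstream_of` and the empty master lists of the other nodes are assumptions, since FIG. 7 is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class BusNode:
    node_id: int
    masters: list = field(default_factory=list)     # directly connected masters
    upstream: list = field(default_factory=list)    # numbers of upstream nodes
    downstream: list = field(default_factory=list)  # numbers of downstream nodes


# Only node 4's connections come from the text; masters of the other
# nodes are left empty because the text does not state them.
TOPOLOGY = {
    0: BusNode(0, downstream=[4]),
    1: BusNode(1, downstream=[4]),
    4: BusNode(4, masters=["master7"], upstream=[0, 1], downstream=[5, 6]),
    5: BusNode(5, upstream=[4]),
    6: BusNode(6, upstream=[4]),
}


def downstream_of(topology, start):
    """List every node reachable downstream of `start`, in breadth-first order."""
    seen, queue = [], list(topology[start].downstream)
    while queue:
        node_id = queue.pop(0)
        if node_id not in seen:
            seen.append(node_id)
            queue.extend(topology[node_id].downstream)
    return seen
```

For instance, an access captured at bus node 0 could subsequently appear at any node returned by `downstream_of(TOPOLOGY, 0)`.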
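This embodiment further includes the following examples:
Implementation 1:
This implementation locates a bus abnormality by acquiring information in event mode.
FIG. 8 is a flowchart of locating a bus abnormality in event mode according to an embodiment of the present disclosure; the processing steps of the flow are as follows:
Step 1: Initialize the bus monitoring module and configure the monitoring information of each bus node
Configure the numbers of the masters and bus nodes, numbering the masters and the nodes uniformly and separately.
Configure the monitored address range as the address space of the whole chip
Configure the access attribute as read and write
Configure the monitoring mode as continuous monitoring
Enable bus monitoring.
Step 2: Initialize the information storage module
The operation mode is configured as event triggering
The node range is all bus nodes
Information is stored in on-chip ROM space
Step 3: Capture access information. After steps 1 and 2 are complete, the bus monitoring module begins to capture information, and each bus node saves the information to its capture register.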
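Step 4: Store the monitoring information
When the bus is abnormal, the bus monitoring module generates an event and sends it to the information storage module;
After receiving the event, the information storage module reads the monitoring information from the capture registers of each bus node
The monitoring information of each bus node is stored in the designated space; after storage is complete, the information storage module sets the completion flag and generates a completion event;
Step 5: Reset the chip
After the chip reset module finds that the completion flag is set, or receives the completion event, it resets the chip;
After the chip is reset, the processor reads the monitoring information from the ROM and saves it as a file.
Step 6: The fault analysis module exports and analyzes the monitoring information to locate the cause of the bus abnormality
Traverse all masters and read the number of the bus node directly connected to the current master;
Read the access information of the corresponding bus node according to the bus node number;
Analyze the access information and, using the master number, detect whether the current master initiated an access to the bus node;
If an access was initiated, determine whether the accessed address is within the legal range;
If the accessed address is illegal, record at least one of the following: master number, access address, access attribute;
Use the recorded information to identify the master that accessed an illegal address.
Step 7: Apply address protection to the master that accessed an illegal address.
Before a master initiates an access, check whether the access address is legal; an access to an illegal address is prohibited.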
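Steps 3 to 5 of the event-triggered flow can be mimicked in software to make the hand-offs explicit. This is a toy model, not the hardware design: the classes `BusMonitor` and `InfoStore`, the function `reset_if_done` and the use of a Python callback as the "event" are all assumptions. The direct read of `monitor.capture` stands in for the second bus, which in the text is a path independent of the monitored first bus.

```python
class BusMonitor:
    """Holds one capture register per bus node and raises abnormal events."""

    def __init__(self, node_ids):
        self.capture = {n: None for n in node_ids}  # capture registers
        self.listeners = []                          # event subscribers

    def record(self, node_id, access):
        self.capture[node_id] = access  # keep the latest access per node

    def raise_abnormal(self):
        for callback in self.listeners:
            callback()


class InfoStore:
    """Copies capture registers into memory when an abnormal event arrives."""

    def __init__(self, monitor, memory):
        self.monitor, self.memory, self.done = monitor, memory, False
        monitor.listeners.append(self.on_abnormal)

    def on_abnormal(self):
        # Read every node's capture register (stand-in for the second bus).
        for node_id, access in self.monitor.capture.items():
            self.memory[node_id] = access
        self.done = True  # completion flag


def reset_if_done(store, reset):
    """Chip reset module: reset only after the completion flag is set."""
    if store.done:
        reset()
        return True
    return False
```

The ordering matters: the reset is gated on the completion flag so that the chip is never reset while monitoring information is still being written.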
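Implementation 2:
This implementation locates a bus abnormality by acquiring information in query mode.
FIG. 9 is a flowchart of locating a bus abnormality in query mode according to an embodiment of the present disclosure; the processing steps of the flow are as follows:
Step 1: Same as in Implementation 1.
Step 2: Initialize the information storage module
1) The operation mode is configured as query mode
2) The node range is all bus nodes
3) Information is stored in off-chip ROM space
Step 3: Same as in Implementation 1.
Step 4: Store the monitoring information
The information storage module periodically queries the status register of the bus monitoring module; if the bus-abnormal status flag is found, the bus is abnormal;
The information storage module reads the monitoring information from the capture registers of each bus node;
The monitoring information of each bus node is stored in the designated space; after storage is complete, the information storage module sets the completion flag and generates a completion event;
Steps 5, 6 and 7 are the same as in Implementation 1.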
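Query mode replaces the event with periodic polling of the status register. A minimal sketch follows, assuming a hypothetical status bit `ABNORMAL_FLAG` and caller-supplied register accessors `read_status` and `read_capture`; the real register layout and polling period are not given in the text, and the sketch polls back to back instead of waiting between queries.

```python
ABNORMAL_FLAG = 0x1  # hypothetical bit position of the bus-abnormal status flag


def poll_and_store(read_status, read_capture, node_ids, memory, max_polls):
    """Poll the status register; on abnormality copy all capture registers.

    Returns True once the monitoring information has been stored, or False
    if no abnormality was seen within max_polls queries.
    """
    for _ in range(max_polls):
        if read_status() & ABNORMAL_FLAG:
            for node_id in node_ids:
                memory[node_id] = read_capture(node_id)
            return True  # corresponds to setting the completion flag
    return False
```

Compared with event mode, polling needs no event line between the two modules, at the cost of a detection delay of up to one polling period.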
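Implementation 3:
Because ROM access in a multi-processor chip is inefficient, this implementation locates a bus abnormality using information stored in RAM. Compared with the implementations that use ROM, RAM is generally faster.
FIG. 10 is a flowchart of locating a bus abnormality using information stored in RAM according to an embodiment of the present disclosure; the processing steps of the flow are as follows:
Step 1: Same as in Implementation 1.
Step 2: Initialize the information storage module
The operation mode is configured as event triggering
The node range is all bus nodes
Information is stored in off-chip RAM space
Step 3: Same as in Implementation 1.
Step 4: Store the monitoring information
When the bus is abnormal, the bus monitoring module generates an event and sends it to the information storage module;
After receiving the event, the information storage module reads the monitoring information from the capture registers of each bus node
Relying on the property that RAM contents are not lost across a reset that does not remove power, the monitoring information of each bus node is stored in the RAM; after storage is complete, the information storage module sets the completion flag and generates a completion event.
Step 5: Reset the chip
After the chip reset module finds that the completion flag is set, or receives the completion event, it resets the chip;
If the RAM is on-chip space, the chip reset must be of a type that does not remove power. If the RAM is off-chip space, the chip in which that memory is located must be reset without losing power.
Steps 6 and 7 are the same as in Implementation 1.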
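Implementation 4:
In this implementation, the information storage module stores bus monitoring information continuously after startup and stops when it receives a bus abnormal event.
This implementation differs from the previous ones in that it can obtain more bus monitoring information; Implementations 1, 2 and 3 obtain only the last information at the time of the bus abnormality. FIG. 11 is a flowchart of locating a bus abnormality in continuous storage mode according to an embodiment of the present disclosure. This implementation can locate bus abnormalities in some more complex scenarios.
The processing steps of the flow are as follows:
Step 1: Initialize the bus monitoring module and configure the monitoring information of each bus node
Configure the numbers of the masters and bus nodes, numbering the masters and the nodes uniformly and separately.
Configure the monitored address range as the address space of the whole chip
Configure the access attribute as read and write
Configure the monitoring mode as continuous monitoring
Enable bus monitoring.
Step 2: Initialize the information storage module
The operation mode is configured as continuous storage; the number of records stored continuously is configurable
The node range is all bus nodes
Information is stored in on-chip RAM space; set the start address and size of the space
Step 3: Same as in Implementation 1.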
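Step 4: Store the monitoring information
When the information storage module detects that the bus monitoring module has new information written, it writes that monitoring information into the corresponding RAM space;
Records are saved with continuous overwriting within the region defined by the start address and size of the RAM space
When a bus abnormality is triggered, the bus monitoring module generates an event number and notifies the information storage module;
After the information storage module saves the last piece of information, it stops storing;
After storage is complete, the information storage module sets the completion flag and generates a completion event.
Step 5: Reset the chip
After the chip reset module checks the status or receives the event, it performs a chip reset that does not remove power;
Step 6: The fault analysis module exports and analyzes the monitoring information to locate the cause of the bus abnormality
Read the monitoring information in sequence up to the last record.
Traverse all masters and read the number of the bus node directly connected to the current master;
Read the historical access information of the corresponding bus node according to the bus node number;
Analyze the historical access information and, using the master number, detect whether the current master initiated an access;
If an access was initiated, determine whether the accessed address is within the legal range;
If the accessed address is illegal, record at least one of the following: master number, access address and access attribute;
Use the recorded information to identify the master that accessed an illegal address.
Step 7: Apply address protection to the master that accessed an illegal address.
Before a master initiates an access, check whether the access address is legal; an access to an illegal address is prohibited.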
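The continuous overwriting described in step 4 of this implementation behaves like a ring buffer over a fixed RAM region. The sketch below is one plausible reading, not the patented design: the class `RingStore`, the fixed `record_size` and the dictionary standing in for RAM are assumptions made for illustration.

```python
class RingStore:
    """Continuously stores records in a fixed region, overwriting the oldest."""

    def __init__(self, base_address, size, record_size):
        self.base = base_address
        self.record_size = record_size
        self.slots = size // record_size   # how many records fit in the region
        if self.slots <= 0:
            raise ValueError("region too small for one record")
        self.ram = {}                      # address -> record (stand-in for RAM)
        self.count = 0                     # total records written so far
        self.stopped = False

    def _address(self, index):
        return self.base + (index % self.slots) * self.record_size

    def write(self, record):
        """Store one record; returns its address, or None once stopped."""
        if self.stopped:
            return None
        address = self._address(self.count)
        self.ram[address] = record
        self.count += 1
        return address

    def on_abnormal_event(self):
        self.stopped = True  # stop storing after the last record is saved

    def records_in_order(self):
        """Return the surviving records from oldest to newest."""
        kept = min(self.count, self.slots)
        return [self.ram[self._address(i)]
                for i in range(self.count - kept, self.count)]
```

Reading the records oldest to newest matches step 6, where the monitoring information is read in sequence up to the last record; a larger region keeps a longer history before the abnormality.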
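With the scheme of this embodiment, an information storage module, a second bus and a memory are added on the basis of the bus monitoring module, ensuring that the bus monitoring information can be obtained when the bus is abnormal.
Using the acquired bus monitoring information, the fault analysis module quickly locates the cause of the bus abnormality; protection or error correction based on the analysis result prevents the bus abnormality from being triggered again and improves system stability.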
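Embodiment 4
An embodiment of the present disclosure further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: monitoring the first bus on which the bus monitoring module is located, and generating monitoring information;
S2: acquiring the monitoring information from the bus monitoring module through the second bus.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
Optionally, in this embodiment, the processor, according to the program code stored in the storage medium, monitors the first bus on which the bus monitoring module is located and generates monitoring information;
Optionally, in this embodiment, the processor, according to the program code stored in the storage medium, acquires the monitoring information from the bus monitoring module through the second bus.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.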
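Obviously, those skilled in the art should understand that the modules or steps of the present disclosure described above can be implemented by general-purpose computing devices; they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described may be performed in an order different from the one given here, or the modules or steps may be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present disclosure and are not intended to limit it. For those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present disclosure shall fall within its scope of protection.
Industrial Applicability
The present disclosure is applicable to the field of processors. It ensures that bus monitoring information can be acquired even when the bus is abnormal, solves the technical problem in the related art that bus monitoring information cannot be acquired when the bus is abnormal, and improves the monitoring efficiency of the bus node.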

Claims (22)

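  1. A bus monitoring system, comprising:
    a bus node;
    a bus monitoring module, configured to monitor a first bus on which the bus monitoring module is located and generate monitoring information;
    an information storage module, configured to acquire the monitoring information from the bus monitoring module through a second bus;
    the first bus, configured to connect a master device and a slave device of the bus node;
    the second bus, configured to connect the information storage module and the bus monitoring module;
    wherein the second bus is independent of the first bus.
  2. The system according to claim 1, wherein the second bus is further configured to perform at least one of the following: backing up data of the chip where the bus node is located, expanding the traffic capacity of a path, and being extended for a designated purpose.
  3. The system according to claim 1, wherein the system further comprises:
    a chip reset module, configured to perform a reset operation on the chip where the bus node is located upon detecting an indication that saving of the monitoring information is complete;
    a fault analysis module, configured to perform fault analysis according to the monitoring information.
  4. The system according to claim 1, wherein the system further comprises:
    a storage module, configured to save the monitoring information.
  5. The system according to claim 4, wherein the second bus is further configured to connect the information storage module and the storage module.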
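  6. A bus monitoring method, comprising:
    monitoring a first bus on which a bus monitoring module is located, and generating monitoring information;
    acquiring the monitoring information from the bus monitoring module through a second bus;
    wherein the second bus is independent of the first bus.
  7. The method according to claim 6, wherein acquiring the monitoring information from the bus monitoring module through the second bus comprises one of the following:
    acquiring the monitoring information from the bus monitoring module through the second bus by event triggering;
    acquiring the monitoring information from the bus monitoring module through the second bus by status query;
    acquiring the monitoring information from the bus monitoring module through the second bus by continuous storage.
  8. The method according to claim 6, wherein after the monitoring information is acquired from the bus monitoring module through the second bus, the method further comprises:
    saving the monitoring information.
  9. The method according to claim 6, wherein after the monitoring information is saved, the method further comprises:
    performing a reset operation on the chip where a bus node is located;
    after the reset operation is complete, performing fault analysis according to the monitoring information.
  10. The method according to claim 9, wherein performing a reset operation on the chip where the bus node is located comprises:
    upon detecting an indication that saving of the monitoring information is complete, performing the reset operation on the chip where the bus node is located.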
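  11. The method according to claim 9, wherein performing fault analysis according to the monitoring information comprises:
    inferring an access model of the chip from the monitoring information;
    locating the fault type according to the access model.
  12. The method according to claim 9, wherein performing fault analysis according to the monitoring information comprises:
    traversing all master devices and reading the number of the bus node directly connected to the current master device;
    reading the historical access information of the corresponding bus node according to the bus node number;
    analyzing the historical access information to detect whether the current master device initiated an access;
    when an access was initiated, determining whether the accessed address is within the legal range;
    when the accessed address is not within the legal range, recording at least one of the following items of node information: master device number, access address, access attribute;
    locating the illegally accessing master device according to the node information.
  13. The method according to claim 9, wherein after fault analysis is performed according to the monitoring information, the method further comprises:
    determining a faulty bus node;
    when a current bus node initiates an access, detecting whether the access address of the faulty bus node is legal;
    when the access address of the faulty bus node is illegal, prohibiting access to the faulty bus node.
  14. The method according to claim 6, wherein monitoring the first bus on which the bus monitoring module is located and generating monitoring information comprises:
    monitoring the access information of a bus node through the first bus, and saving the access information to the bus monitoring module to obtain the monitoring information of the bus node.
  15. The method according to claim 6, wherein acquiring the monitoring information from the bus monitoring module through the second bus comprises:
    when the current access information indicates that the bus node is abnormal, acquiring the monitoring information from the bus monitoring module through the second bus;
    when the current access information indicates that the bus node is normal, acquiring the monitoring information from the bus monitoring module through the second bus.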
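  16. A bus monitoring apparatus, comprising:
    a monitoring module, configured to monitor a first bus on which a bus monitoring module is located and generate monitoring information;
    an acquisition module, configured to acquire the monitoring information from the bus monitoring module through a second bus;
    wherein the second bus is independent of the first bus.
  17. The apparatus according to claim 16, wherein the acquisition module comprises one of the following:
    a first acquisition unit, configured to acquire the monitoring information from the bus monitoring module through the second bus by event triggering;
    a second acquisition unit, configured to acquire the monitoring information from the bus monitoring module through the second bus by status query;
    a third acquisition unit, configured to acquire the monitoring information from the bus monitoring module through the second bus by continuous storage.
  18. The apparatus according to claim 16, wherein the apparatus further comprises: a storage module, configured to save the monitoring information.
  19. The apparatus according to claim 18, wherein the apparatus further comprises:
    a reset module, configured to perform a reset operation on the chip where a bus node is located after the storage module saves the monitoring information;
    an analysis module, configured to perform fault analysis according to the monitoring information.
  20. The apparatus according to claim 19, wherein the apparatus further comprises:
    a determination module, configured to determine a faulty bus node after the analysis module performs fault analysis according to the monitoring information;
    a detection module, configured to detect, when a current bus node initiates an access, whether the access address of the faulty bus node is legal;
    a processing module, configured to prohibit access to the faulty bus node when the access address of the faulty bus node is illegal.
  21. A storage medium, wherein the storage medium comprises a stored program, and the program, when run, performs the method according to any one of claims 6 to 15.
  22. A processor, wherein the processor is used to run a program, and the program, when run, performs the method according to any one of claims 6 to 15.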
PCT/CN2018/095429 2017-09-12 2018-07-12 Bus monitoring system, method and apparatus WO2019052275A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US16/627,720 US11093361B2 (en) 2017-09-12 2018-07-12 Bus monitoring system, method and apparatus
EP18856083.3A EP3683682B1 (en) 2017-09-12 2018-07-12 Bus monitoring system, method and apparatus
JP2019572374A JP2020525944A (ja) 2017-09-12 2018-07-12 バス監視システム、方法および装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710819085.XA CN109491856B (zh) 2017-09-12 2017-09-12 Bus monitoring system, method and apparatus
CN201710819085.X 2017-09-12

Publications (1)

Publication Number Publication Date
WO2019052275A1 true WO2019052275A1 (zh) 2019-03-21

Family

ID=65688015

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/095429 WO2019052275A1 (zh) 2017-09-12 2018-07-12 Bus monitoring system, method and apparatus

Country Status (5)

Country Link
US (1) US11093361B2 (zh)
EP (1) EP3683682B1 (zh)
JP (1) JP2020525944A (zh)
CN (1) CN109491856B (zh)
WO (1) WO2019052275A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
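CN113268367B (zh) * 2021-04-26 2023-07-14 北京控制工程研究所 Method for on-orbit monitoring and maintenance of a 1553B bus RT terminal address lookup table
CN113778734A (zh) * 2021-09-02 2021-12-10 上海砹芯科技有限公司 Chip, chip bus detection system, detection method and storage medium
CN114531371A (zh) * 2022-02-23 2022-05-24 杭州中天微系统有限公司 Bus monitoring network, system on chip and bus management method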

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090276666A1 (en) * 2008-04-30 2009-11-05 Egenera, Inc. System, method, and adapter for creating fault-tolerant communication busses from standard components
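CN103840956A (zh) * 2012-11-23 2014-06-04 于智为 Backup method for an Internet of Things gateway device
CN104914849A (zh) * 2015-05-12 2015-09-16 安徽江淮汽车股份有限公司 Fault recording apparatus and method
CN104951385A (zh) * 2015-07-09 2015-09-30 首都师范大学 Channel health status recording device in a dynamically reconfigurable bus monitoring system
CN105372151A (zh) * 2015-12-02 2016-03-02 中国电子科技集团公司第四十一研究所 Apparatus and method for online test monitoring and fault location
CN106919462A (zh) * 2015-12-25 2017-07-04 华为技术有限公司 Method and apparatus for generating a processor fault record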

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02284249A 1989-04-26 1990-11-21 Toshiba Corp Bus tracer
JPH11203174A 1998-01-19 1999-07-30 Nec Corp State monitoring information processing device
US6425009B1 (en) * 1999-06-08 2002-07-23 Cisco Technology, Inc. Monitoring redundant control buses to provide a high availability local area network for a telecommunications device
JP2003177975A 2001-12-12 2003-06-27 Yokogawa Electric Corp Stealth module for a bus data analyzer
JP4412031B2 * 2004-03-31 2010-02-10 日本電気株式会社 Network monitoring system, method and program
US7710892B2 (en) * 2006-09-08 2010-05-04 Dominic Coupal Smart match search method for captured data frames
CN101334760B (zh) * 2007-06-26 2010-04-07 展讯通信(上海)有限公司 Method and apparatus for monitoring illegal bus operations, and system including the apparatus
US7899323B2 (en) * 2007-09-28 2011-03-01 Verizon Patent And Licensing Inc. Multi-interface protocol analysis system
EP2313819A2 (en) * 2008-07-14 2011-04-27 The Regents of the University of California Architecture to enable energy savings in networked computers
JP5287402B2 2009-03-19 2013-09-11 富士通株式会社 Network monitoring and control device
CN101667152A (zh) * 2009-09-23 2010-03-10 华为技术有限公司 Computer system and bus monitoring method for a computer system
JP2011076295A 2009-09-30 2011-04-14 Hitachi Ltd Embedded controller
CN101989242B (zh) * 2010-11-12 2013-06-12 深圳国微技术有限公司 Bus monitor for improving SoC system security and implementation method thereof
CN103810074B (zh) * 2012-11-14 2017-12-29 华为技术有限公司 System-on-chip and corresponding monitoring method
CN104714909B (zh) * 2013-12-13 2019-01-25 锐迪科(重庆)微电子科技有限公司 Apparatus and method for handling bus hang, bus structure and system
CN105589821B (zh) * 2014-10-20 2019-03-12 深圳市中兴微电子技术有限公司 Apparatus and method for preventing bus deadlock
CN106844133A (zh) * 2015-12-04 2017-06-13 深圳市中兴微电子技术有限公司 Monitoring method and apparatus for a system on chip (SoC)
JP2017107441A * 2015-12-10 2017-06-15 キヤノン株式会社 Information processing device, and control device and control method therefor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3683682A4 *

Also Published As

Publication number Publication date
CN109491856B (zh) 2022-08-02
JP2020525944A (ja) 2020-08-27
EP3683682A1 (en) 2020-07-22
EP3683682B1 (en) 2023-11-29
US20200142798A1 (en) 2020-05-07
CN109491856A (zh) 2019-03-19
EP3683682A4 (en) 2021-06-16
US11093361B2 (en) 2021-08-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18856083

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019572374

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018856083

Country of ref document: EP

Effective date: 20200414