WO2023024248A1 - Bus exception handling method and apparatus, electronic device, and readable storage medium - Google Patents

Publication number
WO2023024248A1
WO2023024248A1 PCT/CN2021/127330 CN2021127330W WO2023024248A1 WO 2023024248 A1 WO2023024248 A1 WO 2023024248A1 CN 2021127330 W CN2021127330 W CN 2021127330W WO 2023024248 A1 WO2023024248 A1 WO 2023024248A1
Authority
WO
WIPO (PCT)
Prior art keywords
bus
data
abnormal
target
variable
Prior art date
Application number
PCT/CN2021/127330
Other languages
English (en)
French (fr)
Inventor
江博
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Priority to US18/271,658 (US11995014B2)
Publication of WO2023024248A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/36 Handling requests for interconnection or transfer for access to common bus or bus system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/221 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test buses, lines or interfaces, e.g. stuck-at or open line faults
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2002 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0022 Multibus

Definitions

  • the present application relates to the technical field of controllers, and in particular to a method for handling bus anomalies, a device for handling bus anomalies, electronic equipment, and a computer-readable storage medium.
  • dual-controller and four-controller architecture configuration schemes have been formed, such as dual controllers in one box and four controllers in one box.
  • Multiple controllers jointly process services, greatly improving service performance.
  • another controller can take over the services, thus greatly improving the data security and disaster recovery performance of the equipment.
  • one box with four controllers means that there are four controller boards in one box.
  • steps such as master controller election and data synchronization among the controllers.
  • the communication between the controllers is often abnormal. If the communication between the various controllers is not smooth, multiple master controllers may be elected, resulting in a "split brain" of the upper-layer software, or may cause data inconsistency.
  • the purpose of this application is to provide a method for handling bus anomalies, a bus anomaly handling device, electronic equipment, and a computer-readable storage medium, so as to avoid "split-brain" of upper-layer software and ensure data consistency.
  • the application provides a bus abnormality handling method, including:
  • the target bus includes a main bus and several candidate buses, and the target data corresponding to the main bus is the first data;
  • the bus abnormal condition is a data bus flag abnormal condition or a data content abnormal condition
  • judging whether the first data satisfies the bus exception condition includes:
  • the target data corresponding to the candidate bus is the second data, and the target second data is at least one of the second data;
  • the bus variable is a link abnormal variable
  • judging whether the bus variable is in an abnormal state includes:
  • the bus variable is a check abnormal variable, and judging whether the bus variable is in an abnormal state includes:
  • the check abnormal variable is updated
  • if the check abnormal variable is greater than the first threshold, it is determined that the check abnormal variable is in an abnormal state.
  • the bus variable is a type abnormal variable
  • judging whether the bus variable is in an abnormal state includes:
  • if the type abnormal variable is greater than the second threshold, it is determined that the type abnormal variable is in an abnormal state.
  • judging whether the first data satisfies the bus abnormal condition includes:
  • the present application also provides a bus exception handling device, including:
  • an acquisition module configured to acquire several target data from several target buses respectively; wherein the target bus includes a main bus and several candidate buses, and the target data corresponding to the main bus is the first data;
  • the judging module is used to judge whether the first data satisfies the bus abnormal condition;
  • the bus abnormal condition is a data bus flag abnormal condition or a data content abnormal condition;
  • the bus update module is configured to select a healthy target candidate bus as the new main bus if the bus abnormal condition is met, and update the local bus data.
  • the present application also provides an electronic device, including a memory and one or more processors, where computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the one or more processors, the one or more processors execute the steps of the bus exception handling method in the foregoing embodiments.
  • the present application also provides one or more non-volatile computer-readable storage media storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the bus exception handling method in the above-mentioned embodiments.
  • the bus exception handling method obtains several target data from several target buses respectively, wherein the target bus includes a main bus and several candidate buses, and the target data corresponding to the main bus is the first data; judges whether the first data meets the bus abnormal condition, the bus abnormal condition being a data bus flag abnormal condition or a data content abnormal condition; and, if the bus abnormal condition is met, selects a healthy target candidate bus as the new main bus and updates the local bus data.
  • the target data obtained on the main bus is called the first data
  • the first data can be used to judge whether the main bus is abnormal, that is, whether the first data meets the bus abnormal condition, and thereby whether communication between the controllers is abnormal.
  • the bus abnormal condition may be a data bus flag abnormal condition, that is, a check of whether the bus flag in the first data is abnormal; or it may be a data content abnormal condition, that is, a check of whether the data content of the first data is abnormal.
  • the first data obtained from the main bus is not the data originally sent by the controller, which indicates that the main bus is abnormal, and an abnormal main bus prevents smooth communication between the controllers.
  • the healthy target candidate bus is selected as the new main bus to complete the replacement of the main bus, and the local bus data is updated to indicate that the old main bus is abnormal.
  • a healthy (i.e. normal) target candidate bus is selected as the new main bus, so a bus in the normal state can always be used for communication, which ensures smooth communication between the controllers, avoids "split brain" of the upper-layer software, and ensures data consistency.
  • the present application also provides a bus abnormality handling device, electronic equipment, and a computer-readable storage medium, which also have the above beneficial effects.
  • FIG. 1 is a flow chart of a bus exception handling method provided by the present application according to one or more embodiments;
  • FIG. 2 is a connection structure diagram of a controller provided by the present application according to one or more embodiments;
  • FIG. 3 is a sequence diagram of a master node election process provided by the present application according to one or more embodiments;
  • FIG. 4 is a flow chart of node communication steps provided by the present application according to one or more embodiments;
  • FIG. 5 is a flow chart of a specific bus exception handling method provided by the present application according to one or more embodiments;
  • FIG. 6 is a schematic structural diagram of a bus exception handling device provided by the present application according to one or more embodiments;
  • FIG. 7 is a schematic structural diagram of an electronic device provided by the present application according to one or more embodiments.
  • FIG. 1 is a flow chart of a bus exception handling method provided by an embodiment of the present application. The method includes:
  • the target bus includes a main bus and several candidate buses, and the target data corresponding to the main bus is the first data.
  • the target data can be, for example, a synchronous data broadcast command, whose data content is the data that needs to be synchronized; a request synchronization command, whose data content is the unique identity information of the specified node; or master-grabbing information used to compete for the identity of the master node.
  • after the target data is generated by the controller, it is sent to all target buses.
  • if the target buses are all in a normal state, other controllers can obtain target data from each target bus, discard all target data except the first data, and execute the corresponding instructions according to the first data.
  • the types of the candidate bus and the main bus may be the same or different.
  • FIG. 2 is a connection structure diagram of a controller provided by an embodiment of the present application.
  • the CPLD (Complex Programmable Logic Device)
  • controller 1, controller 2, controller 3 and controller 4 are interconnected to bus1 (that is, bus 1) and bus2 (that is, bus 2) through the backplane.
  • the switch can be a switch chip or a MOS transistor (MOSFET, Metal-Oxide-Semiconductor Field-Effect Transistor); this switch is also controlled by the CPLD.
  • the CPLD turns on the switch chip (or MOS transistor) to prevent the current surge of the hot-swappable controller from interfering with other controllers communicating on the bus.
  • the controller board adopts an open-drain output mode.
  • Bus 1 and bus 2 can be based on IIC (Inter-Integrated Circuit) as a prototype; each bus has two lines, one of which is a clock line and the other a data line (that is, SCL/SDA and SCL'/SDA').
  • FIG. 3 is a sequence diagram of a master node election process provided by an embodiment of the present application.
  • when the CPLD detects a level change on the clock line or data line of the bus, it considers the bus to be in a non-idle state. If the non-idle state lasts for more than 200ms, the CPLD considers that there is no "master" on the current bus, that is, there is no master controller among the controllers. In this case, each CPLD starts a first timer (node code * 10ms) according to its own node code.
  • a "grab master" command is sent on each bus. A second timer (node code * 1ms) is then started; if the bus is not found to be in a non-idle state before the second timer expires, the master grab is successful, and a "master grab success" command is sent at the same time.
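  • As a rough illustration of the two timers described above, the following Python sketch computes when each node would send its "grab master" command and which node would win. It assumes, beyond what the text states, that node codes are distinct positive integers, that all CPLDs start their first timer at the same moment, and that a node abandons its attempt once it sees another node's command on the bus. The names `GrabAttempt`, `plan_grab` and `elect_master` are hypothetical.

```python
from dataclasses import dataclass

FIRST_TIMER_MS_PER_CODE = 10   # first timer = node code * 10 ms (from the text)
SECOND_TIMER_MS_PER_CODE = 1   # second timer = node code * 1 ms (from the text)


@dataclass(frozen=True)
class GrabAttempt:
    node_code: int
    grab_sent_at_ms: int   # first timer expires: "grab master" is sent
    success_at_ms: int     # second timer expires: "master grab success" is sent


def plan_grab(node_code: int) -> GrabAttempt:
    """Timeline of one node, measured from the moment the timers start."""
    grab_at = node_code * FIRST_TIMER_MS_PER_CODE
    return GrabAttempt(node_code, grab_at,
                       grab_at + node_code * SECOND_TIMER_MS_PER_CODE)


def elect_master(node_codes):
    """Return the winning attempt.

    Assumption (not stated in the text): a node whose first timer has not
    yet expired backs off when it sees the earlier node's command.
    """
    if not node_codes:
        raise ValueError("at least one node is required")
    return min((plan_grab(code) for code in node_codes),
               key=lambda attempt: attempt.grab_sent_at_ms)
```

  • Under these assumptions the node with the smallest node code always wins, because its first timer is the shortest.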
  • FIG. 4 is a flow chart of node communication steps provided by the embodiment of the present application, including the following steps:
  • Step 0: The master node (that is, the master controller) issues the "upload broadcast command". All nodes simultaneously update the synchronous data of the 4 nodes (including themselves) collected in the "accept buffer" to their own "BMC read buffer" to ensure the data consistency of all nodes. At the same time, the "accept buffer" is cleared.
  • Step 1: The master node issues a "synchronous data broadcast command" to broadcast the key information "abcdef" of its own controller board (simple example), and at the same time stores the data in its own "accept buffer". After the other three nodes hear the "synchronous data broadcast command", they store the "abcdef" sent by the master node in their own "accept buffer".
  • Step 2: The master node issues a "request synchronization command" for node2, and then enters the passive listening state.
  • Step 3: After node2 receives the "request synchronization command", it sends out the "synchronous data broadcast command" and broadcasts the key information "abcdef" of its own controller board (simple example). At the same time, the data is stored in its own "accept buffer". After the other three nodes hear the "synchronous data broadcast command", they store the "abcdef" sent by node2 in their own "accept buffer".
  • Steps 4, 5, 6, and 7 are similar to steps 2 and 3, with the master node going through node3 and node4 in turn. According to the above process, within one read-write cycle, the ACK (Acknowledge character) reply from the slave node (i.e. non-master controller) to the master node (i.e. master controller) is eliminated, and so is the switching of the driving right of the SDA (serial data line), which improves reliability and reduces the probability of the bus hanging.
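  • The polling cycle in steps 0 to 7 can be sketched in Python as below. This is only a simplified model: the bus is replaced by direct function calls, and the class `Node` and the functions `upload_broadcast`, `broadcast_sync` and `run_cycle` are hypothetical names introduced for illustration.

```python
class Node:
    def __init__(self, node_id, key_info):
        self.node_id = node_id
        self.key_info = key_info
        self.accept_buffer = {}     # data collected during the current cycle
        self.bmc_read_buffer = {}   # data published by the last upload


def upload_broadcast(nodes):
    """Step 0: every node publishes its accept buffer, then clears it."""
    for node in nodes:
        node.bmc_read_buffer = dict(node.accept_buffer)
        node.accept_buffer.clear()


def broadcast_sync(sender, nodes):
    """Synchronous data broadcast: every node, sender included, stores it."""
    for node in nodes:
        node.accept_buffer[sender.node_id] = sender.key_info


def run_cycle(master, nodes):
    upload_broadcast(nodes)          # step 0
    broadcast_sync(master, nodes)    # step 1
    for node in nodes:               # steps 2 to 7
        if node is not master:
            # The master's "request synchronization command" names this node,
            # which answers with its own broadcast. No ACK is sent back.
            broadcast_sync(node, nodes)
```

  • Because every node hears every broadcast, all nodes hold the same "BMC read buffer" content after the next step 0.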
  • if the bus abnormal condition is met, step S103 may be executed; otherwise, step S104 may be executed.
  • the bus abnormal condition is an abnormal condition of a data bus flag or an abnormal condition of a data content.
  • the abnormal condition of the data bus flag refers to the condition indicating that the bus flag in the first data is abnormal.
  • the abnormal data content condition refers to a condition indicating that the data content of the first data is abnormal.
  • the process of judging whether the first data satisfies the bus abnormal condition may include:
  • Step 11: Judge whether the first data is consistent with the target second data.
  • Step 12: If not consistent, judge whether the bus variable is in an abnormal state.
  • Step 13: If it is in an abnormal state, determine that the data content abnormal condition is met.
  • the target data corresponding to the candidate bus is the second data
  • the target second data is at least one of the second data. That is, after the first data and the second data are obtained, several target second data are determined from the second data, and each target second data is compared with the first data to judge whether they are consistent. Since the probability of two or more buses being abnormal at the same time is almost zero, it can be considered impossible for the first data and the target second data to both be abnormal. Therefore, if the first data is inconsistent with at least one target second data, the first data may be abnormal. In this case, in order to accurately determine whether the main bus is abnormal, it can be determined whether the bus variable is in an abnormal state, and if it is, it can be determined that the data content abnormal condition is met.
  • the bus variable refers to a variable describing the working condition of the main bus, and its specific type and quantity are not limited. For example, it can be a link abnormal variable, a check abnormal variable, a type abnormal variable, etc. It can be understood that different types of bus variables can represent different types of abnormalities on the main bus, and the more bus variables there are, the more aspects of the main bus can be checked. This embodiment does not limit the specific detection method for judging whether the bus variable is in an abnormal state.
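  • Steps 11 to 13 can be sketched in Python as follows. The function name `content_abnormal` is hypothetical, and treating the condition as met when any one of several bus variables is abnormal is an assumption, since the text does not say how multiple bus variables are combined.

```python
from typing import Callable, Sequence


def content_abnormal(first_data: bytes,
                     target_second_data: Sequence[bytes],
                     bus_variable_checks: Sequence[Callable[[], bool]]) -> bool:
    """Return True if the data content abnormal condition is met."""
    # Step 11: compare the first data with each target second data.
    consistent = all(first_data == second for second in target_second_data)
    if consistent:
        return False
    # Step 12: on a mismatch, consult the bus variables.
    # Step 13: an abnormal bus variable means the condition is met
    # (assumption: one abnormal variable is enough).
    return any(check() for check in bus_variable_checks)
```

  • A mismatch alone is not enough to switch buses; the bus variable acts as a second opinion on the main bus.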
  • the bus variable is a link abnormal variable, and the process of judging whether the bus variable is in an abnormal state includes:
  • Step 21: Monitor the main bus for a preset duration and obtain the monitoring result.
  • Step 22: If the monitoring result is all zeros or all ones, set the link abnormal variable to an abnormal state, and determine that the link abnormal variable is in an abnormal state.
  • the link abnormal variable can be set to an abnormal state, and then it is determined that the detected link abnormal variable is in an abnormal state.
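  • A minimal Python sketch of steps 21 and 22, assuming the monitoring result is available as a sequence of sampled line levels (0 or 1) collected over the preset duration; the function name `link_abnormal` and the sampling model are assumptions, not part of the original text.

```python
def link_abnormal(samples) -> bool:
    """Return True if the monitored levels are all zeros or all ones."""
    levels = list(samples)
    if not levels:
        raise ValueError("no samples were collected during the monitoring window")
    return all(level == 0 for level in levels) or all(level == 1 for level in levels)
```

  • A line that never changes level for the whole window carries no traffic, which is what the all-zeros or all-ones test detects.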
  • the bus variable is a check abnormal variable, and judging whether the bus variable is in an abnormal state includes:
  • Step 31: Count the number of target bits in the first data to obtain a statistical result.
  • Step 32: If the statistical result does not match the check data in the first data, update the check abnormal variable.
  • Step 33: If the check abnormal variable is greater than the first threshold, determine that the check abnormal variable is in an abnormal state.
  • the target bit refers to a preset bit used as a detection standard, for example, 0 bit or 1 bit.
  • when the main bus is affected by factors such as signal interference, the data transmitted on it may change, for example a bit may flip from 0 to 1. Therefore, by counting the number of target bits in the first data and matching the statistical result against the check data in the first data, it can be determined whether the bits in the first data have been changed.
  • the main bus may become unstable for occasional reasons, causing the first data transmitted on it to be changed. In this case, the bus can still be considered able to communicate normally. Therefore, when it is determined that the statistical result does not match the check data, the check abnormal variable may be updated. If the check abnormal variable is greater than the first threshold, the first data is being changed frequently, which indicates that the main bus is likely abnormal rather than merely affected by occasional causes. Therefore, it can be determined that the check abnormal variable is in an abnormal state.
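  • Steps 31 to 33 can be illustrated with the Python sketch below. It assumes the target bit is the 1 bit, that the check data is the expected count of 1 bits, and that "updating" the check abnormal variable means incrementing a counter; the class name `CheckMonitor` and this frame model are hypothetical.

```python
class CheckMonitor:
    def __init__(self, first_threshold: int):
        self.first_threshold = first_threshold
        self.check_abnormal = 0   # the check abnormal variable

    def observe(self, payload: bytes, check_data: int) -> bool:
        """Process one frame; return True if the variable is now abnormal."""
        # Step 31: count the target bits (assumed here to be 1 bits).
        ones = sum(bin(byte).count("1") for byte in payload)
        # Step 32: a mismatch updates (here: increments) the variable.
        if ones != check_data:
            self.check_abnormal += 1
        # Step 33: abnormal only once the threshold is exceeded.
        return self.check_abnormal > self.first_threshold
```

  • The threshold keeps a single corrupted frame from triggering a bus switch, matching the distinction the text draws between occasional interference and a persistent fault.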
  • the bus variable is a type abnormal variable
  • the process of judging whether the bus variable is in an abnormal state includes:
  • Step 41: Extract the type data of the frame type field in the first data.
  • Step 42: If the type data does not belong to the standard type data, update the type abnormal variable.
  • Step 43: If the type abnormal variable is greater than the second threshold, determine that the type abnormal variable is in an abnormal state.
  • the frame type field refers to a field used to indicate the type of the first data, and the specific data in this field is the type data.
  • Standard type data refers to the legal values that the type field may take. If the type data belongs to the standard type data, the type of the first data is legal and definite, and can be accurately identified. Similar to the situation described in the previous embodiment, when the main bus is affected by factors such as signal interference, the data transmitted on it may change; an even number of bits may change at the same time, and these changes may involve bits in the frame type field. The main bus may also become unstable for occasional reasons, causing the first data transmitted on it to be changed.
  • if the type abnormal variable is greater than the second threshold, it indicates that the main bus is likely abnormal rather than merely affected by occasional causes, so it can be determined that the type abnormal variable is in an abnormal state.
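  • Steps 41 to 43 can be sketched in Python as below. The three frame types mirror the command types named in the text, but their numeric codes, the `FrameType` enum and the function names are hypothetical, and "updating" the type abnormal variable is assumed to mean incrementing a counter.

```python
from enum import Enum


class FrameType(Enum):
    # Numeric codes are invented for illustration only.
    SYNC_DATA_BROADCAST = 0x01
    REQUEST_SYNC = 0x02
    GRAB_MASTER = 0x03


def is_standard_type(type_data: int) -> bool:
    """Step 42 test: does the type data belong to the standard type data?"""
    try:
        FrameType(type_data)
    except ValueError:
        return False
    return True


def update_type_abnormal(type_abnormal: int, type_data: int,
                         second_threshold: int):
    """Return (new variable value, whether it is in an abnormal state)."""
    if not is_standard_type(type_data):
        type_abnormal += 1          # step 42 (assumed to be an increment)
    return type_abnormal, type_abnormal > second_threshold   # step 43
```

  • This check complements the bit-count check, which the text notes can miss an even number of simultaneous bit changes.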
  • the process of judging whether the first data satisfies the bus abnormal condition may include:
  • Step 51: Extract the current bus flag data in the first data.
  • Step 52: If the current bus flag data does not match the local bus data, determine that the data bus flag abnormal condition is satisfied.
  • the local bus data refers to identity data used to represent each target bus considered by the controller.
  • the first data contains the bus that the controller which generated the first data regards as the main bus. Since this controller may have been the one sending data in the previous round, it cannot tell by itself whether the main bus is abnormal. If the current bus flag data in the obtained first data has changed, it means that other controllers have determined that the main bus regarded by this controller is abnormal, so the detection results of the other controllers can be used to determine that the main bus is abnormal. That is, when the current bus flag data does not match the local bus data, it is determined that the data bus flag abnormal condition is met.
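  • Steps 51 and 52 amount to a single comparison, sketched here in Python. The `Frame` layout, with the bus flag held in a separate field, and the function name `bus_flag_abnormal` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    bus_flag: int     # bus number the sender currently regards as the main bus
    payload: bytes


def bus_flag_abnormal(first_data: Frame, local_main_bus: int) -> bool:
    # Step 51: extract the current bus flag data from the first data.
    # Step 52: a mismatch with the local bus data means another controller
    # has already judged this controller's main bus to be abnormal.
    return first_data.bus_flag != local_main_bus
```

  • This lets a controller that was transmitting, and so could not observe the fault itself, follow the decision of its peers.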
  • S103: Select a target candidate bus whose state is healthy as the new main bus, and update the local bus data.
  • the target candidate bus whose state is healthy can be used as a new main bus, and the local bus data can be updated.
  • the target candidate bus is firstly a candidate bus, and secondly, its state must be healthy. After these two requirements are met, any candidate bus can be a target candidate bus. Therefore, it can be understood that when a new master bus is selected, the health status of each candidate bus needs to be detected.
  • This embodiment does not limit the specific method of health status monitoring. For example, while the local bus data indicates the identity of the main bus, it can also represent the health status of each target bus. By reading the local bus data, the target candidate bus can be determined.
  • this embodiment does not limit the specific content of the preset operation.
  • the target data is written to the target cache location, such as the above-mentioned "accept buffer".
  • the target cache location may be cleared and an exception may be reported.
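  • The overall handling after the abnormality judgment can be sketched in Python as follows. It assumes the local bus data records both the current main bus and a health flag per bus, as the text suggests is possible; `LocalBusData`, `NoHealthyBusError` and `handle_first_data` are hypothetical names, and raising an exception stands in for "reporting the exception".

```python
from dataclasses import dataclass


@dataclass
class LocalBusData:
    main_bus: str
    healthy: dict   # bus name -> True if the bus is regarded as healthy


class NoHealthyBusError(RuntimeError):
    """Stands in for reporting that no healthy candidate bus remains."""


def handle_first_data(first_data, abnormal, local, accept_buffer):
    """Return the name of the main bus in use after handling one frame."""
    if not abnormal:
        accept_buffer.append(first_data)   # write to the target cache location
        return local.main_bus
    # Record that the old main bus is abnormal.
    local.healthy[local.main_bus] = False
    for bus, is_healthy in local.healthy.items():
        if is_healthy:
            local.main_bus = bus           # healthy candidate becomes main bus
            return bus
    accept_buffer.clear()                  # no healthy candidate: clear cache
    raise NoHealthyBusError("no healthy candidate bus is available")
```

  • The sketch does not store the suspect frame when a switch happens, since the text does not say what is done with it.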
  • FIG. 5 is a flowchart of a specific bus exception handling method provided by the embodiment of the present application.
  • the main bus is bus1
  • the first data, that is, the data obtained from the "accept buffer", that is, the 48-bit synchronous data of other nodes
  • the "currently used bus number" value, that is, the current bus flag data
  • after the first data is obtained, polling can be used to check whether the first data is consistent with the second data on bus2. If not, it is judged whether the bus variable is abnormal.
  • type abnormal variable. Among them, if a bad-cycle is detected, that is, a data frame without a stop bit is received, bus1 is monitored to determine whether it is all zeros or all ones, and if so, it is determined that the condition is met. Alternatively, it is judged whether the cumulative number of CRC errors (that is, the check abnormal variable) is greater than 3, and if so, it is determined that the condition is met.
  • each controller has a main bus and several candidate buses, and at least two buses are used for data communication.
  • the target data obtained on the main bus is called the first data
  • the first data can be used to judge whether the main bus is abnormal, that is, whether the first data meets the bus abnormal condition, and thereby whether communication between the controllers is abnormal.
  • the bus abnormal condition may be a data bus flag abnormal condition, that is, a check of whether the bus flag in the first data is abnormal; or it may be a data content abnormal condition, that is, a check of whether the data content of the first data is abnormal.
  • the first data obtained from the main bus is not the data originally sent by the controller, which indicates that the main bus is abnormal, and an abnormal main bus prevents smooth communication between the controllers.
  • the healthy target candidate bus is selected as the new main bus to complete the replacement of the main bus, and the local bus data is updated to indicate that the old main bus is abnormal.
  • a healthy (i.e. normal) target candidate bus is selected as the new main bus, so a bus in the normal state can always be used for communication, which ensures smooth communication between the controllers, avoids "split brain" of the upper-layer software, and ensures data consistency.
  • the bus anomaly handling device provided in the embodiment of the present application is introduced below.
  • the bus anomaly handling device described below and the bus anomaly handling method described above may be referred to in correspondence with each other.
  • FIG. 6 is a schematic structural diagram of a bus exception handling device provided in an embodiment of the present application, including:
  • the obtaining module 110 is used to obtain several target data from several target buses respectively; wherein the target bus includes a main bus and several candidate buses, and the target data corresponding to the main bus is the first data;
  • the judging module 120 is used to judge whether the first data satisfies the bus abnormal condition;
  • the bus abnormal condition is a data bus flag abnormal condition or a data content abnormal condition;
  • the bus update module 130 is configured to select a healthy target candidate bus as the new main bus if the bus abnormal condition is met, and update the local bus data.
  • the judging module 120 includes:
  • a consistency judging unit configured to judge whether the first data is consistent with the target second data; the target data corresponding to the candidate bus is the second data, and the target second data is at least one of the second data;
  • the variable judging unit is used for judging whether the bus variable is in an abnormal state if the first data and the target second data are inconsistent;
  • the first abnormality determination unit is configured to determine that the data content abnormal condition is met if the bus variable is in an abnormal state.
  • the variable judging unit includes:
  • the monitoring subunit is used to monitor the main bus for a preset duration to obtain a monitoring result; wherein the preset duration is longer than a single frame length;
  • the first determining subunit is configured to set the link abnormal variable to an abnormal state if the monitoring result is all zeros or all ones, and determine that the link abnormal variable is in an abnormal state.
  • the variable judging unit includes:
  • a statistics subunit configured to count the number of target bits in the first data to obtain a statistical result;
  • the first update subunit is used to update the check abnormal variable if the statistical result does not match the check data in the first data;
  • the second determining subunit is configured to determine that the check abnormal variable is in an abnormal state if the check abnormal variable is greater than the first threshold.
  • the variable judging unit includes:
  • an extracting subunit configured to extract the type data of the frame type field in the first data;
  • the second update subunit is used to update the type abnormal variable if the type data does not belong to the standard type data;
  • the third determining subunit is configured to determine that the type abnormal variable is in an abnormal state if the type abnormal variable is greater than the second threshold.
  • the judging module 120 includes:
  • a flag acquisition unit configured to extract the current bus flag data in the first data
  • the second abnormality determining unit is configured to determine that the abnormal condition of the data bus flag is satisfied if the current bus flag data does not match the local bus data.
  • a write module is used to write the target data into the target cache location if the bus exception condition is not met;
  • the reporting module is configured to clear the target cache position and report the exception if there is no healthy target candidate bus after it is determined that the bus exception condition is met.
  • the electronic device provided by the embodiment of the present application is introduced below, and the electronic device described below and the bus exception handling method described above may refer to each other correspondingly.
  • the present application also provides an electronic device, including a memory and one or more processors, where computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the one or more processors, the one or more processors execute the steps of the bus exception handling method in the foregoing embodiments.
  • the electronic device 100 may include a processor 101 and a memory 102 , and may further include one or more of a multimedia component 103 , an information input/information output (I/O) interface 104 and a communication component 105 .
  • the processor 101 is used to control the overall operation of the electronic device 100, so as to complete all or part of the steps in the above bus exception handling method;
  • the memory 102 is used to store various types of data to support the operation of the electronic device 100; these data may include, for example, instructions for any application or method operating on the electronic device 100, as well as application-related data.
  • the memory 102 can be realized by any type of volatile or non-volatile storage device or their combination, such as Static Random Access Memory (Static Random Access Memory, SRAM), Electrically Erasable Programmable Read-Only Memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (Read-Only Memory, One or more of Only Memory, ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • Multimedia components 103 may include screen and audio components.
  • the screen can be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals.
  • an audio component may include a microphone for receiving external audio signals.
  • the received audio signal may be further stored in the memory 102 or sent via the communication component 105 .
  • the audio component also includes at least one speaker for outputting audio signals.
  • the I/O interface 104 provides an interface between the processor 101 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. These buttons can be virtual buttons or physical buttons.
  • the communication component 105 is used for wired or wireless communication between the electronic device 100 and other devices.
  • the wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or one or a combination of them, so the corresponding communication component 105 may include a Wi-Fi part, a Bluetooth part and an NFC part.
  • the electronic device 100 may be implemented by one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for executing the bus exception handling method given in the above embodiments.
  • the computer-readable storage medium provided by the embodiment of the present application is introduced below, and the computer-readable storage medium described below and the bus exception handling method described above can be referred to in correspondence.
  • the present application also provides one or more non-volatile computer-readable storage media storing computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the bus exception handling method of the above embodiments.
  • the computer-readable storage medium may include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disk.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to each other.
  • because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for related details, refer to the description of the method part.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

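The present application discloses a bus exception handling method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: acquiring a plurality of corresponding pieces of target data from a plurality of target buses, respectively, where the target buses include one main bus and several candidate buses, and the target data corresponding to the main bus is first data; determining whether the first data satisfies a bus exception condition, where the bus exception condition is a data bus flag exception condition or a data content exception condition; and, if the bus exception condition is satisfied, selecting a target candidate bus whose state is healthy as the new main bus and updating local bus data.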

Description

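Bus exception handling method and apparatus, electronic device and readable storage medium
Cross-Reference to Related Applications
This application claims priority to the Chinese patent application No. CN202110991790.4, filed with the China Patent Office on August 27, 2021 and entitled "Bus exception handling method and apparatus, electronic device and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of controllers, and in particular to a bus exception handling method, a bus exception handling apparatus, an electronic device, and a computer-readable storage medium.
Background
To ensure data security and improve disaster tolerance, dual-controller and quad-controller architectures have been developed, for example two controllers in one enclosure or four controllers in one enclosure. Multiple controllers process services together, which greatly improves service performance. In addition, if one controller suddenly goes down, another controller can take over its services, which greatly improves data security and the disaster tolerance of the device. Four controllers in one enclosure means that one chassis contains four controller mainboards. For the device to provide services externally through multiple controllers, the controllers must perform steps such as master controller election and data synchronization among themselves. At present, communication between controllers frequently becomes abnormal. If communication between the controllers is poor, multiple master controllers may be elected, causing "split brain" in the upper-layer software, or data may become inconsistent.
Summary
In view of this, the purpose of the present application is to provide a bus exception handling method, a bus exception handling apparatus, an electronic device, and a computer-readable storage medium that avoid "split brain" in the upper-layer software and ensure data consistency.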
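To solve the above technical problem, the present application provides a bus exception handling method, including:
acquiring a plurality of corresponding pieces of target data from a plurality of target buses, respectively, where the target buses include one main bus and several candidate buses, and the target data corresponding to the main bus is first data;
determining whether the first data satisfies a bus exception condition, where the bus exception condition is a data bus flag exception condition or a data content exception condition;
if the bus exception condition is satisfied, selecting a target candidate bus whose state is healthy as the new main bus, and updating local bus data.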
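Optionally, if the bus exception condition is the data content exception condition, determining whether the first data satisfies the bus exception condition includes:
determining whether the first data is consistent with target second data, where the target data corresponding to a candidate bus is second data, and the target second data is at least one piece of the second data;
if they are inconsistent, determining whether a bus variable is in an abnormal state;
if it is in the abnormal state, determining that the data content exception condition is satisfied.
Optionally, the bus variable is a link exception variable, and determining whether the bus variable is in an abnormal state includes:
listening to the main bus for a preset duration to obtain a listening result, where the preset duration is longer than a single frame length;
if the listening result is all zeros or all ones, determining that the link exception variable is in the abnormal state.
Optionally, the bus variable is a check exception variable, and determining whether the bus variable is in an abnormal state includes:
counting the number of target bits in the first data to obtain a count result;
if the count result does not match the check data in the first data, updating the check exception variable;
if the check exception variable is greater than a first threshold, determining that the check exception variable is in the abnormal state.
Optionally, the bus variable is a type exception variable, and determining whether the bus variable is in an abnormal state includes:
extracting the type data of the frame type field in the first data;
if the type data does not belong to the standard type data, updating the type exception variable;
if the type exception variable is greater than a second threshold, determining that the type exception variable is in the abnormal state.
Optionally, if the bus exception condition is the data bus flag exception condition, determining whether the first data satisfies the bus exception condition includes:
extracting the current bus flag data from the first data;
if the current bus flag data does not match the local bus data, determining that the data bus flag exception condition is satisfied.
Optionally, if the bus exception condition is not satisfied, the target data is written to a target cache location;
after it is determined that the bus exception condition is satisfied, if there is no target candidate bus whose state is healthy, the target cache location is cleared and the exception is reported.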
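The present application also provides a bus exception handling apparatus, including:
an acquisition module, configured to acquire a plurality of corresponding pieces of target data from a plurality of target buses, respectively, where the target buses include one main bus and several candidate buses, and the target data corresponding to the main bus is first data;
a determination module, configured to determine whether the first data satisfies a bus exception condition, where the bus exception condition is a data bus flag exception condition or a data content exception condition;
a bus update module, configured to select, if the bus exception condition is satisfied, a target candidate bus whose state is healthy as the new main bus, and to update local bus data.
The present application also provides an electronic device, including a memory and one or more processors, where computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the one or more processors, the one or more processors execute the steps of the bus exception handling method of the above embodiments.
The present application also provides one or more non-volatile computer-readable storage media storing computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the bus exception handling method of the above embodiments.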
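The bus exception handling method provided by the present application acquires a plurality of corresponding pieces of target data from a plurality of target buses, respectively, where the target buses include one main bus and several candidate buses, and the target data corresponding to the main bus is first data; determines whether the first data satisfies a bus exception condition, where the bus exception condition is a data bus flag exception condition or a data content exception condition; and, if the bus exception condition is satisfied, selects a target candidate bus whose state is healthy as the new main bus and updates local bus data.
It can be seen that, in the present application, the controllers share one main bus and several candidate buses, and these at least two buses are used together for data communication. In this method, the target data acquired from the main bus is called the first data. The first data can be used to determine whether the main bus is abnormal, that is, whether the first data satisfies the bus exception condition, and in turn whether communication between the controllers has become poor. The bus exception condition may be the data bus flag exception condition, which checks whether the bus flag in the first data is abnormal, or the data content exception condition, which checks whether the data content of the first data is abnormal. If either condition is satisfied, the first data acquired from the main bus is not the data sent by the original controller, which indicates that the main bus is abnormal; an abnormal main bus leads to poor communication between the controllers. To avoid "split brain" in the upper-layer software or data inconsistency, after it is detected that the bus exception condition is satisfied, a target candidate bus whose state is healthy is selected as the new main bus, completing the replacement of the main bus, and the local bus data is updated to indicate that the old main bus is abnormal. Because the main bus identity is transferred to a healthy (that is, normal) target candidate bus, communication can always take place over a bus in a normal state, which keeps communication between the controllers unobstructed, avoids "split brain" in the upper-layer software, and ensures data consistency.
In addition, the present application also provides a bus exception handling apparatus, an electronic device, and a computer-readable storage medium, which have the same beneficial effects.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.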
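Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of a bus exception handling method according to one or more embodiments of the present application;
FIG. 2 is a controller connection structure diagram according to one or more embodiments of the present application;
FIG. 3 is a timing diagram of a master node election process according to one or more embodiments of the present application;
FIG. 4 is a flowchart of node communication steps according to one or more embodiments of the present application;
FIG. 5 is a flowchart of a specific bus exception handling method according to one or more embodiments of the present application;
FIG. 6 is a schematic structural diagram of a bus exception handling apparatus according to one or more embodiments of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to one or more embodiments of the present application.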
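Detailed Description
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Please refer to FIG. 1, which is a flowchart of a bus exception handling method provided by an embodiment of the present application. The method includes:
S101: Acquire a plurality of corresponding pieces of target data from a plurality of target buses, respectively.
Some or all of the steps in the present application may be executed by any controller in an enclosure. The target buses include one main bus and several candidate buses, and the target data corresponding to the main bus is the first data. The data content and data type of the target data can take many forms. For example, it may be a synchronization data broadcast command, whose data content is the data to be synchronized; or a synchronization request command, whose data content is the unique identity information of the designated node; or master preemption information used to compete for the master node identity.
It should be noted that, after a controller generates target data, it sends the target data onto all target buses. When all target buses are in a normal state, each of the other controllers can obtain the target data from each target bus, discard the target data other than the first data, and execute the corresponding instruction according to the first data. The candidate buses and the main bus may be of the same or different types.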
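This embodiment does not limit how each type of target data is generated. It can be understood that, while running, the controllers necessarily go through a process of competing for the master controller identity and a process of data synchronization. Please refer to FIG. 2, which is a controller connection structure diagram provided by an embodiment of the present application. The CPLDs (Complex Programmable Logic Devices) on the four controllers (controller 1, controller 2, controller 3 and controller 4) are interconnected to bus1 (bus 1) and bus2 (bus 2) through the backplane. The backplane carries a switch, which may specifically be a switch chip or a MOS transistor (MOSFET, Metal-Oxide-Semiconductor Field-Effect Transistor); this switch is also controlled by the CPLD. The CPLD turns on the switch chip (or MOS transistor) only after the controller mainboard is fully seated, which prevents the current surge caused by hot-plugging a controller from interfering with other controllers that are communicating on the bus. In addition, the mainboard has pull-up resistors and a power supply, and the controller boards use an open-drain output mode. Bus 1 and bus 2 may be modeled on IIC (Inter-Integrated Circuit): of the two lines of each bus, one is a clock line and the other is a data line (that is, SCL/SDA or SCL'/SDA').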
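Taking the structure shown in FIG. 2 as an example, the process of electing a master controller is illustrated below. Please refer to FIG. 3, which is a timing diagram of a master node election process provided by an embodiment of the present application. When the CPLD detects a level change on the clock line or data line of a bus, it considers the bus to be in a non-idle state. If the non-idle state lasts for more than 200 ms, the CPLD considers that the bus currently has no "master", that is, there is no master controller among the controllers. In this case, each CPLD starts a first timer according to its own node code (node code * 10 ms). When the first timer expires, the CPLD sends a "preempt master" command on every bus. It then starts a second timer (node code * 1 ms). If the bus is not found to be in a non-idle state within the second timer period, the preemption succeeds, and the CPLD sends a "preemption succeeded" command.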
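The timer arithmetic above can be illustrated with a short Python sketch. It is only a model under stated assumptions, not the CPLD logic of the application: the function names are hypothetical, all nodes are assumed to start their first timers at the same moment, and a node that sees bus activity before its own first timer expires is assumed to withdraw from the round.

```python
def first_timer_ms(node_code: int) -> int:
    """Delay before a node sends its "preempt master" command (node code * 10 ms)."""
    return node_code * 10


def second_timer_ms(node_code: int) -> int:
    """Quiet window the node waits after sending the command (node code * 1 ms)."""
    return node_code * 1


def elect_master(node_codes: list[int]) -> int:
    """Return the node code expected to win when all nodes start together.

    Assumption (not stated in the text): a node that observes bus activity
    before its own first timer expires withdraws, so the node with the
    shortest first timer wins the round.
    """
    if not node_codes:
        raise ValueError("at least one node is required")
    return min(node_codes, key=first_timer_ms)
```

Under these assumptions the node with the smallest node code wins, because its first timer expires before any other node has sent a command.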
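It should be noted that this embodiment does not limit the specific content of the above commands. In one implementation, the format and content of each type of command may be as shown in the following table: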
Figure PCTCN2021127330-appb-000001
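After the master preemption succeeds, data synchronization can be performed. Please refer to FIG. 4, which is a flowchart of node communication steps provided by an embodiment of the present application, including the following steps:
Step 0: The master node (that is, a controller) issues an "upload broadcast command". All nodes simultaneously copy the synchronization data of the 4 nodes (including themselves) collected in their "receive buffer" into their own "BMC read buffer", which keeps the data of all nodes consistent, and then clear the "receive buffer".
Step 1: The master node issues a "synchronization data broadcast command" to broadcast the key information of its own controller mainboard, "abcdef" (a simple example), and stores this data in its own "receive buffer". After the other 3 nodes detect the "synchronization data broadcast command", they store the "abcdef" sent by the master node in their own "receive buffer".
Step 2: The master node issues a "synchronization request command" addressed to node2 and then enters a passive listening state.
Step 3: After node2 receives the "synchronization request command", it issues a "synchronization data broadcast command" to broadcast the key information of its own controller mainboard, "abcdef" (a simple example), and stores the data in its own "receive buffer". After the other 3 nodes detect the "synchronization data broadcast command", they store the "abcdef" sent by node2 in their own "receive buffer".
Steps 4, 5, 6 and 7 are similar to steps 2 and 3: the master node traverses node3 and node4. As the above process shows, within one read/write cycle the slave nodes (non-master controllers) no longer reply with an ACK (acknowledge character) to the master node (master controller), so the drive right of sda (the serial data line) no longer has to be switched. This improves reliability and reduces the probability of the bus hanging.
It should be noted that, in the above process, any information sent by any controller must be sent on all target buses.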
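The synchronization round of steps 0 to 7 can be sketched in Python as follows. This is a simplified in-memory model under stated assumptions: the `Node` class, its attribute names and `sync_round` are hypothetical, the buses are assumed to deliver every broadcast to every node without error, and the ACK-free polling is reduced to a loop over senders.

```python
class Node:
    """Hypothetical model of one controller's two buffers."""

    def __init__(self, node_id, key_info):
        self.node_id = node_id
        self.key_info = key_info      # key mainboard information to broadcast
        self.receive_buffer = {}      # data collected during the current round
        self.bmc_read_buffer = {}     # snapshot exposed to the BMC


def sync_round(nodes, master_id):
    """Run one synchronization round led by the node with id master_id."""
    # Step 0: "upload broadcast command". Every node publishes what it
    # collected in the previous round and clears its receive buffer.
    for node in nodes:
        node.bmc_read_buffer = dict(node.receive_buffer)
        node.receive_buffer = {}

    # Step 1: the master broadcasts first. Steps 2 to 7: the master requests
    # each other node in turn, and that node broadcasts its own data.
    senders = [master_id] + [n.node_id for n in nodes if n.node_id != master_id]
    info = {n.node_id: n.key_info for n in nodes}
    for sender in senders:
        for node in nodes:  # the sender also stores its own broadcast
            node.receive_buffer[sender] = info[sender]
```

Running two rounds shows the intended effect: after the second "upload" every node's BMC read buffer holds the same four entries.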
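S102: Determine whether the first data satisfies the bus exception condition.
Because the first data is acquired from the main bus, determining whether the first data satisfies the bus exception condition establishes whether the bus that this controller currently regards as the main bus has become abnormal. When the bus exception condition is satisfied, the main bus can be considered unable to transmit data correctly. To prevent a prolonged failure to communicate with the other controllers from causing "split brain" in the upper-layer software and data desynchronization, step S103 can be executed in this case; otherwise step S104 can be executed.
In the present application, the bus exception condition is the data bus flag exception condition or the data content exception condition. The data bus flag exception condition is a condition indicating that the bus flag in the first data is abnormal. The data content exception condition is a condition indicating that the data content of the first data is abnormal. With these two specific conditions, whether the main bus is abnormal can be assessed from two aspects: the bus flag of the first data and the data content of the first data.
Specifically, in one feasible implementation, if the bus exception condition is the data content exception condition, the process of determining whether the first data satisfies the bus exception condition may include:
Step 11: Determine whether the first data is consistent with the target second data.
Step 12: If they are inconsistent, determine whether a bus variable is in an abnormal state.
Step 13: If it is in the abnormal state, determine that the data content exception condition is satisfied.
In this embodiment, the target data corresponding to a candidate bus is second data, and the target second data is at least one piece of the second data. That is, after the first data and the second data are obtained, several pieces of target second data are selected from the second data, and each piece of target second data is compared with the first data to determine whether the first data is consistent with all of them. Because the probability that two or more buses become abnormal at the same time is almost zero, the case in which both the first data and the target second data are abnormal can be ruled out. Therefore, if the first data is inconsistent with at least one piece of target second data, the first data may be abnormal. In this case, to determine accurately whether the main bus is abnormal, it can be determined whether a bus variable is in an abnormal state; if it is, the data content exception condition is determined to be satisfied.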
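A bus variable is a variable that describes the working condition of the main bus. Its specific type and number are not limited; for example, it may be a link exception variable, a check exception variable, or a type exception variable. It can be understood that different types of bus variables represent different types of main bus exceptions, and the more bus variables there are, the more angles from which the main bus can be checked. This embodiment does not limit the specific way of determining whether a bus variable is in an abnormal state. In one implementation, the bus variable is a link exception variable, and the process of determining whether the bus variable is in an abnormal state includes:
Step 21: Listen to the main bus for a preset duration to obtain a listening result.
Step 22: If the listening result is all zeros or all ones, set the link exception variable to the abnormal state and determine that the link exception variable is in the abnormal state.
When the physical link fails, for example after the switch in FIG. 2 fails, the link is permanently connected to or disconnected from the 3V3 power supply. In this case, whether the physical link is abnormal can be determined by listening to the main bus. Specifically, the main bus is listened to for a preset duration to obtain a listening result. The preset duration is longer than a single frame length, so that an all-zero or all-one signal that merely happens to be on the bus during listening is not mistaken for a fault. If the listening result is all zeros or all ones, the physical link has failed; the link exception variable can then be set to the abnormal state, and it is determined that the link exception variable is detected to be in the abnormal state.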
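Steps 21 and 22 amount to a stuck-line test, which the following Python sketch illustrates. The function name, the representation of the listening result as a list of sampled 0/1 levels, and expressing the frame length in samples are all assumptions made for illustration.

```python
def link_is_abnormal(samples, frame_length):
    """Return True if the listened levels are all zeros or all ones.

    samples: sequence of 0/1 line levels sampled from the main bus.
    frame_length: number of samples in one frame; the listening window
    must be longer than this, as the text requires.
    """
    if len(samples) <= frame_length:
        raise ValueError("listening window must be longer than one frame")
    return all(s == 0 for s in samples) or all(s == 1 for s in samples)
```

A window shorter than one frame is rejected outright rather than evaluated, since a short run of identical levels can occur on a working bus.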
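Further, in another implementation, the bus variable is a check exception variable, and determining whether the bus variable is in an abnormal state includes:
Step 31: Count the number of target bits in the first data to obtain a count result.
Step 32: If the count result does not match the check data in the first data, update the check exception variable.
Step 33: If the check exception variable is greater than a first threshold, determine that the check exception variable is in the abnormal state.
A target bit is a preset bit value used as the detection standard, for example the 0 bit or the 1 bit. When the main bus is affected by factors such as signal interference, the data transmitted on it may change and individual bits may flip, for example from 0 to 1. Therefore, by counting the target bits in the first data and matching the count result against the check data in the first data, it can be determined whether bits in the first data have been changed.
In practice, the main bus may be unstable because of an occasional cause, which changes the first data transmitted on it. In that case the bus can still be considered able to communicate normally. Therefore, when the count result does not match the check data, the check exception variable is updated. If the check exception variable is greater than the first threshold, the first data has been changed many times, and the main bus is probably abnormal rather than affected by an occasional cause, so the check exception variable can be determined to be in the abnormal state.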
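Steps 31 to 33 can be sketched as a small counter in Python. The class and method names are hypothetical, the target bit is assumed to be the 1 bit, and the check data is assumed to be the expected number of 1 bits; the text leaves all three open.

```python
class CheckMonitor:
    """Hypothetical tracker for the check exception variable."""

    def __init__(self, threshold: int):
        self.threshold = threshold   # the "first threshold"
        self.error_count = 0         # the check exception variable

    def observe(self, payload: bytes, check_value: int) -> bool:
        """Process one frame; return True once the variable is abnormal."""
        ones = sum(bin(byte).count("1") for byte in payload)  # step 31
        if ones != check_value:                               # step 32
            self.error_count += 1
        return self.error_count > self.threshold              # step 33
```

Because the counter only trips after it exceeds the threshold, a single corrupted frame does not by itself mark the main bus as abnormal.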
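Further, in another implementation, the bus variable is a type exception variable, and the process of determining whether the bus variable is in an abnormal state includes:
Step 41: Extract the type data of the frame type field in the first data.
Step 42: If the type data does not belong to the standard type data, update the type exception variable.
Step 43: If the type exception variable is greater than a second threshold, determine that the type exception variable is in the abnormal state.
The frame type field is the field that indicates the type of the first data, and the specific data in this field is the type data. Standard type data is the set of legal values that the type field may take. If the type data belongs to the standard type data, the type of the first data is legal and definite and can be identified accurately. As in the previous implementation, when the main bus is affected by factors such as signal interference, the data transmitted on it may change and bits may flip, so an even number of bits may be changed at the same time, possibly including bits in the frame type field. The main bus may also be unstable because of an occasional cause, which changes the first data transmitted on it. In this case, the type data is compared with the standard type data, and the type exception variable is updated when the type data does not belong to the standard type data. If the type exception variable is greater than the second threshold, the main bus is probably abnormal rather than affected by an occasional cause, so the type exception variable can be determined to be in the abnormal state.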
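The following Python sketch illustrates steps 41 to 43 and why the type check complements a bit-count check. The set of standard type codes and all names are hypothetical; the actual command codes are given in a table that is not reproduced in this text.

```python
STANDARD_TYPES = {0x1, 0x2, 0x3}  # hypothetical legal frame type codes


def ones(value: int) -> int:
    """Number of 1 bits in value."""
    return bin(value).count("1")


class TypeMonitor:
    """Hypothetical tracker for the type exception variable."""

    def __init__(self, threshold: int):
        self.threshold = threshold   # the "second threshold"
        self.error_count = 0         # the type exception variable

    def observe(self, frame_type: int) -> bool:
        """Process one frame type; return True once the variable is abnormal."""
        if frame_type not in STANDARD_TYPES:       # step 42
            self.error_count += 1
        return self.error_count > self.threshold   # step 43


# Two bits flipped at once: 0b0001 becomes 0b1000. The count of 1 bits is
# unchanged, so a bit-count check passes, but the type code is now illegal.
sent, received = 0b0001, 0b1000
```

In this example `ones(sent)` equals `ones(received)`, yet `received` is not in `STANDARD_TYPES`, which is the case the type exception variable is meant to catch.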
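In another implementation, if the bus exception condition is the data bus flag exception condition, the process of determining whether the first data satisfies the bus exception condition may include:
Step 51: Extract the current bus flag data from the first data.
Step 52: If the current bus flag data does not match the local bus data, determine that the data bus flag exception condition is satisfied.
The local bus data is the data that records the identity of each target bus as this controller sees it. The first data carries the main bus as seen by the controller that generated the first data. Because this controller may have been sending data in the previous round, it may be unable to tell by itself whether the main bus has become abnormal. If the current bus flag data in the acquired first data has already changed, other controllers have concluded that the bus this controller regards as the main bus is abnormal, so this controller can rely on their detection result and conclude that the main bus is abnormal. That is, when the current bus flag data does not match the local bus data, the data bus flag exception condition is determined to be satisfied.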
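Steps 51 and 52 reduce to a single comparison, sketched below in Python. The dictionary layout and the key names `current_bus` and `main_bus` are hypothetical stand-ins for the frame field and the local bus data.

```python
def flag_exception(first_data: dict, local_bus_data: dict) -> bool:
    """Return True if the sender's view of the main bus differs from ours.

    first_data["current_bus"]: bus flag carried in the received frame.
    local_bus_data["main_bus"]: bus this controller regards as the main bus.
    """
    return first_data["current_bus"] != local_bus_data["main_bus"]
```

For example, a frame flagged `"bus2"` received by a controller whose local main bus is still `"bus1"` satisfies the data bus flag exception condition.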
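S103: Select a target candidate bus whose state is healthy as the new main bus, and update the local bus data.
After the main bus is determined to be abnormal, a target candidate bus whose state is healthy can be used as the new main bus, and the local bus data is updated. A target candidate bus must first be a candidate bus, and second its state must be healthy; any candidate bus that meets these two requirements can be the target candidate bus. It can therefore be understood that selecting a new main bus requires checking the health state of each candidate bus. This embodiment does not limit the specific way of monitoring the health state. For example, the local bus data may, in addition to indicating the identity of the main bus, represent the health state of each target bus, so that the target candidate bus can be determined by reading the local bus data.
S104: Perform a preset operation.
It should be noted that this embodiment does not limit the specific content of the preset operation. For example, in one implementation, if the bus exception condition is not satisfied, the target data is written to a target cache location, such as the "receive buffer" mentioned above. In addition, after it is determined that the bus exception condition is satisfied, if there is no target candidate bus whose state is healthy, the target cache location can be cleared and the exception reported.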
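The branch between S103 and S104 can be sketched in Python as follows. The layout of the local bus data (a main bus name plus a per-bus health map), the function name and the returned strings are assumptions for illustration, and reporting is reduced to a return value.

```python
def handle_result(exception_met, local_bus_data, target_data, cache):
    """Apply S103 or S104 to one received frame.

    local_bus_data: {"main_bus": str, "health": {bus_name: bool}}
    cache: list standing in for the target cache location.
    Returns "written", "switched" or "reported".
    """
    if not exception_met:
        cache.append(target_data)          # S104: keep the data
        return "written"

    old_main = local_bus_data["main_bus"]
    local_bus_data["health"][old_main] = False   # record the old main bus as abnormal
    for bus, healthy in local_bus_data["health"].items():
        if bus != old_main and healthy:
            local_bus_data["main_bus"] = bus     # S103: switch the main bus
            return "switched"

    cache.clear()                          # no healthy candidate remains
    return "reported"
```

With two buses, a first exception switches the main bus from `bus1` to `bus2`; a second exception finds no healthy candidate, clears the cache and reports.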
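Please refer to FIG. 5, which is a flowchart of a specific bus exception handling method provided by an embodiment of the present application. Continuing with the structure shown in FIG. 2 as an example, suppose the main bus is bus1. While running on bus1, after the first data is acquired (that is, the data obtained in the "receive buffer", namely the 48 bits of synchronization data of the other nodes), the controller checks whether the "currently used bus number" value in it (the current bus flag data) equals bus1. If so, no exception is confirmed. If not, the health state of bus2 is queried; if bus2 is healthy, bus2 is determined to be the new main bus, that is, the controller switches to bus2. If bus2 is not healthy, the "receive buffer" and the "BMC read buffer" are cleared; if the controller is itself the master controller, it gives up the master controller identity. At the same time it reports a "dual bus error" to the local BMC (Baseboard Management Controller) and enters the idle state.
In addition, after the first data is acquired, the controller can poll whether the first data is consistent with the second data on bus2. If they are inconsistent, it determines whether a variable is abnormal; the variable is the link exception variable, the check exception variable, or the type exception variable. If a bad-cycle is detected, that is, a data frame without a stop bit is received, bus1 is listened to in order to determine whether it is all zeros or all ones; if so, the condition is determined to be satisfied. Alternatively, it is determined whether the accumulated number of CRC errors (the check exception variable) is greater than 3; if so, the condition is determined to be satisfied. Alternatively, it is determined whether the accumulated number of received frames with an undefined command code (the type exception variable) is greater than 3; if so, the condition is determined to be satisfied. If no variable satisfies its condition, bus1 is determined to have no exception, and the corresponding local "bus1 health state value" is set to 1, that is, the healthy state.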
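The content check of the FIG. 5 example can be condensed into one predicate, sketched below in Python. The function and parameter names are hypothetical; the limit of 3 is the value used in the example above, and the inputs are assumed to be maintained elsewhere by the controller.

```python
ERROR_LIMIT = 3  # accumulated-error limit used in the example


def bus1_content_exception(first_data, second_data, bad_cycle, bus_stuck,
                           crc_errors, undefined_frames):
    """Return True if bus1 meets the data content exception condition.

    bad_cycle: a frame without a stop bit was received.
    bus_stuck: listening to bus1 gave all zeros or all ones.
    crc_errors / undefined_frames: accumulated counters.
    """
    if first_data == second_data:
        return False  # data on bus1 and bus2 agree, nothing to check
    link_fault = bad_cycle and bus_stuck
    return (link_fault
            or crc_errors > ERROR_LIMIT
            or undefined_frames > ERROR_LIMIT)
```

When the predicate returns False, the example keeps bus1 as the main bus and sets its local health state value to 1.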
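With the bus exception handling method provided by the embodiments of the present application, the controllers share one main bus and several candidate buses, and these at least two buses are used together for data communication. In this method, the target data acquired from the main bus is called the first data. The first data can be used to determine whether the main bus is abnormal, that is, whether the first data satisfies the bus exception condition, and in turn whether communication between the controllers has become poor. The bus exception condition may be the data bus flag exception condition, which checks whether the bus flag in the first data is abnormal, or the data content exception condition, which checks whether the data content of the first data is abnormal. If either condition is satisfied, the first data acquired from the main bus is not the data sent by the original controller, which indicates that the main bus is abnormal; an abnormal main bus leads to poor communication between the controllers. To avoid "split brain" in the upper-layer software or data inconsistency, after it is detected that the bus exception condition is satisfied, a target candidate bus whose state is healthy is selected as the new main bus, completing the replacement of the main bus, and the local bus data is updated to indicate that the old main bus is abnormal. Because the main bus identity is transferred to a healthy (that is, normal) target candidate bus, communication can always take place over a bus in a normal state, which keeps communication between the controllers unobstructed, avoids "split brain" in the upper-layer software, and ensures data consistency.
It should be understood that, although the steps in the flowcharts are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different moments, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.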
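The bus exception handling apparatus provided by the embodiments of the present application is introduced below. The bus exception handling apparatus described below and the bus exception handling method described above may be referred to in correspondence with each other.
Please refer to FIG. 6, which is a schematic structural diagram of a bus exception handling apparatus provided by an embodiment of the present application, including:
an acquisition module 110, configured to acquire a plurality of corresponding pieces of target data from a plurality of target buses, respectively, where the target buses include one main bus and several candidate buses, and the target data corresponding to the main bus is first data;
a determination module 120, configured to determine whether the first data satisfies a bus exception condition, where the bus exception condition is a data bus flag exception condition or a data content exception condition;
a bus update module 130, configured to select, if the bus exception condition is satisfied, a target candidate bus whose state is healthy as the new main bus, and to update local bus data.
Optionally, the determination module 120 includes:
a consistency determination unit, configured to determine whether the first data is consistent with target second data, where the target data corresponding to a candidate bus is second data, and the target second data is at least one piece of the second data;
a variable determination unit, configured to determine, if they are inconsistent, whether a bus variable is in an abnormal state;
a first abnormality determining unit, configured to determine, if it is in the abnormal state, that the data content exception condition is satisfied.
Optionally, the variable determination unit includes:
a listening subunit, configured to listen to the main bus for a preset duration to obtain a listening result, where the preset duration is longer than a single frame length;
a first determination subunit, configured to set, if the listening result is all zeros or all ones, the link exception variable to the abnormal state and determine that the link exception variable is in the abnormal state.
Optionally, the variable determination unit includes:
a counting subunit, configured to count the number of target bits in the first data to obtain a count result;
a first update subunit, configured to update the check exception variable if the count result does not match the check data in the first data;
a second determination subunit, configured to determine, if the check exception variable is greater than a first threshold, that the check exception variable is in the abnormal state.
Optionally, the variable determination unit includes:
an extraction subunit, configured to extract the type data of the frame type field in the first data;
a second update subunit, configured to update the type exception variable if the type data does not belong to the standard type data;
a third determination subunit, configured to determine, if the type exception variable is greater than a second threshold, that the type exception variable is in the abnormal state.
Optionally, the determination module 120 includes:
a flag acquisition unit, configured to extract the current bus flag data from the first data;
a second abnormality determining unit, configured to determine, if the current bus flag data does not match the local bus data, that the data bus flag exception condition is satisfied.
Optionally, the apparatus further includes:
a write module, configured to write the target data to a target cache location if the bus exception condition is not satisfied;
a reporting module, configured to clear the target cache location and report the exception if, after it is determined that the bus exception condition is satisfied, there is no target candidate bus whose state is healthy.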
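The electronic device provided by the embodiments of the present application is introduced below. The electronic device described below and the bus exception handling method described above may be referred to in correspondence with each other.
The present application also provides an electronic device, including a memory and one or more processors, where computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the one or more processors, the one or more processors execute the steps of the bus exception handling method of the above embodiments.
Please refer to FIG. 7, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. The electronic device 100 may include a processor 101 and a memory 102, and may further include one or more of a multimedia component 103, an information input/information output (I/O) interface 104, and a communication component 105.
The processor 101 is used to control the overall operation of the electronic device 100 so as to complete all or some of the steps of the above bus exception handling method. The memory 102 is used to store various types of data to support operation on the electronic device 100; these data may include, for example, instructions for any application or method operating on the electronic device 100, as well as application-related data. The memory 102 can be realized by any type of volatile or non-volatile storage device or a combination of them, such as one or more of Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
The multimedia component 103 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 102 or sent through the communication component 105. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 104 provides an interface between the processor 101 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. These buttons may be virtual buttons or physical buttons. The communication component 105 is used for wired or wireless communication between the electronic device 100 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or one or a combination of them, so the corresponding communication component 105 may include a Wi-Fi part, a Bluetooth part and an NFC part.
The electronic device 100 may be implemented by one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for executing the bus exception handling method given in the above embodiments.
The computer-readable storage medium provided by the embodiments of the present application is introduced below. The computer-readable storage medium described below and the bus exception handling method described above may be referred to in correspondence with each other.
The present application also provides one or more non-volatile computer-readable storage media storing computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the bus exception handling method of the above embodiments.
The computer-readable storage medium may include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disk.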
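The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to each other. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for related details, refer to the description of the method part.
Those skilled in the art may further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device.
Specific examples are used herein to explain the principles and implementations of the present application. The description of the above embodiments is only intended to help in understanding the method of the present application and its core idea. At the same time, a person of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.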

Claims (9)

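  1. A bus exception handling method, characterized by including:
    acquiring a plurality of corresponding pieces of target data from a plurality of target buses, respectively, where the target buses include one main bus and several candidate buses, and the target data corresponding to the main bus is first data;
    determining whether the first data satisfies a bus exception condition, where the bus exception condition is a data bus flag exception condition or a data content exception condition;
    if the bus exception condition is satisfied, selecting a target candidate bus whose state is healthy as the new main bus, and updating local bus data; and
    if the bus exception condition is the data bus flag exception condition, the determining whether the first data satisfies the bus exception condition includes:
    extracting the current bus flag data from the first data;
    if the current bus flag data does not match the local bus data, determining that the data bus flag exception condition is satisfied.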
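  2. The bus exception handling method according to claim 1, characterized in that, if the bus exception condition is the data content exception condition, the determining whether the first data satisfies the bus exception condition includes:
    determining whether the first data is consistent with target second data, where the target data corresponding to the candidate bus is second data, and the target second data is at least one piece of the second data;
    if they are inconsistent, determining whether a bus variable is in an abnormal state; and
    if it is in the abnormal state, determining that the data content exception condition is satisfied.
  3. The bus exception handling method according to claim 2, characterized in that the bus variable is a link exception variable, and the determining whether the bus variable is in an abnormal state includes:
    listening to the main bus for a preset duration to obtain a listening result, where the preset duration is longer than a single frame length; and
    if the listening result is all zeros or all ones, determining that the link exception variable is in the abnormal state.
  4. The bus exception handling method according to claim 2, characterized in that the bus variable is a check exception variable, and the determining whether the bus variable is in an abnormal state includes:
    counting the number of target bits in the first data to obtain a count result;
    if the count result does not match the check data in the first data, updating the check exception variable; and
    if the check exception variable is greater than a first threshold, determining that the check exception variable is in the abnormal state.
  5. The bus exception handling method according to claim 2, characterized in that the bus variable is a type exception variable, and the determining whether the bus variable is in an abnormal state includes:
    extracting the type data of the frame type field in the first data;
    if the type data does not belong to the standard type data, updating the type exception variable; and
    if the type exception variable is greater than a second threshold, determining that the type exception variable is in the abnormal state.
  6. The bus exception handling method according to claim 1, characterized in that the method further includes:
    if the bus exception condition is not satisfied, writing the target data to a target cache location;
    after it is determined that the bus exception condition is satisfied, if there is no target candidate bus whose state is healthy, clearing the target cache location and reporting the exception.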
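  7. A bus exception handling apparatus, characterized by including:
    an acquisition module, configured to acquire a plurality of corresponding pieces of target data from a plurality of target buses, respectively, where the target buses include one main bus and several candidate buses, and the target data corresponding to the main bus is first data;
    a determination module, configured to determine whether the first data satisfies a bus exception condition, where the bus exception condition is a data bus flag exception condition or a data content exception condition; and
    a bus update module, configured to select, if the bus exception condition is satisfied, a target candidate bus whose state is healthy as the new main bus, and to update local bus data;
    where the determination module includes:
    a flag acquisition unit, configured to extract the current bus flag data from the first data;
    a second abnormality determining unit, configured to determine, if the current bus flag data does not match the local bus data, that the data bus flag exception condition is satisfied.
  8. An electronic device, characterized by including a memory and one or more processors, where computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the one or more processors, the one or more processors execute the steps of the method according to any one of claims 1-6.
  9. One or more non-volatile computer-readable storage media storing computer-readable instructions, where, when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the method according to any one of claims 1-6.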
PCT/CN2021/127330 2021-08-27 2021-10-29 Bus exception handling method and apparatus, electronic device and readable storage medium WO2023024248A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/271,658 US11995014B2 (en) 2021-08-27 2021-10-29 Bus exception handling method and apparatus, electronic device and readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110991790.4 2021-08-27
CN202110991790.4A CN113434354B (zh) 2021-08-27 2021-08-27 Bus exception handling method and apparatus, electronic device and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023024248A1 true WO2023024248A1 (zh) 2023-03-02

Family

ID=77798155

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127330 WO2023024248A1 (zh) 2021-08-27 2021-10-29 Bus exception handling method and apparatus, electronic device and readable storage medium

Country Status (3)

Country Link
US (1) US11995014B2 (zh)
CN (1) CN113434354B (zh)
WO (1) WO2023024248A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434354B (zh) 2021-08-27 2021-12-03 苏州浪潮智能科技有限公司 Bus exception handling method and apparatus, electronic device and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1335563A (zh) * 2000-07-27 2002-02-13 三星电子株式会社 Bus system and data transmission method thereof
CN102521061A (zh) * 2011-11-23 2012-06-27 深圳市宇泰科技有限公司 Method, apparatus and system for intelligently cutting off bus faults
CN202737901U (zh) * 2012-06-15 2013-02-13 北京石竹科技股份有限公司 System for automatic switching of a 1553B bus
US20130155794A1 (en) * 2011-12-20 2013-06-20 Industrial Technology Research Institute Repairable multi-layer memory chip stack and method thereof
CN106027351A (zh) * 2016-07-07 2016-10-12 北京华电天仁电力控制技术有限公司 Embedded Web server fieldbus fault diagnosis communication module
CN113434354A (zh) * 2021-08-27 2021-09-24 苏州浪潮智能科技有限公司 Bus exception handling method and apparatus, electronic device and readable storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5872936A (en) * 1995-05-08 1999-02-16 Apple Computer, Inc. Apparatus for and method of arbitrating bus conflicts
JP2001062125A * 1999-08-24 2001-03-13 Amenitekku:Kk Pachinko hall monitoring system
KR101022472B1 * 2004-01-17 2011-03-16 삼성전자주식회사 Method for using a bus efficiently
JP4631569B2 * 2005-07-12 2011-02-16 パナソニック株式会社 Communication system, master device and slave device used therein, and communication method
JP2007122410A * 2005-10-28 2007-05-17 Nec Electronics Corp Bus arbitration circuit and bus arbitration method
JP2010140440A * 2008-12-15 2010-06-24 Toshiba Corp Bus arbitration device
DE112015003669B4 * 2014-08-08 2022-04-28 Gentherm Gmbh Bus system and method for controlling it

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1335563A (zh) * 2000-07-27 2002-02-13 三星电子株式会社 Bus system and data transmission method thereof
CN102521061A (zh) * 2011-11-23 2012-06-27 深圳市宇泰科技有限公司 Method, apparatus and system for intelligently cutting off bus faults
US20130155794A1 (en) * 2011-12-20 2013-06-20 Industrial Technology Research Institute Repairable multi-layer memory chip stack and method thereof
CN202737901U (zh) * 2012-06-15 2013-02-13 北京石竹科技股份有限公司 System for automatic switching of a 1553B bus
CN106027351A (zh) * 2016-07-07 2016-10-12 北京华电天仁电力控制技术有限公司 Embedded Web server fieldbus fault diagnosis communication module
CN113434354A (zh) * 2021-08-27 2021-09-24 苏州浪潮智能科技有限公司 Bus exception handling method and apparatus, electronic device and readable storage medium

Also Published As

Publication number Publication date
CN113434354A (zh) 2021-09-24
US20240037049A1 (en) 2024-02-01
US11995014B2 (en) 2024-05-28
CN113434354B (zh) 2021-12-03

Similar Documents

Publication Publication Date Title
EP2696534B1 (en) Method and device for monitoring quick path interconnect link
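KR101606289B1 (ko) Programmable controller
US20170192768A1 (en) Updating system of firmware of complex programmable logic device and updating method thereof
TWI759719B (zh) Flash memory controller and method for a flash memory controller
US20170286097A1 (en) Method to prevent operating system digital product key activation failures
US10275330B2 (en) Computer readable non-transitory recording medium storing pseudo failure generation program, generation method, and generation apparatus
US10496128B2 (en) Method for obtaining timestamp and computer device using the same
WO2021056393A1 (zh) Test method, electronic device and computer-readable storage medium
US20140143597A1 (en) Computer system and operating method thereof
JP2016085728A (ja) Method and system for collecting console messages after a device failure
WO2023024248A1 (zh) Bus exception handling method and apparatus, electronic device and readable storage medium
CN115934389A (zh) System and method for error reporting and handling
CN114003445A (zh) Method, system, terminal and storage medium for testing the I2C monitoring function of a BMC
US20080288828A1 (en) structures for interrupt management in a processing environment
CN104239174A (zh) BMC remote debugging system and method
TWI777628B (zh) Computer system, dedicated crash dump hardware device thereof, and method for recording error data
US20230281150A1 (en) I2c deadlock and recovery method and apparatus
WO2024124862A1 (zh) Server-based memory processing method and apparatus, processor and electronic device
CN109885420B (zh) Method for analyzing PCIe link faults, BMC and storage medium
CN108984377B (zh) Method, system and medium for collecting statistics on BIOS login logs
JP4299634B2 (ja) Information processing apparatus and clock abnormality detection program for an information processing apparatus
CN115098342A (zh) System log collection method, system, terminal and storage medium
CN110781042B (zh) Method, device and medium for detecting a UBM backplane based on a BMC
JP6217086B2 (ja) Information processing apparatus, error detection function diagnosis method, and computer program
CN112306348A (zh) Method and apparatus for recognizing touch operations, and electronic device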

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21954768

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18271658

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21954768

Country of ref document: EP

Kind code of ref document: A1