CN115459204A - Device, method, equipment and medium for processing faults of parallel HSC chips - Google Patents

Info

Publication number
CN115459204A
Authority
CN
China
Prior art keywords
resistor
hsc
switch tube
chip
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211404113.9A
Other languages
Chinese (zh)
Other versions
CN115459204B (en)
Inventor
罗嗣恒
孔财
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211404113.9A
Publication of CN115459204A
Application granted
Publication of CN115459204B
Priority to PCT/CN2023/100842 (WO2024098750A1)
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02H EMERGENCY PROTECTIVE CIRCUIT ARRANGEMENTS
    • H02H3/00 Emergency protective circuit arrangements for automatic disconnection directly responsive to an undesired change from normal electric working condition with or without subsequent reconnection; integrated protection
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/28 Testing of electronic circuits, e.g. by signal tracer
    • G01R31/2851 Testing of integrated circuits [IC]
    • H02H1/00 Details of emergency protective circuit arrangements
    • H02H1/0007 Details of emergency protective circuit arrangements concerning the detecting means
    • H02H3/02 Details
    • H02H3/04 Details with warning or supervision in addition to disconnection, e.g. for indicating that protective apparatus has functioned
    • H02H3/042 Details with warning or supervision in addition to disconnection, e.g. for indicating that protective apparatus has functioned, combined with means for locating the fault
    • H02H3/08 Emergency protective circuit arrangements for automatic disconnection directly responsive to an undesired change from normal electric working condition with or without subsequent reconnection; integrated protection responsive to excess current
    • H02H3/087 Emergency protective circuit arrangements for automatic disconnection directly responsive to an undesired change from normal electric working condition with or without subsequent reconnection; integrated protection responsive to excess current for dc applications
    • H02H3/10 Emergency protective circuit arrangements for automatic disconnection directly responsive to an undesired change from normal electric working condition with or without subsequent reconnection; integrated protection responsive to excess current additionally responsive to some other abnormal electrical conditions
    • H02H3/20 Emergency protective circuit arrangements for automatic disconnection directly responsive to an undesired change from normal electric working condition with or without subsequent reconnection; integrated protection responsive to excess voltage
    • H02H3/202 Emergency protective circuit arrangements for automatic disconnection directly responsive to an undesired change from normal electric working condition with or without subsequent reconnection; integrated protection responsive to excess voltage for dc systems
    • H02H5/00 Emergency protective circuit arrangements for automatic disconnection directly responsive to an undesired change from normal non-electric working conditions with or without subsequent reconnection
    • H02H5/04 Emergency protective circuit arrangements for automatic disconnection directly responsive to an undesired change from normal non-electric working conditions with or without subsequent reconnection responsive to abnormal temperature

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Emergency Protection Circuit Devices (AREA)
  • Power Sources (AREA)

Abstract

The invention relates to the field of circuit design, and in particular to a device, a method, equipment and a medium for processing faults of parallel hot swap controller (HSC) chips. The device comprises: multiple HSC chips connected in parallel, each provided with a plurality of preset fault state signals; a conversion circuit corresponding to each HSC chip, which converts the plurality of preset fault state signals into a fault alarm signal and outputs it; and a controller, which analyzes the fault alarm signal output by each conversion circuit to determine which HSC chip has failed and the fault type of that chip. The scheme realizes automatic fault location and fault diagnosis, avoids the manpower and material cost of on-site analysis by research and development personnel, and improves fault location efficiency.

Description

Parallel HSC chip fault processing device, method, equipment and medium
Technical Field
The invention relates to the field of circuit design, in particular to a device, a method, equipment and a medium for processing faults of parallel HSC chips.
Background
With the continued rise of cloud computing in recent years, internet traffic keeps increasing, which places higher demands on the data processing and storage capacity of the servers in a machine room. In a traditional machine room organized by cabinet, the server computing nodes deployed inside each cabinet are required to deliver ever greater data processing capacity at ever higher deployment density. As internet user services grow, network data throughput also grows, and data center machine rooms deploy servers ever more densely. To improve deployment density and maintainability, servers are generally deployed in the cabinet as single nodes, and each single-node server plugs directly onto a power bus bar (BUSBAR) through a power clip (POWER CLIP) connector to draw power.
As the basic data processing units of a data center, servers also carry ever heavier loads. In particular, the working current of the CPU chips in a server keeps growing, and a single CPU can draw as much as 100 to 200 A. The current at the power input of a single server node therefore keeps rising, and so does the current required of the hot plug line at that input. For this reason, multiple Hot Swap Controller (HSC) chips are often connected in parallel in the hot swap unit to meet high-current requirements. In practice, connecting multiple HSCs in parallel carries the following risks. First, when any one of the HSC channels triggers Over-Current Protection (OCP), Short-Circuit Protection (SCP), Over-Temperature Protection (OTP) or Over-Voltage Protection (OVP), the whole hot plug line is powered off, and the fault cannot be located quickly and accurately. Second, when multiple HSCs are used in parallel, the current through each HSC chip differs because of differences between individual chips and in the board layout; when the difference becomes large, the current is no longer shared evenly among the HSCs. Under prolonged high load, an HSC chip is then prone to breakdown failure, with the risk of power loss or board burning caused by the chip shorting to ground. This affects the normal operation of customer services, and board burning also poses a fire safety hazard to the data center machine room. Moreover, once a board has burned, research and development engineers often have to be sent on site to locate and analyze the problem; because the board is burned, the fault scene is hard to reproduce, and finding the root cause of the burn is very difficult.
Fig. 1 is a schematic diagram of a traditional multi-HSC parallel application scenario. When multiple HSC chips are connected in parallel, their input terminals Vin are joined at the POWER CLIP connector to draw power from the BUSBAR, and their output terminals Vout are joined to supply power to the load. Meanwhile, a Baseboard Management Controller (BMC) on the server node motherboard samples the HSC output voltage to detect abnormal output voltage states. This approach has the following drawbacks. On the one hand, when any one of the HSC channels triggers OCP, SCP, OTP or OVP, fault location is difficult. On the other hand, because of differences between individual HSC chips and in the line layout, there is no automatic protection when the current is not shared evenly among the HSCs, so safety is poor and improvement is urgently needed.
Disclosure of Invention
In view of the above, there is a need to provide a device, a method, an apparatus and a medium for processing failures of parallel HSC chips.
According to a first aspect of the present invention, there is provided a device for processing faults of parallel HSC chips, the device comprising:
multiple HSC chips connected in parallel, wherein each HSC chip is provided with a plurality of preset fault state signals;
a conversion circuit corresponding to each HSC chip, used for converting the plurality of preset fault state signals into a fault alarm signal and outputting the fault alarm signal;
and a controller, which analyzes the fault alarm signal output by each conversion circuit to determine the HSC chip with the fault and the fault type of the HSC chip.
In some embodiments, the conversion circuit comprises: a first switch tube, a second switch tube, a third switch tube, a fourth switch tube, a first resistor, a second resistor, a third resistor, a fourth resistor and a pull-up resistor, wherein the first resistor, the second resistor, the third resistor and the fourth resistor have different resistances;
the first switch tube is connected in series with the first resistor to form a first branch, the second switch tube is connected in series with the second resistor to form a second branch, the third switch tube is connected in series with the third resistor to form a third branch, the fourth switch tube is connected in series with the fourth resistor to form a fourth branch, one end of the pull-up resistor is connected to a power supply, the first branch, the second branch, the third branch and the fourth branch are connected in parallel, in that order, between the other end of the pull-up resistor and ground, the first switch tube, the second switch tube, the third switch tube and the fourth switch tube are all grounded, and the fault alarm signal is the voltage value at the other end of the pull-up resistor;
the plurality of preset fault state signals comprise an overcurrent protection signal, an over-temperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switch tube on or off, the over-temperature protection signal is used for driving the second switch tube on or off, the short-circuit protection signal is used for driving the third switch tube on or off, the overvoltage protection signal is used for driving the fourth switch tube on or off, and at most one of the first switch tube, the second switch tube, the third switch tube and the fourth switch tube is on at any one moment.
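The way distinct branch resistances produce distinct alarm voltages can be sketched with the ordinary voltage-divider relation. The sketch below is illustrative only: the source gives no resistor values, so the 10 kΩ pull-up is hypothetical, the switch tubes are assumed ideal (zero on-resistance), and the target levels are assumed to be the fractions of the supply voltage that the controller is described as checking.

```python
from fractions import Fraction


def alarm_ratio(r_pullup, r_branch):
    """Fraction of the supply voltage at the low end of the pull-up resistor
    when exactly one branch conducts (ideal switch tube assumed)."""
    return Fraction(r_branch) / (Fraction(r_pullup) + Fraction(r_branch))


def branch_resistance(r_pullup, ratio):
    """Branch resistance that yields the requested ratio.

    From ratio = Rb / (Rp + Rb) it follows that Rb = Rp * ratio / (1 - ratio).
    """
    ratio = Fraction(ratio)
    if not 0 < ratio < 1:
        raise ValueError("ratio must lie strictly between 0 and 1")
    return Fraction(r_pullup) * ratio / (1 - ratio)


R_PULLUP = 10_000  # ohms; hypothetical value, not taken from the source

# Assumed target levels, one per fault state signal.
TARGETS = {
    "OCP": Fraction(1, 2),
    "OTP": Fraction(2, 3),
    "SCP": Fraction(3, 4),
    "OVP": Fraction(4, 5),
}

BRANCH_OHMS = {name: branch_resistance(R_PULLUP, t) for name, t in TARGETS.items()}

if __name__ == "__main__":
    for name, ohms in BRANCH_OHMS.items():
        print(name, int(ohms), "ohm ->", alarm_ratio(R_PULLUP, ohms), "of supply")
```

Under these assumptions the four branch resistors come out as 1, 2, 3 and 4 times the pull-up value, which is one way of satisfying the requirement that the four resistances differ.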
In some embodiments, the conversion circuit comprises: a first switch tube, a second switch tube, a third switch tube, a fourth switch tube, a first resistor, a second resistor, a third resistor, a fourth resistor, a fifth resistor, a sixth resistor, a seventh resistor and a pull-up resistor, wherein the first resistor, the second resistor, the third resistor, the fourth resistor, the fifth resistor, the sixth resistor, the seventh resistor and the pull-up resistor all have the same resistance;
the first switch tube is connected in series with the first resistor to form a first branch, the second switch tube is connected in series with the second resistor to form a second branch, the third switch tube is connected in series with the third resistor to form a third branch, the fourth switch tube is connected in series with the fourth resistor to form a fourth branch, one end of the pull-up resistor is connected to a power supply, the first branch, the second branch, the third branch and the fourth branch are connected in parallel, in that order, between the other end of the pull-up resistor and ground, the first switch tube, the second switch tube, the third switch tube and the fourth switch tube are all grounded, the fifth resistor is connected in series between the first resistor and the second resistor, the sixth resistor is connected in series between the second resistor and the third resistor, the seventh resistor is connected in series between the third resistor and the fourth resistor, and the fault alarm signal is the voltage value at the other end of the pull-up resistor;
the plurality of preset fault state signals comprise an overcurrent protection signal, an over-temperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switch tube on or off, the over-temperature protection signal is used for driving the second switch tube on or off, the short-circuit protection signal is used for driving the third switch tube on or off, the overvoltage protection signal is used for driving the fourth switch tube on or off, and at most one of the first switch tube, the second switch tube, the third switch tube and the fourth switch tube is on at any one moment.
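For this equal-resistance arrangement, a short worked relation shows where the alarm levels come from. It rests on two assumptions that the text does not state explicitly: the switch tubes are ideal (zero on-resistance), and the path to ground through the k-th branch contains k resistors of value R in series (the first resistor alone for k = 1, then one more of the fifth, sixth and seventh resistors for each later branch). With the pull-up resistor also equal to R and supply voltage V_CC:

```latex
V_{\text{alarm}}(k) \;=\; V_{CC}\,\frac{kR}{R + kR} \;=\; \frac{k}{k+1}\,V_{CC},
\qquad k = 1, 2, 3, 4
\quad\Longrightarrow\quad
V_{\text{alarm}} \in \left\{ \tfrac{1}{2}V_{CC},\ \tfrac{2}{3}V_{CC},\ \tfrac{3}{4}V_{CC},\ \tfrac{4}{5}V_{CC} \right\}
```

Under this reading, each fault state signal maps to its own fraction of the supply voltage even though every resistor has the same value.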
In some embodiments, the controller is further configured to:
in response to the fault alarm signal output by a given conversion circuit being equal to one half of the power supply voltage, confirming that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip overcurrent protection;
in response to the fault alarm signal output by a given conversion circuit being equal to two thirds of the power supply voltage, confirming that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip over-temperature protection;
in response to the fault alarm signal output by a given conversion circuit being equal to three quarters of the power supply voltage, confirming that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip short-circuit protection;
in response to the fault alarm signal output by a given conversion circuit being equal to four fifths of the power supply voltage, confirming that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip overvoltage protection;
and in response to the fault alarm signal output by a given conversion circuit being equal to zero, determining that the HSC chip corresponding to that conversion circuit has no fault.
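The decision rules above can be sketched as a small lookup. This is a minimal illustration, not the patented implementation: the function name, the `"UNKNOWN"` result and the tolerance band are assumptions, added because a sampled voltage is never exactly equal to a nominal level, whereas the text states exact equality.

```python
# Nominal alarm levels as fractions of the supply voltage, as listed in the text.
FAULT_LEVELS = {
    "overcurrent protection": 1 / 2,
    "over-temperature protection": 2 / 3,
    "short-circuit protection": 3 / 4,
    "overvoltage protection": 4 / 5,
}


def classify_fault(v_alarm, v_supply, tol=0.02):
    """Map one fault alarm voltage to a fault type.

    Returns None when the reading corresponds to "no fault" (zero),
    the fault type string when it matches a nominal level, and
    "UNKNOWN" otherwise. `tol` is a hypothetical tolerance expressed
    as a fraction of the supply voltage.
    """
    if v_supply <= 0:
        raise ValueError("supply voltage must be positive")
    ratio = v_alarm / v_supply
    if abs(ratio) <= tol:
        return None
    for fault_type, level in FAULT_LEVELS.items():
        if abs(ratio - level) <= tol:
            return fault_type
    return "UNKNOWN"


if __name__ == "__main__":
    for volts in (0.0, 1.65, 2.2, 2.475, 2.64, 1.0):
        print(volts, "->", classify_fault(volts, 3.3))
```

The tolerance has to stay below half the smallest gap between levels (3/4 and 4/5 are 0.05 apart), otherwise two fault types would overlap.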
In some embodiments, each HSC chip further has a current detection signal, and the controller is further configured to collect the current detection signal of each HSC chip, process the collected current detection signals, and then generate a current-sharing alarm signal and a current-sharing control signal according to a preset determination rule;
the device further comprises an AND gate, wherein a first input end of the AND gate is connected to a start signal, a second input end of the AND gate is connected to the current-sharing control signal, and an output end of the AND gate is connected to the enable signal pin of each HSC chip.
In some embodiments, the controller is further configured to:
calculating the average value of all the collected current detection signals;
in response to the current detection signal of any HSC chip being less than 0.6 times the average value or more than 1.6 times the average value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that turns off the HSC chips;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times the average value and less than or equal to 1.6 times the average value, and the current detection signal of any HSC chip being greater than or equal to 0.6 times the average value and less than 0.9 times the average value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times the average value and less than or equal to 1.6 times the average value, and the current detection signal of any HSC chip being greater than 1.1 times the average value and less than or equal to 1.6 times the average value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips;
and in response to the current detection signals of all HSC chips being greater than or equal to 0.9 times the average value and less than or equal to 1.1 times the average value, outputting a current-sharing alarm signal indicating that no alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips.
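The current-sharing determination rule and the AND gate can be sketched together as follows. The names `SharingDecision`, `judge_current_sharing` and `hsc_enable` are hypothetical, and the sketch assumes an active-high enable pin, so a current-sharing control signal of `True` means "do not turn off".

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class SharingDecision:
    alarm: bool         # current-sharing alarm signal: True = alarm required
    keep_enabled: bool  # current-sharing control signal: False = turn HSC chips off


def judge_current_sharing(currents):
    """Apply the 0.6 / 0.9 / 1.1 / 1.6 thresholds to per-chip current readings."""
    if not currents:
        raise ValueError("at least one current reading is required")
    mean = fmean(currents)
    # Any reading outside [0.6, 1.6] x mean: alarm and turn off.
    if any(i < 0.6 * mean or i > 1.6 * mean for i in currents):
        return SharingDecision(alarm=True, keep_enabled=False)
    # All readings inside [0.6, 1.6] x mean, but at least one outside
    # [0.9, 1.1] x mean: alarm only.
    if any(i < 0.9 * mean or i > 1.1 * mean for i in currents):
        return SharingDecision(alarm=True, keep_enabled=True)
    # All readings inside [0.9, 1.1] x mean: current is shared evenly.
    return SharingDecision(alarm=False, keep_enabled=True)


def hsc_enable(start_signal, decision):
    """AND gate: the enable pin is high only if both inputs are high."""
    return start_signal and decision.keep_enabled


if __name__ == "__main__":
    for readings in ([10, 10, 10, 10], [10, 10, 10, 8], [10, 10, 10, 2]):
        decision = judge_current_sharing(readings)
        print(readings, decision, "enable =", hsc_enable(True, decision))
```

Because the comparison uses products with the mean rather than a division, a set of all-zero readings falls through to the "no alarm" case instead of raising a division error.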
In some embodiments, the controller is a baseboard management controller.
According to a second aspect of the present invention, there is provided a method for processing faults of parallel HSC chips, the method comprising:
connecting multiple HSC chips in parallel, wherein each HSC chip is provided with a plurality of preset fault state signals;
converting the plurality of preset fault state signals into a fault alarm signal by using a conversion circuit corresponding to each HSC chip, and outputting the fault alarm signal;
and analyzing, by a controller, the fault alarm signal output by each conversion circuit to determine the HSC chip with the fault and the fault type of the HSC chip.
In some embodiments, the conversion circuit comprises: a first switch tube, a second switch tube, a third switch tube, a fourth switch tube, a first resistor, a second resistor, a third resistor, a fourth resistor and a pull-up resistor, wherein the first resistor, the second resistor, the third resistor and the fourth resistor have different resistances;
the first switch tube is connected in series with the first resistor to form a first branch, the second switch tube is connected in series with the second resistor to form a second branch, the third switch tube is connected in series with the third resistor to form a third branch, the fourth switch tube is connected in series with the fourth resistor to form a fourth branch, one end of the pull-up resistor is connected to a power supply, the first branch, the second branch, the third branch and the fourth branch are connected in parallel, in that order, between the other end of the pull-up resistor and ground, the first switch tube, the second switch tube, the third switch tube and the fourth switch tube are all grounded, and the fault alarm signal is the voltage value at the other end of the pull-up resistor;
the plurality of preset fault state signals comprise an overcurrent protection signal, an over-temperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switch tube on or off, the over-temperature protection signal is used for driving the second switch tube on or off, the short-circuit protection signal is used for driving the third switch tube on or off, the overvoltage protection signal is used for driving the fourth switch tube on or off, and at most one of the first switch tube, the second switch tube, the third switch tube and the fourth switch tube is on at any one moment.
In some embodiments, the conversion circuit comprises: a first switch tube, a second switch tube, a third switch tube, a fourth switch tube, a first resistor, a second resistor, a third resistor, a fourth resistor, a fifth resistor, a sixth resistor, a seventh resistor and a pull-up resistor, wherein the first resistor, the second resistor, the third resistor, the fourth resistor, the fifth resistor, the sixth resistor, the seventh resistor and the pull-up resistor all have the same resistance;
the first switch tube is connected in series with the first resistor to form a first branch, the second switch tube is connected in series with the second resistor to form a second branch, the third switch tube is connected in series with the third resistor to form a third branch, the fourth switch tube is connected in series with the fourth resistor to form a fourth branch, one end of the pull-up resistor is connected to a power supply, the first branch, the second branch, the third branch and the fourth branch are connected in parallel, in that order, between the other end of the pull-up resistor and ground, the first switch tube, the second switch tube, the third switch tube and the fourth switch tube are all grounded, the fifth resistor is connected in series between the first resistor and the second resistor, the sixth resistor is connected in series between the second resistor and the third resistor, the seventh resistor is connected in series between the third resistor and the fourth resistor, and the fault alarm signal is the voltage value at the other end of the pull-up resistor;
the plurality of preset fault state signals comprise an overcurrent protection signal, an over-temperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switch tube on or off, the over-temperature protection signal is used for driving the second switch tube on or off, the short-circuit protection signal is used for driving the third switch tube on or off, the overvoltage protection signal is used for driving the fourth switch tube on or off, and at most one of the first switch tube, the second switch tube, the third switch tube and the fourth switch tube is on at any one moment.
In some embodiments, the step in which the controller analyzes the fault alarm signal output by each conversion circuit to determine the failed HSC chip and the fault type of the HSC chip comprises:
in response to the fault alarm signal output by a given conversion circuit being equal to one half of the power supply voltage, confirming that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip overcurrent protection;
in response to the fault alarm signal output by a given conversion circuit being equal to two thirds of the power supply voltage, confirming that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip over-temperature protection;
in response to the fault alarm signal output by a given conversion circuit being equal to three quarters of the power supply voltage, confirming that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip short-circuit protection;
in response to the fault alarm signal output by a given conversion circuit being equal to four fifths of the power supply voltage, confirming that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip overvoltage protection;
and in response to the fault alarm signal output by a given conversion circuit being equal to zero, determining that the HSC chip corresponding to that conversion circuit has no fault.
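Since there is one conversion circuit per HSC chip, locating the failed chip amounts to scanning every alarm channel and decoding each reading. The following sketch shows one possible scan; the chip labels, the `locate_faults` name and the nearest-level matching are assumptions made for illustration, not details from the source.

```python
# (fraction of supply voltage, fault type); None marks the "no fault" level.
LEVELS = [
    (0.0, None),
    (1 / 2, "overcurrent protection"),
    (2 / 3, "over-temperature protection"),
    (3 / 4, "short-circuit protection"),
    (4 / 5, "overvoltage protection"),
]


def locate_faults(alarm_voltages, v_supply):
    """Return {chip: fault type} for every chip whose alarm voltage decodes
    to a fault. Each reading is matched to the nearest nominal level."""
    if v_supply <= 0:
        raise ValueError("supply voltage must be positive")
    faults = {}
    for chip, volts in alarm_voltages.items():
        ratio = volts / v_supply
        _, fault_type = min(LEVELS, key=lambda level: abs(level[0] - ratio))
        if fault_type is not None:
            faults[chip] = fault_type
    return faults


if __name__ == "__main__":
    readings = {"HSC1": 0.0, "HSC2": 2.2, "HSC3": 0.0, "HSC4": 2.64}
    print(locate_faults(readings, 3.3))
```

Nearest-level matching always returns an answer, so a real controller would probably also want to reject readings that sit far from every nominal level.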
In some embodiments, each HSC chip further has a current detection signal, the controller further collects the current detection signal of each HSC chip, processes the collected current detection signals, and then generates a current-sharing alarm signal and a current-sharing control signal according to a preset determination rule, and the method further comprises:
connecting a first input end of an AND gate to a start signal, connecting a second input end of the AND gate to the current-sharing control signal, and connecting an output end of the AND gate to the enable signal pin of each HSC chip.
In some embodiments, processing the collected current detection signals and then generating the current-sharing alarm signal and the current-sharing control signal according to the preset determination rule comprises:
calculating the average value of all the collected current detection signals;
in response to the current detection signal of any HSC chip being less than 0.6 times the average value or more than 1.6 times the average value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that turns off the HSC chips;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times the average value and less than or equal to 1.6 times the average value, and the current detection signal of any HSC chip being greater than or equal to 0.6 times the average value and less than 0.9 times the average value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times the average value and less than or equal to 1.6 times the average value, and the current detection signal of any HSC chip being greater than 1.1 times the average value and less than or equal to 1.6 times the average value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips;
and in response to the current detection signals of all HSC chips being greater than or equal to 0.9 times the average value and less than or equal to 1.1 times the average value, outputting a current-sharing alarm signal indicating that no alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips.
In some embodiments, the controller is a baseboard management controller.
According to a third aspect of the present invention, there is also provided a computer device comprising:
at least one processor; and
a memory storing a computer program that can run on the processor, wherein the processor performs the above method for processing faults of parallel HSC chips when executing the program.
According to a fourth aspect of the present invention, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, performs the above method for processing faults of parallel HSC chips.
In the above device for processing faults of parallel HSC chips, the conversion circuit converts the preset fault state signals inside each HSC chip into a fault alarm signal, and the controller analyzes that signal to locate the specific fault position and fault cause. This realizes automatic fault location and fault diagnosis, avoids the manpower and material cost of on-site analysis by research and development personnel, and improves fault location efficiency.
In addition, the invention also provides a method for processing faults of parallel HSC chips, a computer device and a computer-readable storage medium, which achieve the same technical effects and are not described again here.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a schematic diagram of a conventional multi-HSC parallel application scenario;
fig. 2 is a schematic diagram of a device for handling a fault of parallel HSC chips according to an embodiment of the present invention;
fig. 3A is a schematic structural diagram of a conversion circuit according to an embodiment of the present invention;
fig. 3B is a schematic structural diagram of a conversion circuit according to another embodiment of the present invention;
fig. 4 is a schematic diagram illustrating a parallel HSC current sharing detection and determination principle according to another embodiment of the present invention;
fig. 5 is a schematic diagram illustrating a method for handling a fault of a parallel HSC chip according to an embodiment of the present invention;
fig. 6 is an internal structural view of a computer device in another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities or non-identical parameters that share the same name. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
In an embodiment, referring to fig. 2, the present invention provides a device for handling failures of parallel HSC chips, specifically, the device includes:
a plurality of HSC chips connected in parallel, wherein each HSC chip is provided with a plurality of preset fault state signals;
a conversion circuit corresponding to each HSC chip, the conversion circuit being used for converting the plurality of preset fault state signals into a fault alarm signal and outputting the fault alarm signal;
and a controller that analyzes the fault alarm signal output by each conversion circuit to determine the faulty HSC chip and the fault type of that HSC chip.
According to the parallel HSC chip fault processing device, the preset fault state signal inside the HSC chip is converted into the fault alarm signal through the conversion circuit, and the fault alarm signal is analyzed and judged through the controller to locate the specific fault position and fault reason, so that automatic fault location and fault diagnosis are realized, the manpower and material resources input brought by field analysis of research and development personnel is avoided, and the fault location efficiency is improved.
In some embodiments, referring to fig. 3A, the conversion circuit includes: a first switching tube S1, a second switching tube S2, a third switching tube S3, a fourth switching tube S4, a first resistor R1, a second resistor R2, a third resistor R3, a fourth resistor R4 and a pull-up resistor, wherein the first resistor R1, the second resistor R2, the third resistor R3 and the fourth resistor R4 have different resistances;
the first switching tube S1 is connected in series with the first resistor R1 to form a first branch, the second switching tube S2 is connected in series with the second resistor R2 to form a second branch, the third switching tube S3 is connected in series with the third resistor R3 to form a third branch, the fourth switching tube S4 is connected in series with the fourth resistor R4 to form a fourth branch, one end of the pull-up resistor is connected to the power supply, the first branch, the second branch, the third branch and the fourth branch are connected in parallel, in sequence, between the other end of the pull-up resistor and ground, the first switching tube S1, the second switching tube S2, the third switching tube S3 and the fourth switching tube S4 are all grounded, and the fault alarm signal is the voltage value at the other end of the pull-up resistor;
the preset fault state signals comprise an overcurrent protection signal, an overtemperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switching tube S1 to be switched on or switched off, the overtemperature protection signal is used for driving the second switching tube S2 to be switched on or switched off, the short-circuit protection signal is used for driving the third switching tube S3 to be switched on or switched off, and the overvoltage protection signal is used for driving the fourth switching tube S4 to be switched on or switched off, wherein at most one of the first switching tube S1, the second switching tube S2, the third switching tube S3 and the fourth switching tube S4 is switched on at the same time.
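The divider behaviour of this circuit can be illustrated with a short sketch. It assumes ideal switching tubes (zero on-resistance) and that exactly one branch conducts, so the alarm voltage is the supply voltage divided between the pull-up resistor and the conducting branch resistor. The supply voltage, the pull-up value and the 1x to 4x resistor ratios are illustrative assumptions, not values given in the text; they are chosen because they yield one half, two thirds, three quarters and four fifths of the supply voltage.

```python
def alert_voltage(vcc, r_pullup, r_branch):
    """Voltage at the lower end of the pull-up resistor when exactly one
    branch (an ideal switch in series with r_branch) conducts to ground."""
    return vcc * r_branch / (r_pullup + r_branch)


VCC = 3.3           # assumed supply voltage in volts
R_PULLUP = 10_000   # assumed pull-up resistance in ohms

# Assumed branch resistances: 1x, 2x, 3x and 4x the pull-up resistance.
BRANCH_RESISTORS = {
    "S1 on": 1 * R_PULLUP,
    "S2 on": 2 * R_PULLUP,
    "S3 on": 3 * R_PULLUP,
    "S4 on": 4 * R_PULLUP,
}

# Alarm voltage expressed as a fraction of the supply voltage.
fractions = {
    name: alert_voltage(VCC, R_PULLUP, r) / VCC
    for name, r in BRANCH_RESISTORS.items()
}

for name, frac in fractions.items():
    print(f"{name}: ALERT = {frac:.4f} x VCC")
```

Because each fault closes a different branch, the four resistances only need to be distinct for the controller to tell the faults apart; the particular ratios above are one possible choice.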
In some embodiments, referring to fig. 3B, the conversion circuit includes: a first switching tube S1, a second switching tube S2, a third switching tube S3, a fourth switching tube S4, a first resistor R1, a second resistor R2, a third resistor R3, a fourth resistor R4, a fifth resistor R5, a sixth resistor R6, a seventh resistor R7 and a pull-up resistor, wherein the first resistor R1, the second resistor R2, the third resistor R3, the fourth resistor R4, the fifth resistor R5, the sixth resistor R6, the seventh resistor R7 and the pull-up resistor have the same resistance;
the first switching tube S1 is connected in series with the first resistor R1 to form a first branch, the second switching tube S2 is connected in series with the second resistor R2 to form a second branch, the third switching tube S3 is connected in series with the third resistor R3 to form a third branch, the fourth switching tube S4 is connected in series with the fourth resistor R4 to form a fourth branch, one end of the pull-up resistor is connected to the power supply, the first branch, the second branch, the third branch and the fourth branch are connected in parallel, in sequence, between the other end of the pull-up resistor and ground, the first switching tube S1, the second switching tube S2, the third switching tube S3 and the fourth switching tube S4 are all grounded, the fifth resistor R5 is connected in series between the first resistor R1 and the second resistor R2, the sixth resistor R6 is connected in series between the second resistor R2 and the third resistor R3, the seventh resistor R7 is connected in series between the third resistor R3 and the fourth resistor R4, and the fault alarm signal is the voltage value at the other end of the pull-up resistor;
the preset fault state signals comprise an overcurrent protection signal, an overtemperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switch tube S1 to be switched on or switched off, the overtemperature protection signal is used for driving the second switch tube S2 to be switched on or switched off, the short-circuit protection signal is used for driving the third switch tube S3 to be switched on or switched off, and the overvoltage protection signal is used for driving the fourth switch tube S4 to be switched on or switched off, wherein at most one of the first switch tube S1, the second switch tube S2, the third switch tube S3 and the fourth switch tube S4 is switched on at the same time.
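For the equal-resistance variant, the following sketch works through the arithmetic under one reading of the topology, which is an assumption: when only switching tube Sk conducts, the path from the alarm node to ground passes through k equal resistors in total (the branch resistor plus the k-1 series resistors among R5 to R7 that precede it). Switching tubes are treated as ideal. The function name is hypothetical.

```python
from fractions import Fraction


def ladder_alert_fraction(k):
    """Fraction of the supply voltage seen on the alarm node when only
    switching tube Sk (k = 1..4) conducts.

    Assumption: the path to ground through Sk contains k equal resistors,
    and the pull-up resistor has the same resistance as each of them.
    """
    if k not in (1, 2, 3, 4):
        raise ValueError("k must be 1, 2, 3 or 4")
    r = Fraction(1)          # common resistance, normalised to 1
    r_path = k * r           # series resistance from alarm node to ground
    return r_path / (r + r_path)


for k in (1, 2, 3, 4):
    print(f"S{k} on: ALERT = {ladder_alert_fraction(k)} x VCC")
```

Under this reading the four levels come out as 1/2, 2/3, 3/4 and 4/5 of the supply voltage, using only a single resistor value.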
In some embodiments, the controller is further configured to:
responding to the fault alarm signal output by a certain conversion circuit being equal to one half of the power supply voltage, confirming that the HSC chip corresponding to the certain conversion circuit has a fault, wherein the fault type is HSC chip overcurrent protection;
responding to the fault alarm signal output by a certain conversion circuit being equal to two thirds of power supply voltage, confirming that the HSC chip corresponding to the certain conversion circuit has a fault, wherein the fault type is HSC chip over-temperature protection;
responding to the fault alarm signal output by a certain conversion circuit being equal to three-fourths of power supply voltage, confirming that an HSC chip corresponding to the certain conversion circuit has a fault, wherein the fault type is HSC chip short-circuit protection;
responding to the fault alarm signal output by a certain conversion circuit to be equal to four fifths of power supply voltage, and confirming that the HSC chip corresponding to the certain conversion circuit has a fault, wherein the fault type is HSC chip overvoltage protection;
and responding to the fault alarm signal output by a certain conversion circuit being equal to zero, and determining that the HSC chip corresponding to the certain conversion circuit is not in fault.
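The judgment above can be sketched as a lookup from the measured alarm voltage to a fault type. The text specifies exact equality with each level; a real ADC reading carries noise, so the sketch matches the nearest level within a tolerance. The tolerance value, the function name and the "unknown" result are assumptions added for illustration.

```python
# (fraction of supply voltage, fault type) pairs taken from the rules above.
FAULT_LEVELS = [
    (0.0, "no fault"),
    (1 / 2, "overcurrent protection"),
    (2 / 3, "over-temperature protection"),
    (3 / 4, "short-circuit protection"),
    (4 / 5, "overvoltage protection"),
]


def classify_alert(v_alert, vcc, tol=0.02):
    """Map an alarm voltage to a fault type.

    tol is an assumed matching window, expressed as a fraction of vcc;
    it must stay below half the smallest gap between levels (0.025).
    """
    ratio = v_alert / vcc
    level, label = min(FAULT_LEVELS, key=lambda item: abs(item[0] - ratio))
    if abs(level - ratio) > tol:
        return "unknown"
    return label


# Example readings for an assumed 3.3 V supply.
for reading in (0.0, 1.65, 2.2, 2.475, 2.64):
    print(reading, "->", classify_alert(reading, 3.3))
```

Running the classifier once per conversion circuit identifies both the faulty HSC chip (by which input was read) and its fault type (by the returned label).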
In some embodiments, referring to fig. 4, each HSC chip further has a current detection signal, and the controller is further configured to collect the current detection signal of each HSC chip, process the collected current detection signal, and generate a current sharing alarm signal and a current sharing control signal according to a preset determination rule;
the device further comprises an AND gate, wherein a first input end of the AND gate is connected with a starting signal, a second input end of the AND gate is connected with the current sharing control signal, and an output end of the AND gate is connected to an enabling signal pin of each HSC chip.
In some embodiments, the controller is further configured to:
calculating the average value of all the collected current detection signals;
in response to the current detection signal of any HSC chip being less than 0.6 times the mean value or greater than 1.6 times the mean value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that turns off the HSC chips;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times the mean value and less than or equal to 1.6 times the mean value, and the current detection signal of any HSC chip being greater than or equal to 0.6 times the mean value and less than 0.9 times the mean value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times the mean value and less than or equal to 1.6 times the mean value, and the current detection signal of any HSC chip being greater than 1.1 times the mean value and less than or equal to 1.6 times the mean value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips;
and in response to the current detection signals of all HSC chips being greater than or equal to 0.9 times the mean value and less than or equal to 1.1 times the mean value, outputting a current-sharing alarm signal indicating that no alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips.
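The four rules above reduce to three outcomes, which the following sketch expresses as a single function. The function name and the tuple return value are illustrative assumptions; the thresholds (0.6, 0.9, 1.1 and 1.6 times the mean value) are those stated in the rules.

```python
def current_sharing_decision(imon):
    """Return (alarm, shutdown) for a list of per-chip current readings.

    alarm    -- True if a current-sharing alarm is required
    shutdown -- True if the HSC chips should be turned off
    """
    if not imon:
        raise ValueError("at least one current reading is required")
    mean = sum(imon) / len(imon)

    # Severe imbalance: any reading outside [0.6, 1.6] times the mean.
    if any(i < 0.6 * mean or i > 1.6 * mean for i in imon):
        return True, True

    # Mild imbalance: all readings within [0.6, 1.6] times the mean,
    # but at least one outside [0.9, 1.1] times the mean.
    if any(i < 0.9 * mean or i > 1.1 * mean for i in imon):
        return True, False

    # Balanced: all readings within [0.9, 1.1] times the mean.
    return False, False


print(current_sharing_decision([10.0, 10.0, 10.0]))
print(current_sharing_decision([8.5, 10.0, 11.5]))
print(current_sharing_decision([2.0, 10.0, 18.0]))
```

Checking the severe case first means the two mild-imbalance rules can share one test, since any reading that reaches the second check is already known to lie within 0.6 to 1.6 times the mean value.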
In some embodiments, the controller is a baseboard management controller.
In another embodiment, to facilitate understanding of the solution of the present invention, a data center server is taken as an example of the load, and the load is powered by n (n is greater than or equal to 2) HSC chips connected in parallel. Referring again to fig. 2 to 4, another parallel HSC chip fault handling device is provided, including: a POWER CLIP connector, HSC0, HSC1, …, HSCn, conversion circuits, a load LOAD, and a BMC. The device differs from the traditional approach (in which the BMC only monitors the voltage signal transmitted by the HSC) in that: (1) the OCP, SCP, OTP and OVP fault states in the HSC are mapped to voltage values of the alarm signal ALERT, and the specific fault information is located by judging the voltage value of the alarm signal in the BMC; (2) the IMON signals of the multiple HSC chips are introduced into the BMC, and after the IMON signals are digitized in the BMC, a current-sharing alarm signal and a current-sharing control signal are generated according to a judgment rule. The current-sharing control signal is used to control whether the HSC is turned off, so that board burning caused by uneven current can be effectively avoided. The function of each part of the device is explained in detail below:
The input ends of HSC0 through HSCn are connected in parallel and draw Vin from the BUSBAR through the POWER CLIP connector, and the output ends are connected in parallel to converge and output Vout to supply power to the LOAD.
The currents flowing through the HSC chips are, in turn: I0, I1, …, In. The signals led out of each HSC chip, [IMON0, IMON1, …, IMONn] and [ALERT0, ALERT1, …, ALERTn], are connected to the ADC I/O ports of the BMC. [IMON0, IMON1, …, IMONn] denotes the current detection signals led out of the HSC0, HSC1, …, HSCn chips and represents the magnitude of the current flowing through each HSC chip.
[ALERT0, ALERT1, …, ALERTn] denotes the alarm signals led out of the HSC0, HSC1, …, HSCn chips and represents the specific fault triggered by each HSC chip.
The BMC judges and processes the voltage values of the collected current signals and alarm signals, gives a fault location result, and triggers the current-sharing protection signal Current Balance.
As shown in fig. 3B, the HSC fault location differentiation and processing circuit structure provided in this embodiment takes four common faults of the HSC chip as an example (when S0, S1, S2 or S3 is low, the corresponding switch of the conversion circuit is closed), wherein S0 represents triggering of OCP (overcurrent protection); S1 represents triggering of OTP (over-temperature protection); S2 represents triggering of SCP (short-circuit protection); and S3 represents triggering of OVP (overvoltage protection). The four logic level signals are used to control the conversion circuit at the lower right of fig. 3B, which outputs ALERT signals with different voltage values; the conversion relationship of S0, S1, S2 and S3 is given in Table 1 below.
TABLE 1 Fault Warning Signal Voltage value and Fault type correspondence
(Table 1 appears as an image in the original publication and is not reproduced here.)
FIG. 4 is a schematic diagram of the current-sharing diagnostic structure for multiple HSC chips connected in parallel. The current detection signals [IMON0, IMON1, …, IMONn] generated by HSC0, HSC1, …, HSCn are passed to the ADC I/O ports [ADC0, ADC1, …, ADCn] of the BMC. After the current detection signals are digitized inside the BMC chip, the mean current value is calculated as the arithmetic mean of all collected current detection signals:
$$\overline{IMON} = \frac{1}{N}\sum_{i} IMON_i$$
where N is the number of collected current detection signals.
The judgment criteria for IMONi and the mean value $\overline{IMON}$ are shown in Table 2 below:
TABLE 2 IMONi, current sharing alarm signal, current sharing control signal corresponding relation
(Table 2 appears as an image in the original publication and is not reproduced here.)
The BMC compares each IMONi with the mean value to generate a Current Balance ALERT signal and a Current Balance SHUT control signal. The correspondence and the corresponding actions are shown in Table 2 above. When the Current Balance SHUT control signal is low, the output of the AND gate that combines it with the power-on signal PSON is low, and the HSC is turned off, thereby preventing HSC burnout caused by severely uneven current.
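The gating described here is a two-input AND. A minimal sketch follows, assuming both signals are modelled as booleans with True meaning a high level; the function and parameter names are hypothetical.

```python
def hsc_enable(pson, balance_shut):
    """Level driven onto the EN pin of every HSC chip.

    pson         -- power-on signal PSON (True = high)
    balance_shut -- Current Balance SHUT control signal
                    (True = high = keep running, False = low = shut off)
    """
    return pson and balance_shut


# Truth table: the HSC chips are enabled only when both inputs are high.
for pson in (False, True):
    for shut in (False, True):
        print(f"PSON={pson!s:5} SHUT={shut!s:5} -> EN={hsc_enable(pson, shut)}")
```

Using a hardware AND gate means a low Current Balance SHUT signal disables the HSC chips regardless of PSON, and the chips also stay off whenever the node has not been powered on.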
The specific working process of the parallel HSC chip fault handling device is described in detail below:
Step one, according to the server system configuration (CPU, memory, hard disk array and system fan module), count the total power consumption P_T of the node mainboard, and determine the maximum current value I_T of the power supply path from the BUSBAR to the HSC according to the 80% derating standard.
And step two, selecting a POWER CLIP connector meeting the current flow size, HSC chips (such as MP5991 GLU-Z) with proper models and the number of HSC chips needing to be connected in parallel according to the maximum current value I _ T.
And step three, respectively connecting the input end and the output end of each group of HSC chips in parallel and mutually. And interconnecting the IMON pin of each group of HSC chips with the ADC I/O pin of the BMC. And interconnecting the ALERT pin of the HSC chip with the ADC I/O pin of the BMC.
And step four, importing a judgment rule of alarm signal fault information and importing a current-sharing signal alarm and control judgment rule into the BMC firmware.
And fifthly, connecting the Current Balance SHUT generated by the BMC and the node starting signal PSON with pins EN of the HSC chips through a logic AND gate chip.
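Steps one and two can be illustrated with a small sizing calculation. The text does not state how the 80% derating is applied; the sketch assumes that the operating current may not exceed 80% of the rated current, so the required rating is the operating current divided by 0.8. The bus voltage, total power and per-chip current rating are hypothetical numbers, not values from the text or from any datasheet.

```python
import math


def max_path_current(p_total_w, v_bus, derating=0.8):
    """Required current rating I_T of the BUSBAR-to-HSC path.

    Assumption: operating current must not exceed `derating` times
    the rated current, so the rating is operating current / derating.
    """
    return p_total_w / v_bus / derating


def hsc_chip_count(i_total_a, i_per_chip_a):
    """Smallest number of parallel HSC chips whose combined rating
    covers i_total_a, assuming ideal current sharing."""
    return math.ceil(i_total_a / i_per_chip_a)


P_T = 1200.0      # hypothetical total node power in watts
V_BUS = 12.0      # hypothetical bus voltage in volts
I_PER_CHIP = 50.0  # hypothetical per-chip current rating in amperes

i_t = max_path_current(P_T, V_BUS)
n_chips = hsc_chip_count(i_t, I_PER_CHIP)
print(f"I_T = {i_t:.1f} A, parallel HSC chips needed = {n_chips}")
```

In practice the chip count would also need margin for imperfect current sharing, which is the condition the current-sharing rules in this embodiment monitor.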
The parallel HSC chip fault handling device has at least the following beneficial technical effects. Firstly, the OCP, SCP, OTP and OVP fault states in the HSC are mapped to voltage values of the alarm signal ALERT, and the specific fault information is located by judging the alarm signal voltage value in the BMC, so that the manpower and material resources required for on-site analysis by research and development personnel are avoided and the fault location efficiency is improved. Secondly, the IMON signals of the multiple HSC chips are introduced into the BMC, and after the IMON signals are digitized inside the BMC, a current-sharing alarm signal and a current-sharing control signal are generated according to a judgment rule, and the current-sharing control signal is used to control whether the HSC is turned off. Board burning caused by uneven current can thus be effectively avoided, and the power supply safety of the server is enhanced.
In another embodiment, referring to fig. 5, the present invention further provides a method 100 for processing failures of parallel HSC chips, the method comprising:
step 101, connecting a plurality of HSC chips in parallel, wherein each HSC chip is provided with a plurality of preset fault state signals;
step 102, converting the plurality of preset fault state signals into a fault alarm signal by using a conversion circuit corresponding to each HSC chip, and outputting the fault alarm signal;
step 103, analyzing, by the controller, the fault alarm signal output by each conversion circuit to determine the faulty HSC chip and the fault type of that HSC chip.
According to the method for processing the faults of the parallel HSC chips, the preset fault state signals inside the HSC chips are converted into the fault alarm signals through the conversion circuit, and the fault alarm signals are analyzed and judged through the controller to position specific fault positions and fault reasons, so that automatic fault positioning and fault diagnosis are achieved, manpower and material resources input brought by field analysis of research and development personnel are avoided, and the fault positioning efficiency is improved.
In some embodiments, the conversion circuit comprises: a first switching tube S1, a second switching tube S2, a third switching tube S3, a fourth switching tube S4, a first resistor R1, a second resistor R2, a third resistor R3, a fourth resistor R4 and a pull-up resistor, wherein the first resistor R1, the second resistor R2, the third resistor R3 and the fourth resistor R4 have different resistances;
the first switching tube S1 is connected in series with the first resistor R1 to form a first branch, the second switching tube S2 is connected in series with the second resistor R2 to form a second branch, the third switching tube S3 is connected in series with the third resistor R3 to form a third branch, the fourth switching tube S4 is connected in series with the fourth resistor R4 to form a fourth branch, one end of the pull-up resistor is connected to a power supply, the first branch, the second branch, the third branch and the fourth branch are connected in parallel, in sequence, between the other end of the pull-up resistor and ground, the first switching tube S1, the second switching tube S2, the third switching tube S3 and the fourth switching tube S4 are all grounded, and the fault alarm signal is the voltage value at the other end of the pull-up resistor;
the preset fault state signals comprise an overcurrent protection signal, an overtemperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switching tube S1 to be switched on or switched off, the overtemperature protection signal is used for driving the second switching tube S2 to be switched on or switched off, the short-circuit protection signal is used for driving the third switching tube S3 to be switched on or switched off, and the overvoltage protection signal is used for driving the fourth switching tube S4 to be switched on or switched off, wherein at most one of the first switching tube S1, the second switching tube S2, the third switching tube S3 and the fourth switching tube S4 is switched on at the same time.
In some embodiments, the conversion circuit comprises: a first switching tube S1, a second switching tube S2, a third switching tube S3, a fourth switching tube S4, a first resistor R1, a second resistor R2, a third resistor R3, a fourth resistor R4, a fifth resistor R5, a sixth resistor R6, a seventh resistor R7 and a pull-up resistor, wherein the first resistor R1, the second resistor R2, the third resistor R3, the fourth resistor R4, the fifth resistor R5, the sixth resistor R6, the seventh resistor R7 and the pull-up resistor have the same resistance;
the first switching tube S1 is connected in series with the first resistor R1 to form a first branch, the second switching tube S2 is connected in series with the second resistor R2 to form a second branch, the third switching tube S3 is connected in series with the third resistor R3 to form a third branch, the fourth switching tube S4 is connected in series with the fourth resistor R4 to form a fourth branch, one end of the pull-up resistor is connected to the power supply, the first branch, the second branch, the third branch and the fourth branch are connected in parallel, in sequence, between the other end of the pull-up resistor and ground, the first switching tube S1, the second switching tube S2, the third switching tube S3 and the fourth switching tube S4 are all grounded, the fifth resistor R5 is connected in series between the first resistor R1 and the second resistor R2, the sixth resistor R6 is connected in series between the second resistor R2 and the third resistor R3, the seventh resistor R7 is connected in series between the third resistor R3 and the fourth resistor R4, and the fault alarm signal is the voltage value at the other end of the pull-up resistor;
the preset fault state signals comprise an overcurrent protection signal, an overtemperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switch tube S1 to be switched on or switched off, the overtemperature protection signal is used for driving the second switch tube S2 to be switched on or switched off, the short-circuit protection signal is used for driving the third switch tube S3 to be switched on or switched off, and the overvoltage protection signal is used for driving the fourth switch tube S4 to be switched on or switched off, wherein at most one of the first switch tube S1, the second switch tube S2, the third switch tube S3 and the fourth switch tube S4 is switched on at the same time.
In some embodiments, in step 103, the analyzing, by the controller, the failure warning signal output by each conversion circuit to determine the failed HSC chip and the failure type of the HSC chip includes:
responding to the fault alarm signal output by a certain conversion circuit being equal to one half of the power supply voltage, confirming that the HSC chip corresponding to the certain conversion circuit has a fault, wherein the fault type is HSC chip overcurrent protection;
responding to the fault alarm signal output by a certain conversion circuit to be equal to two thirds of power supply voltage, and determining that the HSC chip corresponding to the certain conversion circuit has a fault, wherein the fault type is HSC chip over-temperature protection;
responding to the fault alarm signal output by a certain conversion circuit being equal to three-fourths of power supply voltage, confirming that an HSC chip corresponding to the certain conversion circuit has a fault, wherein the fault type is HSC chip short-circuit protection;
responding to the fault alarm signal output by a certain conversion circuit to be equal to four fifths of power supply voltage, and confirming that the HSC chip corresponding to the certain conversion circuit has a fault, wherein the fault type is HSC chip overvoltage protection;
and responding to the fault alarm signal output by a certain conversion circuit being equal to zero, and determining that the HSC chip corresponding to the certain conversion circuit does not have fault.
In some embodiments, each HSC chip further has a current detection signal, the controller is further configured to collect the current detection signal of each HSC chip, process the collected current detection signals, and generate a current-sharing alarm signal and a current-sharing control signal according to a preset determination rule, and the method further includes:
and connecting the first input end of the AND gate with a starting signal, connecting the second input end of the AND gate with the current-sharing control signal, and connecting the output end of the AND gate to an enabling signal pin of each HSC chip.
In some embodiments, the processing the collected current detection signals and then generating the current sharing alarm signal and the current sharing control signal according to a preset determination rule includes:
calculating the average value of all the collected current detection signals;
in response to the current detection signal of any HSC chip being less than 0.6 times the mean value or greater than 1.6 times the mean value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that turns off the HSC chips;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times the mean value and less than or equal to 1.6 times the mean value, and the current detection signal of any HSC chip being greater than or equal to 0.6 times the mean value and less than 0.9 times the mean value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times the mean value and less than or equal to 1.6 times the mean value, and the current detection signal of any HSC chip being greater than 1.1 times the mean value and less than or equal to 1.6 times the mean value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips;
and in response to the current detection signals of all HSC chips being greater than or equal to 0.9 times the mean value and less than or equal to 1.1 times the mean value, outputting a current-sharing alarm signal indicating that no alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips.
In some embodiments, the controller is a baseboard management controller.
According to another aspect of the present invention, a computer device is provided, which may be a server, and an internal structure thereof is shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. When executed by a processor, the computer program implements the method for processing faults of the parallel HSC chips, and specifically, the method comprises the following steps:
multiple HSC chips connected in parallel, wherein each HSC chip is provided with a plurality of preset fault state signals;
converting a plurality of preset fault state signals into fault alarm signals by using a conversion circuit corresponding to each HSC chip and outputting the fault alarm signals;
the controller analyzes the fault alarm signal output by each conversion circuit to determine the HSC chip with fault and the fault type of the HSC chip.
According to still another aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for processing a fault of a parallel HSC chip as described above, specifically comprising performing the steps of:
multiple HSC chips connected in parallel, wherein each HSC chip is provided with a plurality of preset fault state signals;
converting a plurality of preset fault state signals into fault alarm signals by using a conversion circuit corresponding to each HSC chip and outputting the fault alarm signals;
the controller analyzes the fault alarm signal output by each conversion circuit to determine the HSC chip with fault and the fault type of the HSC chip.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (16)

1. A parallel HSC chip failure handling device, the device comprising:
multiple HSC chips connected in parallel, wherein each HSC chip is provided with a plurality of preset fault state signals;
a conversion circuit corresponding to each HSC chip, configured to convert the plurality of preset fault state signals into a fault alarm signal and output the fault alarm signal;
and a controller configured to analyze the fault alarm signal output by each conversion circuit to determine which HSC chip has failed and the fault type of that HSC chip.
2. The parallel HSC chip failure handling device of claim 1, wherein the conversion circuit comprises: a first switch tube, a second switch tube, a third switch tube, a fourth switch tube, a first resistor, a second resistor, a third resistor, a fourth resistor and a pull-up resistor, wherein the first resistor, the second resistor, the third resistor and the fourth resistor have different resistances;
the first switch tube is connected with the first resistor in series to form a first branch circuit, the second switch tube is connected with the second resistor in series to form a second branch circuit, the third switch tube is connected with the third resistor in series to form a third branch circuit, the fourth switch tube is connected with the fourth resistor in series to form a fourth branch circuit, one end of the pull-up resistor is connected with a power supply, the first branch circuit, the second branch circuit, the third branch circuit and the fourth branch circuit are sequentially connected in parallel between the other end of the pull-up resistor and the ground, the first switch tube, the second switch tube, the third switch tube and the fourth switch tube are all grounded, and the fault alarm signal is the voltage value at the other end of the pull-up resistor;
the preset fault state signals comprise an overcurrent protection signal, an overtemperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switch tube to be switched on or switched off, the overtemperature protection signal is used for driving the second switch tube to be switched on or switched off, the short-circuit protection signal is used for driving the third switch tube to be switched on or switched off, the overvoltage protection signal is used for driving the fourth switch tube to be switched on or switched off, and at most one of the first switch tube, the second switch tube, the third switch tube and the fourth switch tube is switched on at the same time.
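For illustration only, the divider behaviour recited in claim 2 can be sketched as follows. This is a minimal sketch assuming ideal switch tubes (zero on-resistance) and exactly one conducting branch; the supply voltage, the resistor values and the name `alarm_voltage` are hypothetical and do not come from the patent.

```python
def alarm_voltage(v_supply, r_pullup, r_branch):
    """Voltage at the lower end of the pull-up resistor when a single
    branch conducts (ideal switch assumed, so the branch is just r_branch)."""
    return v_supply * r_branch / (r_pullup + r_branch)


# Hypothetical component values chosen only to show distinct levels.
V_SUPPLY = 3.3        # volts
R_PULLUP = 10_000.0   # ohms
BRANCH_RESISTORS = {
    "overcurrent": 2_000.0,
    "overtemperature": 5_000.0,
    "short_circuit": 10_000.0,
    "overvoltage": 20_000.0,
}

# One alarm voltage per fault type; different resistances give different levels.
levels = {
    name: alarm_voltage(V_SUPPLY, R_PULLUP, r)
    for name, r in BRANCH_RESISTORS.items()
}
```

Because the four branch resistances differ, each fault state signal maps to its own voltage level, which is what allows a single analog line to carry the fault type.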
3. The parallel HSC chip failure handling device of claim 1, wherein the conversion circuit comprises: a first switch tube, a second switch tube, a third switch tube, a fourth switch tube, a first resistor, a second resistor, a third resistor, a fourth resistor, a fifth resistor, a sixth resistor, a seventh resistor and a pull-up resistor, wherein the first resistor, the second resistor, the third resistor, the fourth resistor, the fifth resistor, the sixth resistor, the seventh resistor and the pull-up resistor all have the same resistance;
the first switch tube is connected with the first resistor in series to form a first branch circuit, the second switch tube is connected with the second resistor in series to form a second branch circuit, the third switch tube is connected with the third resistor in series to form a third branch circuit, the fourth switch tube is connected with the fourth resistor in series to form a fourth branch circuit, one end of the pull-up resistor is connected with a power supply, the first branch circuit, the second branch circuit, the third branch circuit and the fourth branch circuit are sequentially connected between the other end of the pull-up resistor and the ground in parallel, the first switch tube, the second switch tube, the third switch tube and the fourth switch tube are all grounded, the fifth resistor is connected between the first resistor and the second resistor in series, the sixth resistor is connected between the second resistor and the third resistor in series, the seventh resistor is connected between the third resistor and the fourth resistor in series, and the fault alarm signal is a voltage value at the other end of the pull-up resistor;
the preset fault state signals comprise an overcurrent protection signal, an overtemperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switch tube to be switched on or switched off, the overtemperature protection signal is used for driving the second switch tube to be switched on or switched off, the short-circuit protection signal is used for driving the third switch tube to be switched on or switched off, the overvoltage protection signal is used for driving the fourth switch tube to be switched on or switched off, and at most one of the first switch tube, the second switch tube, the third switch tube and the fourth switch tube is switched on at the same moment.
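For illustration only, the equal-resistance network of claim 3 can be checked numerically. This is a sketch under one reading of the topology: the fifth, sixth and seventh resistors link the upper ends of adjacent branches, so that a conducting branch k sees k-1 series resistors plus its own branch resistor between the pull-up node and ground. Ideal switch tubes are assumed, and the name `ladder_ratio` is hypothetical.

```python
from fractions import Fraction


def ladder_ratio(conducting_branch, r=Fraction(1)):
    """Ratio of alarm voltage to supply voltage when only branch k (1..4)
    conducts, with every resistor (including the pull-up) equal to r."""
    if conducting_branch not in (1, 2, 3, 4):
        raise ValueError("conducting_branch must be 1, 2, 3 or 4")
    # k-1 series resistors on the way to branch k, plus the branch resistor.
    r_to_ground = (conducting_branch - 1) * r + r
    # Plain voltage divider against the pull-up resistor.
    return r_to_ground / (r + r_to_ground)


ratios = [ladder_ratio(k) for k in (1, 2, 3, 4)]
```

Under these assumptions the four ratios are 1/2, 2/3, 3/4 and 4/5 of the supply voltage, which are the levels recited in claim 4.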
4. The parallel HSC chip failure handling device of claim 3, wherein the controller is further configured to:
in response to the fault alarm signal output by a conversion circuit being equal to one half of the power supply voltage, determine that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip overcurrent protection;
in response to the fault alarm signal output by a conversion circuit being equal to two thirds of the power supply voltage, determine that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip over-temperature protection;
in response to the fault alarm signal output by a conversion circuit being equal to three quarters of the power supply voltage, determine that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip short-circuit protection;
in response to the fault alarm signal output by a conversion circuit being equal to four fifths of the power supply voltage, determine that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip overvoltage protection;
and in response to the fault alarm signal output by a conversion circuit being equal to zero, determine that the HSC chip corresponding to that conversion circuit has no fault.
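For illustration only, the controller-side decoding recited in claim 4 might look like the sketch below. The claim speaks of exact equality; the tolerance band `tol`, the function name `classify_fault` and the "unrecognized" result are assumptions added here because a measured voltage rarely matches a fraction exactly.

```python
# Ratio of alarm voltage to supply voltage for each fault type (from claim 4).
FAULT_LEVELS = (
    (1 / 2, "overcurrent protection"),
    (2 / 3, "over-temperature protection"),
    (3 / 4, "short-circuit protection"),
    (4 / 5, "overvoltage protection"),
)


def classify_fault(v_alarm, v_supply, tol=0.02):
    """Return the fault type for one conversion circuit, None when the
    alarm voltage is (approximately) zero, or "unrecognized" otherwise.

    tol is a hypothetical tolerance on the voltage ratio; 0.02 keeps the
    bands around 3/4 and 4/5 from overlapping.
    """
    ratio = v_alarm / v_supply
    if abs(ratio) <= tol:
        return None  # zero volts: no fault on this HSC chip
    for level, fault_type in FAULT_LEVELS:
        if abs(ratio - level) <= tol:
            return fault_type
    return "unrecognized"
```

Calling `classify_fault` once per conversion circuit identifies both the failed chip (by which circuit was read) and its fault type (by the returned string).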
5. The device of claim 1, wherein each HSC chip further has a current detection signal, the controller is further configured to collect the current detection signal of each HSC chip, and generate a current sharing alarm signal and a current sharing control signal according to a preset determination rule after processing the collected current detection signal;
the device further comprises an AND gate, wherein a first input end of the AND gate is connected with a starting signal, a second input end of the AND gate is connected with the current sharing control signal, and an output end of the AND gate is connected to an enabling signal pin of each HSC chip.
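For illustration only, the gating in claim 5 reduces to a logical AND. The sketch assumes active-high signals, with the current-sharing control signal held high when the chips may stay on and driven low to turn them off; that polarity and the name `hsc_enable` are assumptions, not stated in the claim.

```python
def hsc_enable(start_signal: bool, sharing_control: bool) -> bool:
    """Level driven onto the enable pin of every HSC chip: high only when
    the start signal is asserted and the controller has not requested a
    shutdown through the current-sharing control signal."""
    return start_signal and sharing_control
```

With this polarity, the controller can shut down all parallel HSC chips at once by pulling a single line low, regardless of the start signal.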
6. The parallel HSC chip failure handling device of claim 5, wherein the controller is further configured to:
calculating the average value of all the collected current detection signals;
in response to the current detection signal of any HSC chip being less than 0.6 times the average value or more than 1.6 times the average value, output a current-sharing alarm signal indicating that an alarm is required and output a current-sharing control signal that turns off the HSC chip;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times and less than or equal to 1.6 times the average value, and the current detection signal of any HSC chip being greater than or equal to 0.6 times and less than 0.9 times the average value, output a current-sharing alarm signal indicating that an alarm is required and output a current-sharing control signal that does not turn off the HSC chips;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times and less than or equal to 1.6 times the average value, and the current detection signal of any HSC chip being greater than 1.1 times and less than or equal to 1.6 times the average value, output a current-sharing alarm signal indicating that an alarm is required and output a current-sharing control signal that does not turn off the HSC chips;
and in response to the current detection signals of all HSC chips being greater than or equal to 0.9 times and less than or equal to 1.1 times the average value, output a current-sharing alarm signal indicating that no alarm is required and output a current-sharing control signal that does not turn off the HSC chips.
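For illustration only, the determination rule of claim 6 can be written as a short function. The thresholds 0.6, 0.9, 1.1 and 1.6 come from the claim; the function name `current_sharing_decision`, the `(alarm, turn_off)` return convention and the requirement of a positive average are assumptions of this sketch.

```python
def current_sharing_decision(currents):
    """Return (alarm, turn_off) for a list of per-chip current readings."""
    if not currents:
        raise ValueError("at least one current reading is required")
    mean = sum(currents) / len(currents)
    if mean <= 0:
        raise ValueError("average current must be positive in this sketch")
    ratios = [c / mean for c in currents]

    # Any chip far from the average: alarm and turn off.
    if any(r < 0.6 or r > 1.6 for r in ratios):
        return True, True
    # All chips within [0.6, 1.6] but at least one outside [0.9, 1.1]:
    # alarm only, keep the chips on.
    if any(r < 0.9 or r > 1.1 for r in ratios):
        return True, False
    # All chips within [0.9, 1.1] of the average: balanced, no alarm.
    return False, False
```

For example, readings of 10, 10, 10 and 8 A give an average of 9.5 A; the last chip sits at about 0.84 times the average, so the function returns an alarm without a shutdown.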
7. The device for processing the fault of the parallel HSC chips as claimed in any one of claims 1-6, wherein the controller is a baseboard management controller.
8. A parallel HSC chip fault handling method is characterized by comprising the following steps:
providing multiple HSC chips connected in parallel, wherein each HSC chip has a plurality of preset fault state signals;
converting, by a conversion circuit corresponding to each HSC chip, the plurality of preset fault state signals into a fault alarm signal and outputting the fault alarm signal;
analyzing, by a controller, the fault alarm signal output by each conversion circuit to determine which HSC chip has failed and the fault type of that HSC chip.
9. The parallel HSC chip failure handling method of claim 8, wherein the conversion circuit comprises: a first switch tube, a second switch tube, a third switch tube, a fourth switch tube, a first resistor, a second resistor, a third resistor, a fourth resistor and a pull-up resistor, wherein the first resistor, the second resistor, the third resistor and the fourth resistor have different resistances;
the first switch tube is connected with the first resistor in series to form a first branch circuit, the second switch tube is connected with the second resistor in series to form a second branch circuit, the third switch tube is connected with the third resistor in series to form a third branch circuit, the fourth switch tube is connected with the fourth resistor in series to form a fourth branch circuit, one end of the pull-up resistor is connected with a power supply, the first branch circuit, the second branch circuit, the third branch circuit and the fourth branch circuit are sequentially connected in parallel between the other end of the pull-up resistor and the ground, the first switch tube, the second switch tube, the third switch tube and the fourth switch tube are all grounded, and the fault alarm signal is the voltage value at the other end of the pull-up resistor;
the preset fault state signals comprise an overcurrent protection signal, an overtemperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switch tube to be switched on or switched off, the overtemperature protection signal is used for driving the second switch tube to be switched on or switched off, the short-circuit protection signal is used for driving the third switch tube to be switched on or switched off, the overvoltage protection signal is used for driving the fourth switch tube to be switched on or switched off, and at most one of the first switch tube, the second switch tube, the third switch tube and the fourth switch tube is switched on at the same time.
10. The parallel HSC chip failure handling method of claim 9, wherein the conversion circuit comprises: a first switch tube, a second switch tube, a third switch tube, a fourth switch tube, a first resistor, a second resistor, a third resistor, a fourth resistor, a fifth resistor, a sixth resistor, a seventh resistor and a pull-up resistor, wherein the first resistor, the second resistor, the third resistor, the fourth resistor, the fifth resistor, the sixth resistor, the seventh resistor and the pull-up resistor all have the same resistance;
the first switch tube is connected with the first resistor in series to form a first branch circuit, the second switch tube is connected with the second resistor in series to form a second branch circuit, the third switch tube is connected with the third resistor in series to form a third branch circuit, the fourth switch tube is connected with the fourth resistor in series to form a fourth branch circuit, one end of the pull-up resistor is connected with a power supply, the first branch circuit, the second branch circuit, the third branch circuit and the fourth branch circuit are sequentially connected between the other end of the pull-up resistor and the ground in parallel, the first switch tube, the second switch tube, the third switch tube and the fourth switch tube are all grounded, the fifth resistor is connected between the first resistor and the second resistor in series, the sixth resistor is connected between the second resistor and the third resistor in series, the seventh resistor is connected between the third resistor and the fourth resistor in series, and the fault alarm signal is a voltage value at the other end of the pull-up resistor;
the preset fault state signals comprise an overcurrent protection signal, an overtemperature protection signal, a short-circuit protection signal and an overvoltage protection signal, the overcurrent protection signal is used for driving the first switch tube to be switched on or switched off, the overtemperature protection signal is used for driving the second switch tube to be switched on or switched off, the short-circuit protection signal is used for driving the third switch tube to be switched on or switched off, the overvoltage protection signal is used for driving the fourth switch tube to be switched on or switched off, and at most one of the first switch tube, the second switch tube, the third switch tube and the fourth switch tube is switched on at the same moment.
11. The method of claim 10, wherein the analyzing, by the controller, of the fault alarm signal output by each conversion circuit to determine which HSC chip has failed and the fault type of that HSC chip comprises:
in response to the fault alarm signal output by a conversion circuit being equal to one half of the power supply voltage, determining that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip overcurrent protection;
in response to the fault alarm signal output by a conversion circuit being equal to two thirds of the power supply voltage, determining that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip over-temperature protection;
in response to the fault alarm signal output by a conversion circuit being equal to three quarters of the power supply voltage, determining that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip short-circuit protection;
in response to the fault alarm signal output by a conversion circuit being equal to four fifths of the power supply voltage, determining that the HSC chip corresponding to that conversion circuit has a fault, the fault type being HSC chip overvoltage protection;
and in response to the fault alarm signal output by a conversion circuit being equal to zero, determining that the HSC chip corresponding to that conversion circuit has no fault.
12. The method of claim 8, wherein each HSC chip further has a current detection signal, and the controller is further configured to collect the current detection signal of each HSC chip, process the collected current detection signals, and then generate a current sharing alarm signal and a current sharing control signal according to a preset determination rule, the method further comprising:
connecting a first input end of an AND gate with a starting signal, connecting a second input end of the AND gate with the current sharing control signal, and connecting an output end of the AND gate to an enabling signal pin of each HSC chip.
13. The method of claim 12, wherein the processing the collected current detection signals and then generating the current sharing alarm signal and the current sharing control signal according to the preset determination rule comprises:
calculating the average value of all the collected current detection signals;
in response to the current detection signal of any HSC chip being less than 0.6 times the average value or more than 1.6 times the average value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that turns off the HSC chip;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times and less than or equal to 1.6 times the average value, and the current detection signal of any HSC chip being greater than or equal to 0.6 times and less than 0.9 times the average value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips;
in response to the current detection signals of all HSC chips being greater than or equal to 0.6 times and less than or equal to 1.6 times the average value, and the current detection signal of any HSC chip being greater than 1.1 times and less than or equal to 1.6 times the average value, outputting a current-sharing alarm signal indicating that an alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips;
and in response to the current detection signals of all HSC chips being greater than or equal to 0.9 times and less than or equal to 1.1 times the average value, outputting a current-sharing alarm signal indicating that no alarm is required and outputting a current-sharing control signal that does not turn off the HSC chips.
14. The method for processing the fault of the parallel HSC chips as claimed in any one of the claims 8-13, wherein the controller is a baseboard management controller.
15. A computer device, comprising:
at least one processor; and
a memory storing a computer program operable in the processor, the processor when executing the program performing the method of any of claims 8-13.
16. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 8-13.
CN202211404113.9A 2022-11-10 2022-11-10 Parallel HSC chip fault processing device, method, equipment and medium Active CN115459204B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211404113.9A CN115459204B (en) 2022-11-10 2022-11-10 Parallel HSC chip fault processing device, method, equipment and medium
PCT/CN2023/100842 WO2024098750A1 (en) 2022-11-10 2023-06-16 Parallel hot swap controller chip fault processing apparatus and method, and device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211404113.9A CN115459204B (en) 2022-11-10 2022-11-10 Parallel HSC chip fault processing device, method, equipment and medium

Publications (2)

Publication Number Publication Date
CN115459204A true CN115459204A (en) 2022-12-09
CN115459204B CN115459204B (en) 2023-02-28

Family

ID=84295572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211404113.9A Active CN115459204B (en) 2022-11-10 2022-11-10 Parallel HSC chip fault processing device, method, equipment and medium

Country Status (2)

Country Link
CN (1) CN115459204B (en)
WO (1) WO2024098750A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115808640A (en) * 2023-02-09 2023-03-17 苏州浪潮智能科技有限公司 Power failure detection circuit, method, system, electronic device, and storage medium
CN116298635A (en) * 2023-03-30 2023-06-23 海信家电集团股份有限公司 IPM fault detection system, IPM fault detection method, IPM fault detection device and storage medium
WO2024098750A1 (en) * 2022-11-10 2024-05-16 苏州元脑智能科技有限公司 Parallel hot swap controller chip fault processing apparatus and method, and device and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103227583A (en) * 2013-04-26 2013-07-31 江苏省电力设计院 Hot plug type conversion system for new energy and energy storage system
CN105656183A (en) * 2014-11-15 2016-06-08 北京航天万源科技公司 Modularized intelligent power supply and distribution device
CN108880230A (en) * 2018-06-28 2018-11-23 烽火通信科技股份有限公司 Power sources in parallel control module and parallel system based on Switching Power Supply chopping voltage
CN112069104A (en) * 2020-05-28 2020-12-11 苏州浪潮智能科技有限公司 Chip hot-plug protection circuit

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2872089B2 (en) * 1995-10-27 1999-03-17 群馬日本電気株式会社 Hot-swap device
US8847438B2 (en) * 2008-07-14 2014-09-30 Texas Instruments Incorporated Minimum loss and wiring circuit and method for paralleling hot swap controllers
US9917437B2 (en) * 2015-05-06 2018-03-13 Cisco Technology, Inc. Hot swap controller with individually controlled parallel current paths
CN110377138A (en) * 2019-06-29 2019-10-25 苏州浪潮智能科技有限公司 A kind of multipath server power supply circuit and method for controlling power supply
CN213072104U (en) * 2020-08-28 2021-04-27 苏州浪潮智能科技有限公司 Circuit for current sharing, chip for current sharing and circuit for hot plug current sharing control
CN112181123B (en) * 2020-09-28 2022-07-12 苏州浪潮智能科技有限公司 Device and method for improving output current sharing of hot-plug chip
CN115459204B (en) * 2022-11-10 2023-02-28 苏州浪潮智能科技有限公司 Parallel HSC chip fault processing device, method, equipment and medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024098750A1 (en) * 2022-11-10 2024-05-16 苏州元脑智能科技有限公司 Parallel hot swap controller chip fault processing apparatus and method, and device and medium
CN115808640A (en) * 2023-02-09 2023-03-17 苏州浪潮智能科技有限公司 Power failure detection circuit, method, system, electronic device, and storage medium
CN115808640B (en) * 2023-02-09 2023-05-16 苏州浪潮智能科技有限公司 Power failure detection circuit, method, system, electronic device and storage medium
CN116298635A (en) * 2023-03-30 2023-06-23 海信家电集团股份有限公司 IPM fault detection system, IPM fault detection method, IPM fault detection device and storage medium

Also Published As

Publication number Publication date
WO2024098750A1 (en) 2024-05-16
CN115459204B (en) 2023-02-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant