TW201706844A - Power failure detection system and method thereof - Google Patents

Power failure detection system and method thereof

Info

Publication number
TW201706844A
Authority
TW
Taiwan
Prior art keywords
power failure
power
power supply
processing unit
central processing
Prior art date
Application number
TW104125304A
Other languages
Chinese (zh)
Other versions
TWI584114B (en)
Inventor
黃建新
韓應賢
Original Assignee
英業達股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 英業達股份有限公司
Priority to TW104125304A
Publication of TW201706844A
Application granted
Publication of TWI584114B

Abstract

A power failure detection system and a method thereof are disclosed. The power failure detection system includes a motherboard, a board, a complex programmable logic device (CPLD), and a baseboard management controller (BMC) module. The motherboard includes a central processing unit (CPU) power supply and a non-CPU power supply, the board includes a board power supply, and the BMC module includes a register that is electrically coupled to the CPLD. When a power failure occurs, the CPLD executes a shutdown process, identifies the power failure type, and determines whether to execute a restart process according to that type. If the restart process is executed and the number of restart attempts reaches a preset number, the CPLD records a lock message in the register. The BMC module records the number of restart attempts and executes a lock process according to the lock message.

Description

Power failure detection system and method thereof

The present invention relates to a detection technology, and in particular to a power failure detection system and a method thereof.

When a problem occurs with the internal power of a server (for example, the server cannot be powered on, loses power during startup, or loses power after the power button is pressed and the fans start spinning), it has conventionally been necessary to use an oscilloscope or a multimeter to measure, one by one, the relevant signals of the power-on sequence, such as the power-good signals and the enable signals, in order to determine which power supply has the problem that prevents the server from starting.

However, once servers have been assembled into a system, the host is already installed inside the chassis, so if such a problem occurs it is difficult to measure the power-on sequence signals with an oscilloscope or a multimeter. As a result, it is not possible to determine quickly which power supply has the problem, or whether the problem is a false power fault, for example an error caused by some other aspect of the system configuration.

To identify power problems that occur in a server system, and to handle them promptly so that a failure to start caused by a false power fault can be ruled out, one aspect of the present disclosure provides a power failure detection system. The system includes a motherboard, a board, a complex programmable logic device (CPLD), and a baseboard management controller (BMC) module. The motherboard includes a central processing unit (CPU) power supply and a non-CPU power supply, the board includes a board power supply, and the BMC module includes a register that is electrically coupled to the CPLD. The CPLD is configured to execute a shutdown procedure when a power failure occurs; to identify the power failure type and determine, according to that type, whether to execute a restart procedure, where the power failure type indicates whether the failure occurred in the CPU power supply, the non-CPU power supply, or the board power supply; and, if the restart procedure is executed, to record a lock message in the register when the number of restart attempts reaches a preset number. The BMC module is configured to record the number of restart attempts and to execute a lock procedure according to the lock message.

In one embodiment of the present disclosure, when the CPLD records the lock message in the register, it transmits a location message for the register to the BMC module, and the BMC module reads the lock message from the register according to the location message.

In one embodiment of the present disclosure, when the power failure type indicates that the failure occurred in the CPU power supply, the CPLD records the lock message in the register.

In one embodiment of the present disclosure, when the power failure type indicates that the failure occurred in the board power supply or the non-CPU power supply, the CPLD executes the restart procedure.

In one embodiment of the present disclosure, the CPLD is configured to record, by means of a power scan procedure, the location of the power failure within the CPU power supply, the non-CPU power supply, or the board power supply.

Another aspect of the present invention provides a power failure detection method that includes the following steps: executing a shutdown procedure when a power failure occurs; identifying the power failure type and determining, according to that type, whether to execute a restart procedure, where the power failure type indicates whether the failure occurred in the CPU power supply of a motherboard, the non-CPU power supply of the motherboard, or a board power supply; if the restart procedure is executed, recording a lock message in a register when the number of restart attempts reaches a preset number, and recording the number of restart attempts through a BMC module; and executing, through the BMC module, a lock procedure according to the lock message.

In one embodiment of the present disclosure, the method further includes: when the lock message is recorded in the register, transmitting a location message for the register to the BMC module, the BMC module reading the lock message from the register according to the location message.

In one embodiment of the present disclosure, the method further includes: when the power failure type indicates that the failure occurred in the CPU power supply, recording the lock message in the register.

In one embodiment of the present disclosure, the method further includes: when the power failure type indicates that the failure occurred in the board power supply or the non-CPU power supply, executing the restart procedure.

In one embodiment of the present disclosure, the method further includes: recording, by means of a power scan procedure, the location of the power failure within the CPU power supply, the non-CPU power supply, or the board power supply.

In summary, the present disclosure can identify which power supply in a server system has failed without measuring the power-on sequence of each power supply one by one, and can carry out follow-up handling that corresponds to the failed power supply. This rules out false power fault events, that is, it confirms genuine power fault events, and thereby reduces the maintenance cost incurred by false power fault events. Furthermore, if the power failure cannot be cleared by the restart procedure, the recorded information about the failure can be used to improve failure analysis and repair of the server system's power.

The above is described in detail below by way of embodiments, which also provide further explanation of the technical solution of the present disclosure.

To make the description of the present disclosure more thorough and complete, reference may be made to the accompanying drawings and the various embodiments described below. The embodiments provided are not intended to limit the scope of the invention, and the description of the steps is not intended to limit the order in which they are performed. Any device produced by recombining the elements and having equivalent effect falls within the scope of the invention.

In the embodiments and the claims, unless the context specifically restricts the article, "a" and "the" may refer to one or more than one.

In addition, "coupled" and "connected" as used herein may mean that two or more elements are in direct physical or electrical contact with each other, or in indirect physical or electrical contact with each other; "coupled" may also mean that two or more elements operate or act on one another.

FIG. 1 is a schematic diagram of a power failure detection system 100 according to one embodiment of the present disclosure. As shown in FIG. 1, the power failure detection system 100 includes a motherboard 140, a board 150, a complex programmable logic device (CPLD) 110, and a baseboard management controller (BMC) module 130. The motherboard 140 includes a central processing unit (CPU) power supply 142 and a non-CPU power supply 144, the board 150 includes a board power supply 152, and the BMC module 130 includes a register 120 that is electrically coupled to the CPLD 110.

The CPLD 110 monitors the operating status of each power supply inside the server system in real time. For example, the main power used on the motherboard 140 includes P12V, P5V, P3V3, PVDDQ, and PVCCIN, and the standby power includes P12V_STBY, P5V_STBY, P3V3_STBY, P1V8_STBY, and P1V_STBY. In this embodiment, the power supplies used on the motherboard are divided into the CPU power supply 142 (for example, PVCCIN) and the non-CPU power supply 144 (for example, the power supplied to the memory devices of the motherboard, P12V, P5V, P3V3, and PVDDQ, together with the standby power P12V_STBY, P5V_STBY, P3V3_STBY, P1V8_STBY, and P1V_STBY). The board power supply 152 is, for example, the power supplied to a server backplane, and includes standby power and main power.
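The grouping of power supplies into three failure types can be illustrated with a small lookup. The following Python sketch is only an illustration: the motherboard supply names come from the description, while the `FailureType` enum, the `classify_supply` function, and the backplane names prefixed with `BP_` are hypothetical, since the description does not name the board supplies.

```python
from enum import Enum


class FailureType(Enum):
    CPU = "CPU power supply 142"
    NON_CPU = "non-CPU power supply 144"
    BOARD = "board power supply 152"


# Motherboard supply names as listed in the description.
CPU_SUPPLIES = {"PVCCIN"}
NON_CPU_SUPPLIES = {
    "P12V", "P5V", "P3V3", "PVDDQ",
    "P12V_STBY", "P5V_STBY", "P3V3_STBY", "P1V8_STBY", "P1V_STBY",
}
# Hypothetical backplane supply names; the description gives none.
BOARD_SUPPLIES = {"BP_P12V", "BP_P3V3_STBY"}


def classify_supply(name: str) -> FailureType:
    """Map a failed supply name to one of the three failure types."""
    if name in CPU_SUPPLIES:
        return FailureType.CPU
    if name in NON_CPU_SUPPLIES:
        return FailureType.NON_CPU
    if name in BOARD_SUPPLIES:
        return FailureType.BOARD
    raise ValueError(f"unknown power supply: {name}")
```

In actual hardware this mapping would presumably be fixed in the CPLD logic rather than looked up at run time; the table form is used here only to make the grouping explicit.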

In one embodiment, the CPLD 110 monitors in real time whether a power failure occurs in the CPU power supply 142 or the non-CPU power supply 144 supplied to the motherboard 140, or in the board power supply 152 supplied to the board. When any of these power supplies fails, the CPLD 110 immediately executes a shutdown procedure and identifies the power failure type. The CPLD 110 then determines, according to the power failure type, whether to execute a restart procedure. If the CPLD 110 executes the restart procedure and the system restarts successfully, the power failure was a false power fault and no repair is needed. If the system does not start successfully, the restart procedure is repeated. When the number of restart attempts reaches a preset number (for example, 3), the CPLD 110 records a lock message in the register 120 to indicate that the power failure cannot be cleared by restarting. The BMC module 130 executes a lock procedure according to the lock message to lock the server system, which prevents a malfunction from causing secondary damage before the power failure has been resolved. The BMC module 130 also records the number of restart attempts. In this way, when a power failure occurs in the server system, the CPLD 110 identifies the power failure type and determines from it whether to execute the restart procedure, so that false power fault events are ruled out and genuine ones are confirmed. The preset number can be chosen according to actual requirements and does not limit the present disclosure; other suitable preset numbers also fall within its scope.
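The retry-then-lock behaviour described above can be sketched as a short loop. This is a minimal Python model under stated assumptions, not the CPLD logic itself: `handle_restartable_failure`, `Outcome`, and the `try_restart` callback are hypothetical names, and the preset number of 3 is the example value given in the description.

```python
from dataclasses import dataclass, field
from typing import Callable, List

PRESET_RESTARTS = 3  # example value from the description


@dataclass
class Outcome:
    restarts: int = 0          # count the BMC module would record
    locked: bool = False       # True once a lock message is recorded
    events: List[str] = field(default_factory=list)


def handle_restartable_failure(
    try_restart: Callable[[], bool],
    preset: int = PRESET_RESTARTS,
) -> Outcome:
    """Shut down, then retry the restart until it succeeds or the
    number of attempts reaches the preset number."""
    out = Outcome()
    out.events.append("shutdown")
    while out.restarts < preset:
        out.restarts += 1
        if try_restart():
            # A successful restart means the fault was a false one.
            out.events.append("restarted: false power fault")
            return out
    out.locked = True
    out.events.append("lock message recorded")
    return out
```

A caller would pass in whatever function actually attempts the power-on; for instance, a stub that fails once and then succeeds yields two restart attempts and no lock.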

The following describes in detail how the CPLD 110 decides whether to execute the restart procedure for each power failure type. In one embodiment, when the power failure type is a failure of the CPU power supply 142, the CPLD 110 records the lock message in the register 120 directly and does not execute the restart procedure, because a CPU power failure is a fatal power failure. The BMC module 130 executes the lock procedure according to the lock message to lock the server system, so that the server system cannot be powered on until the failure of the CPU power supply 142 has been resolved and a malfunction cannot cause secondary damage to the system.

Alternatively, in another embodiment, when the power failure type is a failure of a power supply of the motherboard 140 other than the CPU power supply 142 (that is, a failure of the non-CPU power supply 144), the CPLD 110 executes the restart procedure to try to clear the failure. If the system restarts successfully, the power failure was a false power fault and no repair is needed. If the system does not start successfully, the restart procedure is repeated. When the number of restart attempts reaches the preset number (for example, 3), the CPLD 110 records the lock message in the register 120 to indicate that the power failure cannot be cleared by restarting. The preset number can be chosen according to actual requirements and does not limit the present disclosure; other suitable preset numbers also fall within its scope.

Alternatively, in another embodiment, when the power failure type is a failure of the board power supply 152 (including a failure of the board's standby power and a failure of the board's main power), the CPLD 110 executes the restart procedure to try to clear the failure. If the system restarts successfully, the power failure was a false power fault and no repair is needed. If the system does not start successfully, the restart procedure is repeated. When the number of restart attempts reaches the preset number (for example, 3), the CPLD 110 records the lock message in the register 120 to indicate that the power failure cannot be cleared by restarting. The preset number can be chosen according to actual requirements and does not limit the present disclosure; other suitable preset numbers also fall within its scope. In one embodiment, when the power failure type is a failure of the board power supply 152 (including a failure of the board's standby power and a failure of the board's main power), an alert is issued through a display device, for example by emitting red light from an indicator light, although the present disclosure is not limited to this.
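The three cases above amount to a small decision table. The Python sketch below restates it under stated assumptions: the `Policy` tuple, its field names, and the string keys are hypothetical, and only the behaviour in each row is taken from the description.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    attempt_restart: bool    # try the restart procedure first
    lock_immediately: bool   # record the lock message without restarting
    raise_alert: bool        # issue an alert through a display device


# One row per power failure type, as described in the embodiments.
POLICY = {
    "cpu": Policy(attempt_restart=False, lock_immediately=True, raise_alert=False),
    "non_cpu": Policy(attempt_restart=True, lock_immediately=False, raise_alert=False),
    "board": Policy(attempt_restart=True, lock_immediately=False, raise_alert=True),
}


def policy_for(failure_type: str) -> Policy:
    """Look up the handling policy for a power failure type."""
    return POLICY[failure_type]
```

Keeping the policy as data rather than branching code makes the asymmetry visible at a glance: only the CPU power failure is treated as fatal, and only the board power failure triggers the alert.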

In one embodiment, when the CPLD 110 records the lock message in the register 120, it transmits a location message for the register 120 to the BMC module 130. The BMC module 130 reads the lock message from the register 120 according to the location message and then executes the lock procedure.
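The handshake between the CPLD and the BMC module can be modelled in a few lines. In this Python sketch the register is represented as an address-to-value map; the `LOCK_FLAG` encoding, the class names, and the address used in the test are all assumptions made for illustration, because the description does not specify the register layout or the transport between the two devices.

```python
LOCK_FLAG = 0x01  # hypothetical encoding of the lock message


class RegisterFile:
    """Toy stand-in for the register 120: address -> value."""

    def __init__(self) -> None:
        self._regs = {}

    def write(self, addr: int, value: int) -> None:
        self._regs[addr] = value

    def read(self, addr: int) -> int:
        return self._regs.get(addr, 0)  # unwritten registers read as 0


class Bmc:
    """Toy stand-in for the BMC module 130."""

    def __init__(self, regs: RegisterFile) -> None:
        self.regs = regs
        self.system_locked = False

    def on_location_message(self, addr: int) -> None:
        # Read the register named by the location message and lock
        # the system only if a lock message is actually present.
        if self.regs.read(addr) & LOCK_FLAG:
            self.system_locked = True


def cpld_record_lock(regs: RegisterFile, bmc: Bmc, addr: int) -> None:
    """Record the lock message, then tell the BMC where to find it."""
    regs.write(addr, LOCK_FLAG)
    bmc.on_location_message(addr)
```

Having the BMC re-read the register, rather than trusting the notification alone, mirrors the description: the location message only says where to look, and the lock message itself is the content of the register.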

In one embodiment, the CPLD 110 defines a power scan procedure in its program code to record the location of the power failure within the CPU power supply 142, the non-CPU power supply 144, or the board power supply 152, which improves failure analysis and repair of the server system's power. For example, when the board power supply 152 fails, the CPLD 110 uses the power scan procedure to record which power supply on the board failed. If the failure cannot be cleared by the restart procedure, this saves the time otherwise spent locating it. When the CPU power supply 142 fails, or when the non-CPU power supply 144 (or the board power supply 152) fails and the number of restart attempts reaches the preset number, the CPLD 110 records the lock message in the register 120 connected to the pin corresponding to the failed power supply, and transmits the location message for that register 120 to the BMC module 130.
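One way to picture the power scan procedure is as a pass over the power-good signals that records a bit for each supply that is not good. The Python sketch below assumes that representation; the bit assignment, the `SUPPLIES` ordering, and the function names are hypothetical, and the description does not say how the CPLD actually encodes the failure location.

```python
from typing import List, Mapping

# Hypothetical scan order: bit i of the mask corresponds to SUPPLIES[i].
SUPPLIES = ("P12V", "P5V", "P3V3", "PVDDQ", "PVCCIN")


def scan_supplies(power_good: Mapping[str, bool]) -> int:
    """Return a bitmask with one bit set per supply that is not good.

    A supply missing from the mapping is treated as failed.
    """
    mask = 0
    for bit, name in enumerate(SUPPLIES):
        if not power_good.get(name, False):
            mask |= 1 << bit
    return mask


def decode_mask(mask: int) -> List[str]:
    """Translate a recorded bitmask back into supply names."""
    return [name for bit, name in enumerate(SUPPLIES) if mask & (1 << bit)]
```

With a record of this kind, a technician reading the value back through the BMC module could go straight to the named supply instead of probing the power-on sequence signal by signal.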

FIG. 2 is a flowchart of a power failure detection method 200 according to another embodiment of the present disclosure. The power failure detection method 200 includes steps S202 to S208 and can be applied to the power failure detection system 100 shown in FIG. 1. Those skilled in the art will understand that, unless their order is specifically stated, the steps mentioned in this embodiment can be reordered as needed, and can even be performed simultaneously or partly simultaneously.

The following description uses the power supplies of a server system as an example. The operating status of each power supply inside the server system is monitored in real time. For example, the main power used on the motherboard includes P12V, P5V, P3V3, PVDDQ, and PVCCIN, and the standby power includes P12V_STBY, P5V_STBY, P3V3_STBY, P1V8_STBY, and P1V_STBY. In this embodiment, the power supplies used on the motherboard are divided into the CPU power supply (for example, PVCCIN) and the non-CPU power supply (for example, the power supplied to the memory devices of the motherboard, P12V, P5V, P3V3, and PVDDQ, together with the standby power P12V_STBY, P5V_STBY, P3V3_STBY, P1V8_STBY, and P1V_STBY). The board power supply is, for example, the power supplied to a server backplane, and includes standby power and main power.

In one embodiment, the CPU power supply and the non-CPU power supply supplied to the motherboard, and the power supplied to the board, are monitored in real time for power failures. In step S202, when any of these power supplies fails, a shutdown procedure is executed immediately. In step S204, the power failure type is identified, and whether to execute a restart procedure is determined according to the power failure type. If the restart procedure is executed and the system restarts successfully, the power failure was a false power fault and no repair is needed. If the system does not start successfully, the restart procedure is repeated. In step S206, if the restart procedure is executed and the number of restart attempts reaches a preset number (for example, 3), a lock message is recorded in the register to indicate that the power failure cannot be cleared by restarting. The number of restart attempts is also recorded through the BMC module. In step S208, the BMC module executes a lock procedure according to the lock message to lock the server system, which prevents a malfunction from causing secondary damage before the power failure has been resolved. In this way, when a power failure occurs in the server system, the power failure type is identified and used to determine whether to execute the restart procedure, so that false power fault events are ruled out and genuine ones are confirmed. The preset number can be chosen according to actual requirements and does not limit the present disclosure; other suitable preset numbers also fall within its scope.

The following describes in detail how the decision to execute the restart procedure depends on the power failure type. In one embodiment, when the power failure type is a failure of the CPU power supply, the lock message is recorded in the register directly and the restart procedure is not executed, because a CPU power failure is a fatal power failure. The BMC module executes the lock procedure according to the lock message to lock the server system, so that the server system cannot be powered on until the CPU power failure has been resolved and a malfunction cannot cause secondary damage to the system.

Alternatively, in another embodiment, when the power failure type is a failure of a motherboard power supply other than the CPU power supply (that is, a failure of the non-CPU power supply), the restart procedure is executed to try to clear the failure. If the system restarts successfully, the power failure was a false power fault and no repair is needed. If the system does not start successfully, the restart procedure is repeated. When the number of restart attempts reaches the preset number (for example, 3), the lock message is recorded in the register to indicate that the power failure cannot be cleared by restarting. The preset number can be chosen according to actual requirements and does not limit the present disclosure; other suitable preset numbers also fall within its scope.

Alternatively, in another embodiment, when the power failure type is a failure of the board power supply (including a failure of the board's standby power and a failure of the board's main power), the restart procedure is executed to try to clear the failure. If the system restarts successfully, the power failure was a false power fault and no repair is needed. If the system does not start successfully, the restart procedure is repeated. When the number of restart attempts reaches the preset number (for example, 3), the lock message is recorded in the register to indicate that the power failure cannot be cleared by restarting. The preset number can be chosen according to actual requirements and does not limit the present disclosure; other suitable preset numbers also fall within its scope. In one embodiment, when the power failure type is a failure of the board power supply (including a failure of the board's standby power and a failure of the board's main power), an alert is issued through a display device, for example by emitting a warning light from an indicator light, although the present disclosure is not limited to this.

In one embodiment, when the lock message is recorded in the register, a location message for the register is transmitted to the BMC module. The BMC module reads the lock message from the register according to the location message and then executes the lock procedure.

In one embodiment, a power scan procedure defined in program code records the location of the power failure within the CPU power supply, the non-CPU power supply, or the board power supply, which improves failure analysis and repair of the server system's power. For example, when the board power supply fails, the power scan procedure records which power supply of the board failed. If the failure cannot be cleared by the restart procedure, this saves the time otherwise spent locating it. When the CPU power supply fails, or when the non-CPU power supply (or the board power supply) fails and the number of restart attempts reaches the preset number, the CPLD records the lock message in the register connected to the pin corresponding to the failed power supply, and transmits the location message for that register to the BMC module.

In summary, through the embodiments above, the present disclosure identifies which power supply in the server system has failed without measuring the power-on sequence of each supply one by one, and performs follow-up handling that depends on which supply failed, so that spurious power fault events are ruled out and genuine ones are confirmed. This further reduces the repair cost caused by spurious power fault events. Moreover, if the power failure cannot be cleared through the restart procedure, the recorded information about the failure can be used to improve error analysis and fault repair for server system power failures.
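
The type-dependent handling summarized above can be condensed into a single decision function. This Python sketch assumes the rule stated in the embodiments (a CPU power supply failure is locked directly; other failures are retried up to a preset number); the names `FailureType` and `next_action` and the string return values are hypothetical.

```python
from enum import Enum


class FailureType(Enum):
    CPU = "cpu"
    NON_CPU = "non_cpu"
    BOARD = "board"


def next_action(failure_type: FailureType, restart_count: int, limit: int = 3) -> str:
    """Decide what to do after the shutdown procedure has run.

    Returns "lock" when a lock message should be recorded, otherwise
    "restart" to run the restart procedure once more.
    """
    if failure_type is FailureType.CPU:
        # A CPU power supply failure is not retried.
        return "lock"
    if restart_count >= limit:
        # Restarting has not cleared the failure within the preset number.
        return "lock"
    return "restart"


if __name__ == "__main__":
    print(next_action(FailureType.BOARD, restart_count=1))
```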

Although the present disclosure has been described above by way of embodiments, they are not intended to limit the invention. Anyone skilled in the art may make various changes and refinements without departing from the spirit and scope of the present disclosure. The scope of protection of the invention is therefore defined by the appended claims.

To make the above and other objects, features, advantages, and embodiments of the present disclosure easier to understand, the reference symbols are described as follows:
100‧‧‧Power failure detection system
110‧‧‧Complex programmable logic device (CPLD)
120‧‧‧Register
130‧‧‧Baseboard management controller (BMC) module
140‧‧‧Motherboard
142‧‧‧CPU power supply
144‧‧‧Non-CPU power supply
150‧‧‧Board
152‧‧‧Board power supply
200‧‧‧Power failure detection method
S202~S208‧‧‧Steps

To make the above and other objects, features, advantages, and embodiments of the present disclosure easier to understand, the accompanying drawings are described as follows:
FIG. 1 is a schematic diagram of a power failure detection system according to an embodiment of the present disclosure; and
FIG. 2 is a flowchart of a power failure detection method according to another embodiment of the present disclosure.

Claims (10)

1. A power failure detection system, comprising:
   a motherboard, comprising:
      a central processing unit (CPU) power supply; and
      a non-CPU power supply;
   a board, comprising a board power supply;
   a complex programmable logic device (CPLD) configured to, when a power failure occurs, execute a shutdown procedure, identify a power failure type, and determine according to the power failure type whether to execute a restart procedure, wherein the power failure type indicates that the power failure occurred in the CPU power supply, the non-CPU power supply, or the board power supply; and
   a baseboard management controller (BMC) module, comprising:
      a register electrically coupled to the CPLD;
   wherein, if the CPLD executes the restart procedure, a lock message is recorded in the register when the number of times the restart procedure has been executed reaches a preset number; and the BMC module is configured to record the number of times the restart procedure has been executed and to execute a lock procedure according to the lock message.

2. The power failure detection system of claim 1, wherein when the CPLD records the lock message in the register, the CPLD transmits a location message of the register to the BMC module, and the BMC module reads the lock message from the register according to the location message.

3. The power failure detection system of claim 1, wherein when the power failure type indicates that the power failure occurred in the CPU power supply, the CPLD records the lock message in the register.

4. The power failure detection system of claim 1, wherein when the power failure type indicates that the power failure occurred in the board power supply or the non-CPU power supply, the CPLD executes the restart procedure.

5. The power failure detection system of claim 1, wherein the CPLD is configured to record, through a power scanning procedure, a power failure location inside the CPU power supply, the non-CPU power supply, or the board power supply at which the power failure occurred.

6. A power failure detection method, comprising:
   executing a shutdown procedure when a power failure occurs;
   identifying a power failure type, and determining according to the power failure type whether to execute a restart procedure, wherein the power failure type indicates that the power failure occurred in a CPU power supply of a motherboard, a non-CPU power supply of the motherboard, or a board power supply;
   if the restart procedure is executed, recording a lock message in the register when the number of times the restart procedure has been executed reaches a preset number, and recording the number of times the restart procedure has been executed through a baseboard management controller (BMC) module; and
   executing, through the BMC module, a lock procedure according to the lock message.

7. The power failure detection method of claim 6, further comprising:
   when the lock message is recorded in the register, transmitting a location message of the register to the BMC module, wherein the BMC module reads the lock message from the register according to the location message.

8. The power failure detection method of claim 6, further comprising:
   when the power failure type indicates that the power failure occurred in the CPU power supply, recording the lock message in the register.

9. The power failure detection method of claim 6, further comprising:
   when the power failure type indicates that the power failure occurred in the board power supply or the non-CPU power supply, executing the restart procedure.

10. The power failure detection method of claim 6, further comprising:
   recording, through a power scanning procedure, a power failure location inside the CPU power supply, the non-CPU power supply, or the board power supply at which the power failure occurred.
Priority Applications (1)

Application Number Priority Date Filing Date Title
TW104125304A TWI584114B (en) 2015-08-04 2015-08-04 Power failure detection system and method thereof

Publications (2)

Publication Number Publication Date
TW201706844A true TW201706844A (en) 2017-02-16
TWI584114B TWI584114B (en) 2017-05-21
