TW201706844A - Power failure detection system and method thereof - Google Patents
- Publication number: TW201706844A (application number TW104125304A)
- Authority: TW (Taiwan)
- Prior art keywords: power failure, power, power supply, processing unit, central processing
Description
The invention relates to detection technology, and in particular to a power failure detection system and a method thereof.
When a server's internal power supply has a problem (for example, the server cannot power on, loses power during boot, or loses power after the fans start spinning when the power button is pressed), the conventional approach is to use an oscilloscope or a multimeter to measure, one after another, the power good signals, enable signals, and other related signals of the power-on sequence, in order to determine which power supply caused the failure to boot.
However, once the server has been assembled into a system, the host is already installed inside the chassis. If such a problem occurs, it is difficult to measure the power-on sequence signals with an oscilloscope or multimeter, so it is impossible to determine quickly which power supply the problem occurred in, or whether the problem is a false power fault, for example an error caused by some other system configuration.
To identify power problems in a server system and handle them promptly, so as to rule out boot failures caused by false power faults, one aspect of the present disclosure provides a power failure detection system that includes a motherboard, a board card, a complex programmable logic device (CPLD), and a baseboard management controller (BMC) module. The motherboard includes a central processing unit (CPU) power supply and a non-CPU power supply, the board card includes a board power supply, and the BMC module includes a register that is electrically coupled to the CPLD. The CPLD is configured to: perform a shutdown procedure when a power failure occurs; identify the power failure type and determine, according to the power failure type, whether to perform a restart procedure, where the power failure type indicates whether the power failure occurred in the CPU power supply, the non-CPU power supply, or the board power supply; and, if the restart procedure is performed, record a lock message in the register when the number of restarts reaches a preset number. The BMC module is configured to record the number of restarts and to perform a locking procedure according to the lock message.
In an embodiment of the present disclosure, when the CPLD records the lock message in the register, it transmits location information of the register to the BMC module, and the BMC module reads the lock message from the register according to the location information.
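The register handoff between the CPLD and the BMC module can be sketched as a small Python model. This is a minimal illustration under stated assumptions, not the disclosed implementation: the register block is modelled as a dictionary keyed by address, the lock message is a single hypothetical value `LOCK_MESSAGE`, and the names `RegisterFile`, `Cpld.record_lock`, and `Bmc.on_location_message` are hypothetical.

```python
from dataclasses import dataclass, field

LOCK_MESSAGE = 0x1  # hypothetical encoding of the lock message


@dataclass
class RegisterFile:
    """Registers inside the BMC module, addressed by an integer location."""
    cells: dict = field(default_factory=dict)

    def write(self, address: int, value: int) -> None:
        self.cells[address] = value

    def read(self, address: int) -> int:
        return self.cells.get(address, 0)  # unwritten registers read as 0


class Bmc:
    def __init__(self, registers: RegisterFile) -> None:
        self.registers = registers
        self.locked = False

    def on_location_message(self, address: int) -> None:
        # Read the register at the location reported by the CPLD and perform
        # the locking procedure if the lock message is present.
        if self.registers.read(address) == LOCK_MESSAGE:
            self.locked = True


class Cpld:
    def __init__(self, registers: RegisterFile, bmc: Bmc) -> None:
        self.registers = registers
        self.bmc = bmc

    def record_lock(self, address: int) -> None:
        # Record the lock message, then send the register's location to the BMC.
        self.registers.write(address, LOCK_MESSAGE)
        self.bmc.on_location_message(address)


regs = RegisterFile()
bmc = Bmc(regs)
Cpld(regs, bmc).record_lock(0x20)  # 0x20 is an arbitrary example address
print(bmc.locked)  # True
```

Passing only the address keeps the CPLD-to-BMC message small; the BMC decides what to do after reading the register itself.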
In an embodiment of the present disclosure, when the power failure type indicates that the power failure occurred in the CPU power supply, the CPLD records the lock message in the register.
In an embodiment of the present disclosure, when the power failure type indicates that the power failure occurred in the board power supply or the non-CPU power supply, the CPLD performs the restart procedure.
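The two embodiments above amount to a simple decision rule on the power failure type. A minimal Python sketch follows; the `FailureType` enum and the function name `should_restart` are hypothetical names chosen for illustration.

```python
from enum import Enum


class FailureType(Enum):
    CPU = "cpu"          # failure in the CPU power supply
    NON_CPU = "non_cpu"  # failure in a non-CPU motherboard power supply
    BOARD = "board"      # failure in the board power supply


def should_restart(failure_type: FailureType) -> bool:
    """Return True if the restart procedure should be attempted.

    A CPU power failure is treated as fatal, so the lock message is recorded
    directly; non-CPU and board failures may be false faults, so a restart
    is attempted first.
    """
    return failure_type is not FailureType.CPU


for ft in FailureType:
    print(ft.name, should_restart(ft))
```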
In an embodiment of the present disclosure, the CPLD is configured to record, via a power scan program, the power failure location, that is, where within the CPU power supply, the non-CPU power supply, or the board power supply the power failure occurred.
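The disclosure does not specify how the power scan program works. One plausible sketch, assuming each rail exposes a sampled power-good level and the rails are scanned in power-on order, reports the first rail whose power-good signal is low. The function name `power_scan` and the scan order are assumptions; the rail names come from the examples given later in the description.

```python
from typing import Dict, List, Optional


def power_scan(power_good: Dict[str, bool], scan_order: List[str]) -> Optional[str]:
    """Return the first rail in scan_order whose power-good signal is low.

    power_good maps a rail name to its sampled power-good level; a rail with
    no sample is treated as failed. Returns None if every rail is good.
    """
    for rail in scan_order:
        if not power_good.get(rail, False):
            return rail
    return None


# Assumed power-on order for the motherboard main rails.
order = ["P12V", "P5V", "P3V3", "PVDDQ", "PVCCIN"]
sample = {"P12V": True, "P5V": True, "P3V3": False, "PVDDQ": True, "PVCCIN": True}
print(power_scan(sample, order))  # P3V3
```

Reporting the first failing rail in sequence order is a design choice: a rail later in the sequence may drop only because an upstream rail failed, so the earliest failure is usually the most informative location to record.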
Another aspect of the present invention provides a power failure detection method that includes the following steps: performing a shutdown procedure when a power failure occurs; identifying a power failure type and determining, according to the power failure type, whether to perform a restart procedure, where the power failure type indicates whether the power failure occurred in the CPU power supply of a motherboard, the non-CPU power supply of the motherboard, or a board power supply; if the restart procedure is performed, recording a lock message in a register when the number of restarts reaches a preset number, and recording the number of restarts through a BMC module; and performing, through the BMC module, a locking procedure according to the lock message.
In an embodiment of the present disclosure, the method further includes: when the lock message is recorded in the register, transmitting location information of the register to the BMC module, the BMC module reading the lock message from the register according to the location information.
In an embodiment of the present disclosure, the method further includes: when the power failure type indicates that the power failure occurred in the CPU power supply, recording the lock message in the register.
In an embodiment of the present disclosure, the method further includes: when the power failure type indicates that the power failure occurred in the board power supply or the non-CPU power supply, performing the restart procedure.
In an embodiment of the present disclosure, the method further includes: recording, via a power scan program, the power failure location within the CPU power supply, the non-CPU power supply, or the board power supply where the power failure occurred.
In summary, the present disclosure can identify which power supply has failed in a server system without measuring the power-on sequence of each power supply one by one, and can apply follow-up handling that depends on which power supply failed, so as to rule out false power fault events and thereby confirm true power fault events. This reduces the maintenance costs caused by false power fault events. Moreover, if a power failure cannot be cleared by the restart procedure, the recorded information about the failure can be used to speed up error analysis and troubleshooting of power failures in the server system.
The above description is explained in detail through the following embodiments, which further explain the technical solutions of the present disclosure.
To make the description of the present disclosure more detailed and complete, reference may be made to the accompanying drawings and the various embodiments described below. The embodiments provided are not intended to limit the scope of the present invention, and the description of steps is not intended to limit the order in which they are performed. Any device produced by recombining elements that has equivalent effect falls within the scope of the present invention.
In the embodiments and the claims, unless the context specifically limits the article, "a" and "the" may refer to one or more.
In addition, "coupled" and "connected" as used herein may mean that two or more elements are in direct physical or electrical contact with each other, or in indirect physical or electrical contact with each other; "coupled" may also mean that two or more elements cooperate or interact with each other.
FIG. 1 is a schematic diagram of a power failure detection system 100 according to an embodiment of the present disclosure. The power failure detection system 100 includes a motherboard 140, a board card 150, a complex programmable logic device (CPLD) 110, and a baseboard management controller (BMC) module 130. The motherboard 140 includes a CPU power supply 142 and a non-CPU power supply 144, the board card 150 includes a board power supply 152, and the BMC module 130 includes a register 120, which is electrically coupled to the CPLD 110.
The CPLD 110 monitors the operating status of each power supply inside the server system in real time. For example, the main power rails used on the motherboard 140 include P12V, P5V, P3V3, PVDDQ, PVCCIN, and so on, and the standby power rails include P12V_STBY, P5V_STBY, P3V3_STBY, P1V8_STBY, P1V_STBY, and so on. In this embodiment, the power rails used on the motherboard are divided into the CPU power supply 142 (for example, PVCCIN) and the non-CPU power supply 144 (for example, the rails P12V, P5V, P3V3, and PVDDQ supplied to the memory devices on the motherboard, and the standby rails P12V_STBY, P5V_STBY, P3V3_STBY, P1V8_STBY, and P1V_STBY). The board power supply 152 is, for example, the power supplied to the server backplane, and includes both standby power and main power.
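The grouping of rails into failure types can be written as a lookup table. The sketch below uses only the rail names given in the text (which lists them as examples, so the sets are not exhaustive); the function name `classify_rail` and the `on_board_card` flag are hypothetical, since the text does not name the board card's individual rails.

```python
# Rail names are the examples listed in the text; the sets are not exhaustive.
CPU_RAILS = {"PVCCIN"}
NON_CPU_RAILS = {
    "P12V", "P5V", "P3V3", "PVDDQ",                                  # main rails
    "P12V_STBY", "P5V_STBY", "P3V3_STBY", "P1V8_STBY", "P1V_STBY",  # standby rails
}


def classify_rail(rail: str, on_board_card: bool = False) -> str:
    """Map a failed rail to a power failure type: "cpu", "non_cpu" or "board"."""
    if on_board_card:
        return "board"    # board power supply 152 (backplane standby or main power)
    if rail in CPU_RAILS:
        return "cpu"      # CPU power supply 142
    if rail in NON_CPU_RAILS:
        return "non_cpu"  # non-CPU power supply 144
    raise ValueError(f"unknown motherboard rail: {rail}")


print(classify_rail("PVCCIN"), classify_rail("P1V8_STBY"),
      classify_rail("P12V", on_board_card=True))  # cpu non_cpu board
```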
In one embodiment, the CPLD 110 detects in real time whether a power failure occurs in the CPU power supply 142 or the non-CPU power supply 144 supplied to the motherboard 140, or in the board power supply 152 supplied to the board card. When any of these power supplies fails, the CPLD 110 immediately performs a shutdown procedure and identifies the power failure type. The CPLD 110 then determines, according to the power failure type, whether to perform a restart procedure. If the CPLD 110 performs the restart procedure and the system reboots successfully, the power failure was a false power fault and no repair is needed. If the system does not boot successfully, the restart procedure is repeated. When the number of restarts reaches a preset number (for example, 3), a lock message is recorded in the register 120 to indicate that the power failure cannot be cleared by rebooting.
The BMC module 130 is configured to perform a locking procedure according to the lock message to lock the server system, preventing secondary damage caused by malfunction before the power failure is resolved. The BMC module 130 also records the number of restarts. Thus, when a power failure occurs in the server system, the CPLD 110 identifies the power failure type and determines, according to that type, whether to perform the restart procedure to rule out a false power fault event, that is, to confirm a true power fault event. The preset number may be chosen according to actual needs and is not intended to limit the present disclosure; other suitable preset numbers also fall within the scope of the present disclosure.
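The handling after the shutdown procedure can be modelled as a bounded retry loop. This is a minimal sketch assuming that `try_restart` is a hypothetical callback that performs one restart and reports whether the server booted; the function name `handle_power_failure`, the string failure types, and the returned dictionary are illustrative choices, and the default preset of 3 is the example value from the text.

```python
from typing import Callable

PRESET_RESTARTS = 3  # example value from the text; configurable in practice


def handle_power_failure(failure_type: str,
                         try_restart: Callable[[], bool],
                         preset: int = PRESET_RESTARTS) -> dict:
    """Model of the CPLD/BMC handling after the shutdown procedure.

    failure_type is "cpu", "non_cpu" or "board". Returns the outcome, the
    restart count kept by the BMC, and whether the lock message was recorded.
    """
    restarts = 0
    if failure_type != "cpu":  # CPU failures are fatal: no restart is attempted
        while restarts < preset:
            restarts += 1      # the BMC module records each restart
            if try_restart():
                return {"outcome": "false_fault", "restarts": restarts, "locked": False}
    # CPU failure, or the preset number of restarts was reached without a boot
    return {"outcome": "true_fault", "restarts": restarts, "locked": True}


attempts = iter([False, True])  # first restart fails, second succeeds
print(handle_power_failure("board", lambda: next(attempts)))
# {'outcome': 'false_fault', 'restarts': 2, 'locked': False}
```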
The following describes in detail how the CPLD 110 determines whether to perform the restart procedure for different power failure types. In one embodiment, when the power failure type is a failure of the CPU power supply 142, the CPLD 110 records the lock message directly in the register 120 without performing the restart procedure. The BMC module 130 performs the locking procedure according to the lock message to lock the server system, so that the server system cannot be powered on until the failure of the CPU power supply 142 has been resolved. Because a CPU power failure is fatal, the CPLD 110 records the lock message directly in the register 120 and does not perform the restart procedure on the server system. The BMC module 130 performs the locking procedure on the server system to prevent secondary damage caused by malfunction.
Alternatively, in another embodiment, when the power failure occurs on the motherboard 140 but not in the CPU power supply 142 (that is, the non-CPU power supply 144 fails), the CPLD 110 performs the restart procedure to try to clear the power failure. If the system reboots successfully, the power failure was a false power fault and no repair is needed. If the system does not boot successfully, the restart procedure is repeated. When the number of restarts reaches the preset number (for example, 3), the lock message is recorded in the register 120, indicating that the power failure cannot be cleared by rebooting. The preset number may be chosen according to actual needs and is not intended to limit the present disclosure; other suitable preset numbers also fall within the scope of the present disclosure.
Alternatively, in another embodiment, when the power failure type is a failure of the board power supply 152 (including failure of the board card's standby power or of its main power), the CPLD 110 performs the restart procedure to try to clear the power failure. If the system reboots successfully, the power failure was a false power fault and no repair is needed. If the system does not boot successfully, the restart procedure is repeated. When the number of restarts reaches the preset number (for example, 3), the lock message is recorded in the register 120, indicating that the power failure cannot be cleared by rebooting. The preset number may be chosen according to actual needs and is not intended to limit the present disclosure; other suitable preset numbers also fall within the scope of the present disclosure. In one embodiment, when the power failure type is a failure of the board power supply 152 (including failure of the board card's standby power or of its main power), an alert is issued through a display device, for example a red light emitted by an indicator light, although the present disclosure is not limited thereto.
In one embodiment, when the CPLD 110 records the lock message in the register 120, it transmits the location information of the register 120 to the BMC module 130. The BMC module 130 reads the lock message from the register 120 according to the location information in order to perform the locking procedure.
In one embodiment, the CPLD 110 uses a power scan program defined in program code to record the power failure location within the CPU power supply 142, the non-CPU power supply 144, or the board power supply 152, which improves error analysis and troubleshooting of power failures in the server system. For example, when the board power supply 152 fails, the CPLD 110 uses the power scan program to record which power rail on the board card failed. Therefore, if the power failure cannot be cleared by the restart procedure, the time spent locating the failure is greatly reduced. When the CPU power supply 142 fails, or when the non-CPU power supply 144 (or the board power supply 152) fails and the number of restarts reaches the preset number, the CPLD 110 records the lock message in the register 120 connected to the corresponding pin of the failed power supply, and transmits the location information of that register 120 to the BMC module 130.
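Recording the lock message in the register connected to the failed rail's pin can be sketched as a pin-to-register bit map. The mapping in `PIN_MAP`, the register addresses, and the function names below are all hypothetical; the returned address stands in for the location information the CPLD sends to the BMC module.

```python
from typing import Dict, List, Tuple

# Hypothetical wiring: each monitored rail's pin maps to a (register address,
# bit) pair inside the BMC module's register block.
PIN_MAP: Dict[str, Tuple[int, int]] = {
    "PVCCIN": (0x10, 0),
    "P3V3": (0x11, 2),
    "PVDDQ": (0x11, 3),
}


def record_lock_for_rail(registers: Dict[int, int], rail: str) -> int:
    """CPLD side: set the lock bit for the failed rail, return the register address."""
    address, bit = PIN_MAP[rail]
    registers[address] = registers.get(address, 0) | (1 << bit)
    return address


def bmc_read_failed_rails(registers: Dict[int, int], address: int) -> List[str]:
    """BMC side: decode which rails have their lock bit set at the given address."""
    value = registers.get(address, 0)
    return [rail for rail, (addr, bit) in PIN_MAP.items()
            if addr == address and value & (1 << bit)]


regs: Dict[int, int] = {}
addr = record_lock_for_rail(regs, "PVDDQ")
print(hex(addr), bmc_read_failed_rails(regs, addr))  # 0x11 ['PVDDQ']
```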
FIG. 2 is a flowchart of a power failure detection method 200 according to another embodiment of the present disclosure. The power failure detection method 200 includes steps S202 to S208 and can be applied to the power failure detection system 100 shown in FIG. 1. Those skilled in the art should understand that, unless their order is specifically stated, the steps mentioned in this embodiment may be reordered according to actual needs, and may even be performed simultaneously or partially simultaneously.
The following description takes the power supplies of a server system as an example. The operating status of each power supply inside the server system is monitored in real time. For example, the main power rails used on the motherboard include P12V, P5V, P3V3, PVDDQ, PVCCIN, and so on, and the standby power rails include P12V_STBY, P5V_STBY, P3V3_STBY, P1V8_STBY, P1V_STBY, and so on. In this embodiment, the power rails used on the motherboard are divided into the CPU power supply (for example, PVCCIN) and the non-CPU power supply (for example, the rails P12V, P5V, P3V3, and PVDDQ supplied to the memory devices on the motherboard, and the standby rails P12V_STBY, P5V_STBY, P3V3_STBY, P1V8_STBY, and P1V_STBY). The board power supply is, for example, the power supplied to the server backplane, and includes both standby power and main power.
In one embodiment, the CPU power supply and the non-CPU power supply supplied to the motherboard, and the power supplied to the board card, are monitored in real time for power failure. In step S202, when any of these power supplies fails, a shutdown procedure is performed immediately. In step S204, the power failure type is identified, and whether to perform a restart procedure is determined according to the power failure type. If the restart procedure is performed and the system reboots successfully, the power failure was a false power fault and no repair is needed. If the system does not boot successfully, the restart procedure is repeated. In step S206, if the restart procedure is performed and the number of restarts reaches a preset number (for example, 3), a lock message is recorded in the register to indicate that the power failure cannot be cleared by rebooting. The number of restarts is also recorded through the BMC module. In step S208, the BMC module performs a locking procedure according to the lock message to lock the server system, preventing secondary damage caused by malfunction before the power failure is resolved.
Thus, when a power failure occurs in the server system, the power failure type is identified, and whether to perform the restart procedure is determined according to that type, so as to rule out a false power fault event, that is, to confirm a true power fault event. The preset number may be chosen according to actual needs and is not intended to limit the present disclosure; other suitable preset numbers also fall within the scope of the present disclosure.
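Steps S202 to S208 of method 200 can be traced for a single power failure event with the sketch below. It assumes a hypothetical `try_restart` callback that performs one restart and reports whether the server booted; the function name `power_failure_method` and the trace strings are illustrative only. For a CPU power failure the lock message is recorded directly, as described in the embodiments that follow, so that path is not labelled S206.

```python
from typing import Callable, List


def power_failure_method(failure_type: str,
                         try_restart: Callable[[], bool],
                         preset: int = 3) -> List[str]:
    """Return a trace of steps S202 to S208 for one power failure event.

    failure_type is "cpu", "non_cpu" or "board"; try_restart performs one
    restart and returns True if the server boots successfully.
    """
    trace = ["S202: shutdown procedure"]
    restart = failure_type in ("non_cpu", "board")
    trace.append(f"S204: type={failure_type}, restart={restart}")
    count = 0  # restart count, also recorded through the BMC module
    while restart and count < preset:
        count += 1
        if try_restart():
            trace.append(f"reboot ok after {count} restart(s): false power fault")
            return trace
    if restart:
        trace.append(f"S206: {count} restarts reached preset; lock message recorded")
    else:
        trace.append("lock message recorded directly (fatal CPU power failure)")
    trace.append("S208: BMC performs locking procedure")
    return trace


for line in power_failure_method("non_cpu", lambda: False):
    print(line)
```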
The following describes in detail how to determine whether to perform the restart procedure for different power failure types. In one embodiment, when the power failure type is a CPU power supply failure, the lock message is recorded directly in the register without performing the restart procedure. The BMC module performs the locking procedure according to the lock message to lock the server system, so that the server system cannot be powered on until the CPU power supply failure has been resolved. Because a CPU power supply failure is fatal, the lock message is recorded directly in the register and no restart procedure is performed on the server system. The BMC module performs the locking procedure on the server system to prevent secondary damage caused by malfunction.
Alternatively, in another embodiment, when the power failure occurs on the motherboard but not in the CPU power supply (that is, the non-CPU power supply fails), the restart procedure is performed to try to clear the power failure. If the system reboots successfully, the power failure was a false power fault and no repair is needed. If the system does not boot successfully, the restart procedure is repeated. When the number of restarts reaches the preset number (for example, 3), the lock message is recorded in the register, indicating that the power failure cannot be cleared by rebooting. The preset number may be chosen according to actual needs and is not intended to limit the present disclosure; other suitable preset numbers also fall within the scope of the present disclosure.
Alternatively, in another embodiment, when the power failure type is a board power supply failure (including failure of the board card's standby power or of its main power), the restart procedure is performed to try to clear the power failure. If the system reboots successfully, the power failure was a false power fault and no repair is needed. If the system does not boot successfully, the restart procedure is repeated. When the number of restarts reaches the preset number (for example, 3), the lock message is recorded in the register, indicating that the power failure cannot be cleared by rebooting. The preset number may be chosen according to actual needs and is not intended to limit the present disclosure; other suitable preset numbers also fall within the scope of the present disclosure. In one embodiment, when the power failure type is a board power supply failure (including failure of the board card's standby power or of its main power), an alert is issued through a display device, for example a warning light emitted by an indicator light, although the present disclosure is not limited thereto.
In one embodiment, when the lock message is recorded in the register, the location information of the register is transmitted to the BMC module. The BMC module reads the lock message from the register according to the location information in order to perform the locking procedure.
In one embodiment, a power scan program defined in program code records the power failure location within the CPU power supply, the non-CPU power supply, or the board power supply, which improves error analysis and troubleshooting of power failures in the server system. For example, when the board power supply fails, the power scan program records which power rail on the board card failed. Therefore, if the power failure cannot be cleared by the restart procedure, the time spent locating the failure is greatly reduced. When the CPU power supply fails, or when the non-CPU power supply (or the board power supply) fails and the number of restarts reaches the preset number, the CPLD records the lock message in the register connected to the corresponding pin of the failed power supply, and transmits the location information of that register to the BMC module.
In summary, through the above embodiments, the present disclosure can identify which power supply has failed in a server system without measuring the power-on sequence of each power supply one by one, and can apply follow-up handling that depends on which power supply failed, so as to rule out false power fault events and thereby confirm true power fault events. This reduces the maintenance costs caused by false power fault events. Moreover, if a power failure cannot be cleared by the restart procedure, the recorded information about the failure can be used to speed up error analysis and troubleshooting of power failures in the server system.
Although the present disclosure has been described above by way of embodiments, they are not intended to limit the invention. Those skilled in the art may make various changes and modifications without departing from the spirit and scope of the present disclosure; therefore, the scope of protection of the invention shall be defined by the appended claims.
To make the above and other objects, features, advantages, and embodiments of the present disclosure more readily understood, the reference numerals are described as follows:
100‧‧‧Power failure detection system
110‧‧‧Complex programmable logic device
120‧‧‧Register
130‧‧‧Baseboard management controller module
140‧‧‧Motherboard
142‧‧‧Central processing unit power supply
144‧‧‧Non-central processing unit power supply
150‧‧‧Board
152‧‧‧Board power supply
200‧‧‧Power failure detection method
S202~S208‧‧‧Steps
To make the above and other objects, features, advantages, and embodiments of the present disclosure more readily understood, the accompanying drawings are described as follows:
FIG. 1 is a schematic diagram of a power failure detection system according to an embodiment of the present disclosure; and
FIG. 2 is a flowchart of a power failure detection method according to another embodiment of the present disclosure.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW104125304A TWI584114B (en) | 2015-08-04 | 2015-08-04 | Power failure detection system and method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW104125304A TWI584114B (en) | 2015-08-04 | 2015-08-04 | Power failure detection system and method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
TW201706844A (en) | 2017-02-16 |
TWI584114B TWI584114B (en) | 2017-05-21 |
Family
ID=58609190
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW104125304A TWI584114B (en) | 2015-08-04 | 2015-08-04 | Power failure detection system and method thereof |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI584114B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107797960A (en) * | 2017-11-03 | 2018-03-13 | 山东超越数控电子股份有限公司 | A kind of server architecture of multiprocessor |
CN111722954A (en) * | 2020-06-30 | 2020-09-29 | 曙光信息产业(北京)有限公司 | Server abnormity positioning method and device, storage medium and server |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI830623B (en) * | 2023-03-15 | 2024-01-21 | 神雲科技股份有限公司 | A motherboard detection method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3376306B2 (en) * | 1998-12-25 | 2003-02-10 | エヌイーシーマイクロシステム株式会社 | Data processing apparatus and data processing method |
US8255620B2 (en) * | 2009-08-11 | 2012-08-28 | Texas Memory Systems, Inc. | Secure Flash-based memory system with fast wipe feature |
TWI458995B (en) * | 2010-08-24 | 2014-11-01 | Hon Hai Prec Ind Co Ltd | Power failure detection system and method of a server |
TWI480726B (en) * | 2012-12-11 | 2015-04-11 | Inventec Corp | Power supply controlling system for motherboard by boundary scan and method thereof |
KR20150004169A (en) * | 2013-07-02 | 2015-01-12 | 삼성전자주식회사 | Power supply device, micro server having the same and method for power supplying |
US20160197809A1 (en) * | 2013-09-30 | 2016-07-07 | Hewlett Packard Enterprise Development Lp | Server downtime metering |
2015
- 2015-08-04: TW application TW104125304A; patent TWI584114B (en), status active
Also Published As
Publication number | Publication date |
---|---|
TWI584114B (en) | 2017-05-21 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9778988B2 (en) | Power failure detection system and method | |
WO2021169260A1 (en) | System board card power supply test method, apparatus and device, and storage medium | |
US20240012706A1 (en) | Method, system and apparatus for fault positioning in starting process of server | |
CN104320308B (en) | A kind of method and device of server exception detection | |
WO2017063505A1 (en) | Method for detecting hardware fault of server, apparatus thereof, and server | |
TWI632462B (en) | Switching device and method for detecting i2c bus | |
TWI529624B (en) | Method and system of fault tolerance for multiple servers | |
JP2011043957A (en) | Fault monitoring circuit, semiconductor integrated circuit, and faulty part locating method | |
CN107678909B (en) | Circuit and method for monitoring chip configuration state in server | |
TWI584114B (en) | Power failure detection system and method thereof | |
US20080270827A1 (en) | Recovering diagnostic data after out-of-band data capture failure | |
CN110445638B (en) | Switch system fault protection method and device | |
CN103995760A (en) | Computer fault detection device and detection and maintenance method | |
US8726088B2 (en) | Method for processing booting errors | |
US9158646B2 (en) | Abnormal information output system for a computer system | |
JP2014021577A (en) | Apparatus, system, method, and program for failure prediction | |
TW201918880A (en) | Device for detection before booting and operation method thereof | |
TWI779682B (en) | Computer system, computer server and method of starting the same | |
TW201500911A (en) | Debug device and debug method | |
CN114265489B (en) | Power failure monitoring method and device, electronic equipment and storage medium | |
TWI494754B (en) | Server monitoring apparatus and method thereof | |
TWI675293B (en) | A host boot detection method and its system | |
JP6217086B2 (en) | Information processing apparatus, error detection function diagnosis method, and computer program | |
WO2015083226A1 (en) | Information processing device and information processing device control program | |
TWI777259B (en) | Boot method |