TW201324115A - Computer system and boot managing method of computer system - Google Patents

Computer system and boot managing method of computer system

Info

Publication number
TW201324115A
Authority
TW
Taiwan
Prior art keywords
computer device
remote server
command
boot
log information
Prior art date
Application number
TW100146546A
Other languages
Chinese (zh)
Inventor
Chung-Nan Chen
Original Assignee
Inventec Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Corp filed Critical Inventec Corp
Priority to TW100146546A priority Critical patent/TW201324115A/en
Publication of TW201324115A publication Critical patent/TW201324115A/en


Abstract

A computer system and a boot management method for the computer system, suited to a remote server, are provided. The boot management method includes the following steps. When a computer device managed by the remote server is in a shutdown state, a boot command is sent to the computer device through a network, and a counter is reset to start counting down a timeout period. When a Basic Input/Output System (BIOS) boot-complete log message has not been received and the timeout period has expired, a reboot command is sent to the computer device through the network, and the counter is reset again to re-check whether the computer device has finished booting. The remote server thus performs boot management for the computer device by means of a cloud computing mechanism, so that the computer devices can be equipped with low-performance baseboard management controller (BMC) chips to reduce costs.

Description

Computer system and boot management method for a computer system

The present invention relates to baseboard management controller (BMC) technology, and more particularly to a computer system and a method by which a remote server manages the booting of multiple computer devices.

A baseboard management controller (BMC) is the core controller in the Intelligent Platform Management Interface (IPMI) architecture. It senses, monitors, and records the operating conditions of a server, and when it detects an abnormal condition it runs the corresponding procedure to clear the fault promptly. A BMC also provides remote management, system status detection and logging, data tracking, and system recovery, which lets operators manage large numbers of servers effectively while keeping maintenance costs down.

Because of these capabilities, the BMC plays an increasingly important role in server management. As the performance and importance of the BMC in a server have grown, BMC chips have also become more expensive, especially those with high computing performance.

With the rapid development of cloud computing, many research institutions and application services have been moving complex computation and large databases from the local machine to cloud servers formed by linking many servers together, so that, given a reliable network connection, computation is faster and application services respond more quickly. In today's cloud servers, each server also contains a BMC for monitoring and management. Whether the computing mechanisms of the BMC can be combined with cloud computing, so that a server with a low-cost BMC chip can still achieve high computing performance, is therefore an important direction for research.

The present invention provides a boot management method for a computer system in which a remote server uses a cloud computing mechanism to carry out the boot management procedure for a computer device, so that the computer device can use a baseboard management controller with low computing performance and thereby reduce cost.

The present invention also provides a computer system in which a remote server uses a cloud computing mechanism to carry out the boot management procedure for the computer devices it manages, so that those computer devices can use baseboard management controllers with low computing performance and thereby reduce cost.

The present invention proposes a boot management method for a computer system, suitable for a remote server in the computer system. The method includes the following steps. When a computer device managed by the remote server is in a shutdown state, a boot command is sent to the computer device over a network, and a counter is reset to start counting down a timeout period. Before the timeout period expires, the remote server checks whether a Basic Input/Output System (BIOS) boot-complete log message sent by the computer device has been received over the network. If the BIOS boot-complete log message has not been received by the time the timeout period expires, a reboot command is sent to the computer device over the network, and the counter is reset again so that it restarts the countdown of the timeout period.

In an embodiment of the present invention, the boot management method further includes the following steps. When the boot command is sent, the remote server resets a restart flag to 0. When the reboot command is sent, the remote server increments the restart flag by 1. When the restart flag equals a preset number, the remote server stops sending the reboot command, stops resetting the counter, and performs an alert operation to notify the maintenance personnel of the computer system.

In an embodiment of the present invention, the boot management method further includes the following step. Before the timeout period expires, and before checking whether the BIOS boot-complete log message has been received, the remote server checks whether a BIOS start-of-execution log message sent by the computer device has been received.

In an embodiment of the present invention, the boot management method further includes the following step. Before the timeout period expires, and before checking whether the BIOS boot-complete log message has been received, the remote server checks whether a power-on log message sent by the computer device has been received.

In an embodiment of the present invention, the computer device includes a baseboard management controller (BMC) that receives the boot command and performs a boot operation. The BMC also turns every operating condition of the computer device into a System Event Log (SEL) entry and triggers a cloud transmission event so that each SEL entry is sent promptly to the remote server over the network. The SEL entries may include the BIOS start-of-execution log message, the BIOS boot-complete log message, and the power-on log message.

From another perspective, the present invention proposes a computer system that includes at least one computer device and a remote server. Each computer device includes a BMC that receives the boot command to perform a boot operation, turns every operating condition of the computer device into a system event log entry, and triggers a cloud transmission event so that the entry is sent promptly to the remote server over the network.

When the computer device is in a shutdown state, the remote server sends a boot command to the computer device over the network and resets a counter to start counting down a timeout period. Before the timeout period expires, the remote server checks whether a BIOS boot-complete log message sent by the computer device has been received over the network. If the BIOS boot-complete log message has still not been received after the timeout period has expired, the remote server sends a reboot command to the computer device over the network and resets the counter so that it restarts the countdown of the timeout period.

For the remaining implementation details of this computer system, refer to the description above; they are not repeated here.

Based on the above, so that the computer device can use a BMC with low computing performance, the BMC in the embodiments of the present invention continuously provides the system management logs of the computer device to the remote server over the network. The boot management of the BMC is handled entirely by the remote server in the cloud computing mechanism, and the BMC itself has no decision-making function. After sending the boot command, the remote server can therefore use a timeout countdown (commonly called a watchdog mechanism) to determine whether the BIOS of the computer device has finished starting, and thereby carry out the fault resilient booting (FRB) mechanism for the computer device.

To make the above features and advantages of the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

Reference will now be made in detail to exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, elements, components, and symbols bearing the same reference numerals in the drawings and the description denote the same or similar parts.

FIG. 1 is a schematic diagram of a computer system 100 according to an embodiment of the present invention. The computer system 100 includes at least one computer device 110, a network 130, and a remote server 140, and each computer device 110 includes a baseboard management controller (BMC) 120. The computer devices 110 are exemplified here by multiple servers, but the invention is not limited to this. In other embodiments, a personal computer equipped with a BMC 120 may serve as the computer device 110.

The functional architectures of the baseboard management controller 120 and the remote server 140 in this embodiment are shown in FIG. 2 and FIG. 3, which are functional block diagrams of the computer device 110 and the remote server 140, respectively. As shown in FIG. 2 and FIG. 3, so that the computer device 110 can use a BMC 120 with low computing performance, the BMC 120 of this embodiment has no decision-making function of its own; all of the BMC's decision logic is handled by the remote server in the cloud computing mechanism.

In detail, the BMC 120 of this embodiment differs from a conventional BMC in that it does not contain the message handler 250, the platform event filter (PEF) 260, or the fault resilient booting (FRB) module 280 found in a conventional BMC. Even the complex decision logic of the alert processing module 270 is moved to the remote server 140, so that the decision logic formerly in the BMC is handled by the remote server 140 through the cloud computing mechanism. The FRB module 280 includes a fault resilient counter (the counter 290 in FIG. 3), which is described in detail in the boot management method below.

As shown in FIG. 2, the computer device 110 includes the BMC 120, and the BMC 120 itself retains only a plurality of sensors 210, an event receiver 215, a system event log (SEL) logger 220, a cloud transmission module 230, and a network card interface 240. Sensors 210 of different types and purposes are installed throughout the computer device 110. The BMC 120 uses the sensors 210, or other sensing equipment connected through other interfaces, to detect the operating conditions of the computer device, such as microprocessor temperature and fan speed, and passes all of these conditions to the event receiver 215, which organizes them into system event log (SEL) entries.

The system event logger 220 stores these system event log entries. As soon as the event receiver 215 produces a system event log entry, the BMC 120 triggers a cloud transmission event in the cloud transmission module 230, so that the entry is sent through the cloud transmission module 230 and the network card interface 240, over the network 130, to the remote server 140 of FIG. 1. The BMC 120 can also use the network card interface 240 and the cloud transmission module 230 to receive and execute program commands sent to the computer device 110 by the remote server 140 of FIG. 1.
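The store-then-forward behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `SelRecord`, `ThinBmc`, and `on_sensor_reading`, the JSON wire format, and the `send` callback (standing in for the cloud transmission module 230 and network card interface 240) are all hypothetical and are not specified by the patent.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass
class SelRecord:
    """One system event log entry (hypothetical layout)."""
    sensor: str
    value: float
    timestamp: float


@dataclass
class ThinBmc:
    """A BMC that stores and forwards events but makes no decisions."""
    # Assumed transport hook: stands in for module 230 plus interface 240.
    send: Callable[[bytes], None]
    sel_store: List[SelRecord] = field(default_factory=list)

    def on_sensor_reading(self, sensor: str, value: float) -> SelRecord:
        # Event receiver 215: turn the raw reading into an SEL entry.
        record = SelRecord(sensor, value, time.time())
        # System event logger 220: keep a local copy.
        self.sel_store.append(record)
        # Cloud transmission event: forward the entry immediately.
        self.send(json.dumps(asdict(record)).encode("utf-8"))
        return record
```

Because the transport is injected, the same class can be exercised with a plain list in place of a network connection, for example `ThinBmc(send=outbox.append)`.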

In other embodiments, to reduce the manufacturing cost of the BMC 120 further, the system event logger 220 may be placed in the remote server 140 of FIG. 3 rather than in the BMC 120, so that the BMC 120 does not need to store system event logs either.

Referring to FIG. 3, the network card interface 240 and the cloud transmission module 230 of the remote server 140 receive from the network 130 the system event logs sent by the computer device 110. After the BMC type has been determined and the source network address of the BMC has been recorded, the signal processor 250 works with the platform event filter 260 to analyze the system event logs and determine whether the computer device is operating normally.

When the system is found to be abnormal (for example, the microprocessor is overheating or a fan is not running properly), the signal processor 250 and the platform event filter 260 generate a corresponding program command, which is passed through the network card interface 240 and the cloud transmission module 230 to the BMC 120 of the corresponding computer device 110. The cloud transmission event of the cloud transmission module 230 is implemented in software, but it may also be implemented in hardware in modular form to reduce development cost; the invention is not limited in this respect.
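A server-side filter of this kind might look like the following sketch. The thresholds, sensor names, and command strings are invented for illustration; the patent does not define any of them, and a real platform event filter would be driven by configuration rather than constants.

```python
from typing import Optional

# Hypothetical limits, chosen only for this example.
CPU_TEMP_LIMIT_C = 85.0
MIN_FAN_RPM = 500.0


def filter_event(sensor: str, value: float) -> Optional[str]:
    """Map one SEL reading to a command for the BMC, or None if normal."""
    if sensor == "cpu_temp" and value > CPU_TEMP_LIMIT_C:
        return "THROTTLE_CPU"
    if sensor == "fan_rpm" and value < MIN_FAN_RPM:
        return "RAISE_FAN_SPEED"
    return None
```

Keeping the rule evaluation on the remote server is what allows the BMC 120 to remain a simple forwarder.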

The remote server 140 can also use the alert processing module 270 to perform an alert operation that notifies maintenance personnel to attend to the computer device 110. In addition, the remote server 140 can use the fault resilient booting module 280 to carry out the boot management and boot recovery procedure for the computer devices 110 it manages.

Note that when a conventional BMC performs boot management, the usual approach is to use the Fault Resilient Booting (FRB) module inside the BMC, which determines whether the Basic Input/Output System (BIOS) finishes executing within a timeout period and thereby implements the boot management and fault resilient booting mechanism. If the computing mechanisms of the BMC 120 are implemented through cloud computing, however, the BMC 120 no longer contains the FRB module and therefore cannot by itself provide proper boot management for the computer device 110.

The spirit of the present invention is therefore to use the cloud computing mechanism to move the boot management and fault resilient booting function of the BMC 120 in each computer device 110 to the remote server 140, which lowers the computing performance required of the baseboard management controller and so reduces the cost of the BMC 120.

An embodiment that realizes the spirit of the invention is presented here. FIG. 4 shows a boot management method for the computer system 100 according to an embodiment of the present invention, applicable to the remote server 140 in the computer system 100. The BMC 120 of the computer device 110 must also be designed to work with this boot management method.

Refer to FIG. 4 together with FIG. 1 and FIG. 3. In the boot management method of the computer system 100, a timeout period is preset in the counter 290. If the remote server 140 learns from the system event logs of the computer devices 110 it manages that one computer device 110 is shut down when it should be powered on, or if a computer device 110 known to be shut down is to be powered on, the method proceeds to step S410: the remote server 140 sends a boot command to the computer device 110 over the network 130 through its cloud transmission module 230 and network card interface 240.

Also in step S410 of this embodiment, the counter 290 in the fault resilient booting module 280 is reset and begins counting down the preset timeout period. At the same time, the fault resilient booting module 280 resets a restart flag Frst to 0. In other embodiments, the counter 290 may be implemented as a watchdog timer, which is not described further here.
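As a minimal sketch of how the counter 290 could behave, the following Python class models a resettable countdown. The class name `CountdownCounter`, its methods, and the injectable `clock` parameter are assumptions made for illustration and do not appear in the patent.

```python
import time
from typing import Callable


class CountdownCounter:
    """A resettable countdown, in the spirit of a watchdog timer."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the counter can be tested
        self._deadline = clock() + timeout_s

    def reset(self) -> None:
        # Restart the countdown from the full preset timeout period.
        self._deadline = self._clock() + self._timeout_s

    def expired(self) -> bool:
        # True once the preset timeout period has fully elapsed.
        return self._clock() >= self._deadline
```

A monotonic clock is used by default so that wall-clock adjustments on the remote server do not shorten or lengthen the timeout.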

In step S420, the fault resilient booting module 280 of the remote server 140 continuously determines whether the timeout period has expired. Until it expires, the remote server 140 keeps checking whether the Basic Input/Output System (BIOS) boot-complete log message sent by the computer device 110 has been received over the network 130 (step S430). In this embodiment, the BIOS boot-complete log message is a type of system event log (SEL) entry, generated when the BIOS of the computer device 110 finishes its Power-On Self-Test (POST).

Normally, if the remote server 140 receives the BIOS boot-complete log message before the timeout period expires, the computer device 110 has finished booting. The method then proceeds to step S440, in which the remote server 140 marks the state of the computer device 110 as powered on.

If, however, the BIOS boot-complete log message of the computer device 110 has not been received and the timeout period in step S420 has expired, the method proceeds from step S420 to step S450. The fault resilient booting module 280 first checks whether the restart flag Frst has reached a predetermined number of restarts, so that the remote server 140 does not keep rebooting the computer device 110 indefinitely. In this embodiment the predetermined number may be 5, but it is not limited to this value.

If the restart flag Frst is less than the predetermined number, the method proceeds from step S450 to step S460. The remote server 140 sends a reboot command, or sends the boot command again, to the corresponding computer device 110 over the network 130, and the fault resilient booting module 280 resets the counter 290 again, returning to step S420 and restarting the countdown of the timeout period. At the same time, the fault resilient booting module 280 increments the restart flag Frst by 1 to count how many times the remote server 140 has rebooted the computer device 110.

Conversely, if the restart flag Frst equals or exceeds the predetermined number, the computer device 110 has been rebooted the predetermined number of times. The method then proceeds from step S450 to step S470: the remote server 140 stops sending the reboot command, and the fault resilient booting module 280 stops resetting the counter 290.

Also in step S470, the remote server 140 uses the alert processing module 270 to perform an alert operation that automatically informs the maintenance personnel of the computer system 100 and requests their assistance. The alert operation may be, for example, displaying a specific alert message on the screen of the remote server 140, sending a specific alert packet to a specific server, or sounding a specific alarm; the invention is not limited to these examples.
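Steps S410 to S470 can be summarized in a short Python sketch. The function name `manage_boot`, the callbacks `send_command`, `wait_for_boot_complete`, and `alert`, the command strings, and the default timeout of 120 seconds are all assumptions for illustration; only the default of 5 restarts comes from the embodiment above.

```python
from typing import Callable


def manage_boot(send_command: Callable[[str], None],
                wait_for_boot_complete: Callable[[float], bool],
                alert: Callable[[str], None],
                timeout_s: float = 120.0,
                max_restarts: int = 5) -> bool:
    """Return True if the device booted, False if the server gave up.

    wait_for_boot_complete(timeout_s) is assumed to restart the
    countdown, then return True as soon as the BIOS boot-complete SEL
    entry arrives, or False once the timeout period has expired.
    """
    send_command("POWER_ON")   # S410: send the boot command
    restart_flag = 0           # S410: Frst reset to 0
    while True:
        if wait_for_boot_complete(timeout_s):   # S420 and S430
            return True                         # S440: mark as powered on
        if restart_flag >= max_restarts:        # S450: limit reached
            alert("device failed to boot")      # S470: alert operation
            return False
        send_command("REBOOT")                  # S460: reboot command
        restart_flag += 1                       # S460: Frst incremented
```

With the defaults, a device that never reports a completed boot receives one boot command and five reboot commands before the alert is raised.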

To summarize, even when the BMC 120 in each computer device 110 has had its boot management and fault resilient booting modules removed, the remote server 140 of the embodiments can use its fault resilient booting module 280 to give every computer device 110 the same result as a conventional BMC. Compared with conventional computer systems, the embodiments can use a BMC with low computing performance to achieve the mechanisms that previously required a high-performance BMC, which further reduces the deployment cost of each computer device 110. Note in particular that, provided the fault resilient booting module 280 contains enough counters 290, the remote server 140 can run the boot management method and fault resilient booting mechanism for multiple managed computer devices 110 at the same time.
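One way to picture "enough counters" is a table with one deadline per managed device. The sketch below is an assumption about how such bookkeeping could be organized; the class `BootTimers`, its method names, and the device identifiers are hypothetical.

```python
from typing import Dict, List


class BootTimers:
    """One countdown per managed device, keyed by a device identifier."""

    def __init__(self, timeout_s: float) -> None:
        self._timeout_s = timeout_s
        self._deadlines: Dict[str, float] = {}

    def start(self, device_id: str, now: float) -> None:
        # Reset (or create) the countdown for this device.
        self._deadlines[device_id] = now + self._timeout_s

    def boot_completed(self, device_id: str) -> None:
        # The BIOS boot-complete entry arrived: stop watching the device.
        self._deadlines.pop(device_id, None)

    def expired(self, now: float) -> List[str]:
        # Devices whose timeout period has run out and need a reboot.
        return sorted(d for d, deadline in self._deadlines.items()
                      if now >= deadline)
```

Passing `now` explicitly keeps the example deterministic; a real server would read it from a monotonic clock.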

FIG. 5 shows a boot management method for the computer system 100 according to another embodiment of the present invention. This embodiment is similar to the boot management method shown in FIG. 4, and identical or similar descriptions are not repeated. It differs from the embodiment of FIG. 4 in that the boot management and fault resilient booting mechanism of a conventional BMC examines not only the BIOS boot-complete log message but also other system events related to fault resilient booting, for example the power-on log message of the computer device 110 (the system event log entry generated when the computer device 110 is powered) and the BIOS start-of-execution log message (the system event log entry generated when the BIOS of the computer device 110 begins executing).

In step S530 of FIG. 5, therefore, before the timeout period expires, the remote server 140 keeps checking whether it has received the following messages from the computer device 110 over the network 130, in order: first the power-on log message (that is, whether the computer device 110 has power), then the BIOS start-of-execution log message (whether the BIOS of the computer device 110 has begun executing), and finally the BIOS boot-complete log message (whether the BIOS of the computer device 110 has finished executing). If these system event log entries are received in that order, the method proceeds to step S440 and the computer device 110 is deemed to have booted. If they are not received in that order, an error has occurred in the boot procedure of the computer device 110, and the method proceeds to steps S450 to S470 to continue the fault resilient booting mechanism.
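The ordered check of step S530 can be sketched as a small function over the event names received so far. The event names in `EXPECTED_SEQUENCE` and the function `boot_sequence_ok` are hypothetical, and treating a repeated or out-of-order boot event as a failure is a design choice of this sketch rather than something the patent spells out.

```python
from typing import Iterable

# Assumed names for the three SEL entries, in the required order.
EXPECTED_SEQUENCE = ("POWER_ON", "BIOS_START", "BIOS_BOOT_COMPLETE")


def boot_sequence_ok(received: Iterable[str]) -> bool:
    """True if the three boot events were seen in the expected order."""
    stage = 0
    for event in received:
        if event not in EXPECTED_SEQUENCE:
            continue  # unrelated SEL entries, e.g. temperature readings
        if event != EXPECTED_SEQUENCE[stage]:
            return False  # a boot event arrived out of order
        stage += 1
        if stage == len(EXPECTED_SEQUENCE):
            return True
    return False  # the sequence is incomplete so far
```

A `False` result at the moment the timeout period expires would correspond to the branch into steps S450 to S470.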

In summary, so that the computer device can use a BMC with low computing performance, the BMC in the embodiments of the present invention continuously provides the system management logs of the computer device to the remote server over the network. The boot management of the BMC is handled entirely by the remote server in the cloud computing mechanism, and the BMC itself has no decision-making function. After sending the boot command, the remote server can therefore use a timeout countdown (commonly called a watchdog mechanism) to determine whether the BIOS of the computer device has finished starting, and thereby carry out the boot management mechanism, also called the fault resilient booting (FRB) mechanism, for the computer device.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit it. Anyone with ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection is therefore defined by the appended claims.

100: computer system

110: computer device

120: baseboard management controller (BMC)

130: network

140: remote server

210: sensor

215: event receiver

220: system event logger

230: cloud transmission module

240: network card interface

250: signal processor

260: platform event filter

270: alert processing module

280: fault resilient booting module

290: counter

S410~S530: steps

FIG. 1 is a schematic diagram of a computer system according to an embodiment of the present invention.

FIG. 2 is a functional block diagram of the computer device.

FIG. 3 is a functional block diagram of the remote server.

FIG. 4 shows a boot management method for a computer system according to an embodiment of the present invention.

FIG. 5 shows a boot management method for a computer system according to another embodiment of the present invention.


Claims (10)

1. A boot management method of a computer system, suitable for a remote server, the boot management method comprising: when a computer device managed by the remote server is in a shutdown state, sending a boot command to the computer device via a network, and resetting a counter in the remote server to start counting down a time-out; before the time-out has finished counting down, checking whether a BIOS start-OK log message sent by the computer device has been received via the network; and when the BIOS start-OK log message has not been received and the time-out has finished counting down, sending a reboot command to the computer device via the network, and resetting the counter so that it restarts counting down the time-out.
2. The boot management method of a computer system according to claim 1, further comprising: resetting a restart flag to 0 when sending the boot command; incrementing the restart flag by 1 when sending the reboot command; and when the restart flag equals a preset number, stopping sending the reboot command, stopping resetting the counter, and performing an alert operation.
3. The boot management method of a computer system according to claim 1, further comprising: before the time-out has finished counting down and before checking whether the BIOS start-OK log message has been received, checking whether a BIOS start-execution log message sent by the computer device has been received.
4. The boot management method of a computer system according to claim 1, further comprising: before the time-out has finished counting down and before checking whether the BIOS start-OK log message has been received, checking whether a power-on log message sent by the computer device has been received.
5. The boot management method of a computer system according to claim 1, wherein the computer device comprises: a baseboard management controller that receives the boot command to perform a boot operation, processes each operating condition of the computer device into at least one system event log, and triggers a cloud transmission event to send the system event log to the remote server via the network.
6. The boot management method of a computer system according to claim 5, wherein the system event log includes a BIOS start-execution log message and the BIOS start-OK log message.
7. A computer system, comprising: at least one computer device, comprising: a baseboard management controller that receives a boot command to perform a boot operation, processes each operating condition of the computer device into at least one system event log, and triggers a cloud transmission event to send the system event log via a network to the remote server; and a remote server comprising a counter, wherein, when the computer device managed by the remote server is in a shutdown state, the remote server sends a boot command to the computer device via the network and resets the counter to start counting down a time-out; before the time-out has finished counting down, the remote server checks whether a BIOS start-OK log message sent by the computer device has been received via the network; and when the BIOS start-OK log message has not been received and the time-out has finished counting down, the remote server sends a reboot command to the computer device via the network and resets the counter so that it restarts counting down the time-out.
8. The computer system according to claim 7, wherein the remote server further resets a restart flag to 0 when sending the boot command, and further increments the restart flag by 1 when sending the reboot command; when the restart flag equals a preset number, the remote server stops sending the reboot command, stops resetting the counter, and performs an alert operation.
9. The computer system according to claim 8, wherein, before the time-out has finished counting down and before checking whether the BIOS start-OK log message has been received, the remote server checks whether a BIOS start-execution log message sent by the computer device has been received.
10. The computer system according to claim 8, wherein, before the time-out has finished counting down and before checking whether the BIOS start-OK log message has been received, the remote server checks whether a power-on log message sent by the computer device has been received.
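The retry logic recited in the claims above (a restart flag that is reset to 0 with the boot command, incremented with each reboot command, and compared against a preset number before an alert is raised) might be sketched in Python as below. All names here (`manage_boot`, `booted_within_timeout`, `max_restarts`) are hypothetical, and the sketch assumes one possible reading of the claims: the flag is compared with the preset number after each expired time-out, so exactly `max_restarts` reboot commands are sent before the alert.

```python
from typing import Callable


def manage_boot(
    send_boot: Callable[[], None],
    send_reboot: Callable[[], None],
    booted_within_timeout: Callable[[], bool],
    alert: Callable[[], None],
    max_restarts: int,
) -> bool:
    """Return True once the device reports a completed boot, False after alerting."""
    send_boot()
    restart_flag = 0  # reset to 0 when the boot command is sent
    while True:
        # Assumed to reset the counter, count down the time-out, and report
        # whether the BIOS start-OK log message arrived in time.
        if booted_within_timeout():
            return True
        if restart_flag == max_restarts:
            # Preset number reached: stop rebooting and perform the alert operation.
            alert()
            return False
        send_reboot()
        restart_flag += 1  # incremented when the reboot command is sent
```

Passing the transport and alert actions in as callables keeps the sketch independent of any particular network protocol or BMC interface, none of which are specified in this section.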
TW100146546A 2011-12-15 2011-12-15 Computer system and boot managing method of computer system TW201324115A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW100146546A TW201324115A (en) 2011-12-15 2011-12-15 Computer system and boot managing method of computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW100146546A TW201324115A (en) 2011-12-15 2011-12-15 Computer system and boot managing method of computer system

Publications (1)

Publication Number Publication Date
TW201324115A true TW201324115A (en) 2013-06-16

Family

ID=49032944

Family Applications (1)

Application Number Title Priority Date Filing Date
TW100146546A TW201324115A (en) 2011-12-15 2011-12-15 Computer system and boot managing method of computer system

Country Status (1)

Country Link
TW (1) TW201324115A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI576706B (en) * 2014-11-13 2017-04-01 惠普發展公司有限責任合夥企業 Method for early boot phase and the related device
TWI607314B (en) * 2016-06-24 2017-12-01 神雲科技股份有限公司 Chassis Device
US10404559B2 (en) 2015-07-17 2019-09-03 Dataprobe Inc. Apparatus and system for automatically rebooting an electronically powered device via power over ethernet

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI576706B (en) * 2014-11-13 2017-04-01 惠普發展公司有限責任合夥企業 Method for early boot phase and the related device
US10430202B2 (en) 2014-11-13 2019-10-01 Hewlett Packard Enterprise Development Lp Dual purpose boot registers
US10404559B2 (en) 2015-07-17 2019-09-03 Dataprobe Inc. Apparatus and system for automatically rebooting an electronically powered device via power over ethernet
TWI607314B (en) * 2016-06-24 2017-12-01 神雲科技股份有限公司 Chassis Device

Similar Documents

Publication Publication Date Title
JP6383839B2 (en) Method, storage device and system used for remote KVM session
US8719410B2 (en) Native bi-directional communication for hardware management
US10698788B2 (en) Method for monitoring server, and monitoring device and monitoring system using the same
WO2015039598A1 (en) Fault locating method and device
WO2015196365A1 (en) Fault processing method, related device and computer
US9021317B2 (en) Reporting and processing computer operation failure alerts
TWI632462B (en) Switching device and method for detecting i2c bus
US20120136970A1 (en) Computer system and method for managing computer device
US20080140895A1 (en) Systems and Arrangements for Interrupt Management in a Processing Environment
CN104639380A (en) Server monitoring method
TWI512490B (en) System for retrieving console messages and method thereof and non-transitory computer-readable medium
WO2022100307A1 (en) Method and apparatus for data interaction between server bios and bmc, and device
TWI668567B (en) Server and method for restoring a baseboard management controller automatically
US10936324B2 (en) Proactive host device access monitoring and reporting system
TW201415213A (en) Self-test system and method thereof
CN114600088A (en) Server state monitoring system and method using baseboard management controller
TW200426571A (en) Policy-based response to system errors occurring during os runtime
WO2020010890A1 (en) Method and system for monitoring resource utilization rate of server cpu based on bmc
JP5425720B2 (en) Virtualization environment monitoring apparatus and monitoring method and program thereof
CN103178977A (en) Computer system and starting-up management method of same
US20080288828A1 (en) structures for interrupt management in a processing environment
TW201324115A (en) Computer system and boot managing method of computer system
KR102438148B1 (en) Abnormality detection apparatus, system and method for detecting abnormality of embedded computing module
US20140164650A1 (en) System, method and computer program product for monitoring and alerting the health of sub-system connectors
CN107179911A (en) A kind of method and apparatus for restarting management engine