TW201913407A - ARM-based server and management method thereof - Google Patents


Info

Publication number
TW201913407A
Authority
TW
Taiwan
Prior art keywords
arm
peripheral device
event message
arm processor
management controller
Application number
TW106130901A
Other languages
Chinese (zh)
Other versions
TWI635401B (en)
Inventor
王紹宇
孫佩傑
Original Assignee
技嘉科技股份有限公司 (Giga-Byte Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-09-11
Filing date
2017-09-11
Publication date
2019-04-01
2017-09-11 Application filed by 技嘉科技股份有限公司 (Giga-Byte Technology Co., Ltd.)
2017-09-11 Priority to TW106130901A
Application granted
2018-09-11 Publication of TWI635401B
2019-04-01 Publication of TW201913407A


Abstract

An ARM-based server and a management method thereof. The ARM-based server includes at least one peripheral device, a baseboard management controller (BMC), and an ARM processor including an ARM trusted firmware (ATF). The BMC is configured to monitor and determine whether the at least one peripheral device and the ARM processor are abnormal, and to generate event information according to a determination result, where the event information corresponds to the ARM processor or one of the at least one peripheral device. The ATF is configured to receive the event information from the BMC and to perform an event handling operation on the ARM processor or the peripheral device corresponding to the event information. In addition, a management method of an ARM-based server is also provided.

Description

ARM architecture server and management method thereof

The present invention relates to a server management method, and more particularly to an ARM architecture server capable of automatically resolving faults and a management method thereof.

A baseboard management controller (BMC) is used to manage a server system. In general, to monitor whether the internal operation of a computer system is normal, a user can use the BMC disposed on the motherboard to check the system. A common approach is to remotely control the BMC to read the various sensors that sense the operating state of each component in the computer system (for example, fan speed or processor temperature). When the user finds an abnormal sensor reading, the user must go on site to repair the server in person (for example, by replacing parts). However, an excessively long response time after an abnormality may lead to more serious damage and data loss. Therefore, to keep the server running normally and providing good service, a long response time after an abnormal sensor reading is unacceptable.

The present invention provides an ARM architecture server and a management method thereof that automatically perform repairs when the BMC detects an abnormal component, so that the server can keep operating normally without interruption.

The present invention provides an ARM architecture server including at least one peripheral device, a baseboard management controller, and an ARM processor. The baseboard management controller is coupled to the at least one peripheral device and is configured to monitor and determine whether the at least one peripheral device and the ARM processor are abnormal, and to generate, according to the determination result, an event message corresponding to the ARM processor or one of the peripheral devices. The ARM processor is coupled to the at least one peripheral device and the baseboard management controller and includes an ARM Trusted Firmware (ATF). The ARM trusted firmware is configured to receive the event message from the baseboard management controller and to perform an event handling operation on the ARM processor or peripheral device corresponding to the event message.

From another perspective, the present invention provides a management method for an ARM architecture server. The ARM architecture server includes at least one peripheral device, a baseboard management controller, and an ARM processor. The management method includes: monitoring and determining, by the baseboard management controller, whether the at least one peripheral device and the ARM processor are abnormal; generating, by the baseboard management controller according to the determination result, an event message corresponding to the ARM processor or one of the peripheral devices; transmitting, by the baseboard management controller, the event message to the ARM processor; and performing, by the ARM trusted firmware in the ARM processor, an event handling operation on the ARM processor or peripheral device corresponding to the event message.

In an embodiment of the invention, the event message corresponds to the ARM processor, and the event handling operation includes adjusting the operating frequency of the ARM processor.

In an embodiment of the invention, the peripheral device includes a memory device having at least two memory channels, the event message corresponds to one of the memory channels, and the event handling operation includes disabling the memory channel corresponding to the event message.

In an embodiment of the invention, the peripheral device includes a PCI-E device, the event message corresponds to the PCI-E device, and the event handling operation includes performing a PCI-E reset.

In an embodiment of the invention, the ARM architecture server has a plurality of exception levels, the operating system of the ARM architecture server runs at a first exception level, and the ARM trusted firmware runs at a second exception level that is not lower than the first exception level.

Based on the above, in the ARM architecture server and management method provided by the embodiments of the present invention, the baseboard management controller notifies the ARM trusted firmware of an abnormal event, and the ARM trusted firmware directly handles the abnormal device. As a result, the ARM server can be repaired promptly without the user installing a monitoring program in the operating system, while security is also maintained.

To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic block diagram of an ARM architecture server according to an embodiment of the invention. Referring to FIG. 1, the ARM architecture server 100 of this embodiment includes a baseboard management controller 110, at least one peripheral device 120, and an ARM processor 130, where the baseboard management controller 110 and the ARM processor 130 are each coupled to every peripheral device 120. In addition, the baseboard management controller 110 is also coupled to the ARM processor 130.

In an embodiment, the ARM architecture server 100 is based on, for example but not limited to, the ARMv8-A architecture, which defines multiple exception levels; a higher exception level indicates a higher privilege. For example, the ARM architecture server 100 has four exception levels, EL0 to EL3, where EL0 is unprivileged, EL1 is the OS kernel mode, EL2 is the hypervisor mode, and EL3 is the TrustZone® monitor mode.
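The ordering that the rest of this description relies on (a higher exception level means higher privilege, and the firmware's level must be "not lower than" the OS level) can be modeled in a few lines. The sketch below is an illustrative Python model, not firmware code; the function names are hypothetical, and `decode_current_el` assumes the AArch64 `CurrentEL` register layout, in which the current level occupies bits [3:2].

```python
from enum import IntEnum


class ExceptionLevel(IntEnum):
    """ARMv8-A exception levels; a larger value means more privilege."""
    EL0 = 0  # unprivileged
    EL1 = 1  # OS kernel mode
    EL2 = 2  # hypervisor mode
    EL3 = 3  # TrustZone monitor mode


def decode_current_el(current_el: int) -> ExceptionLevel:
    """Decode a raw AArch64 CurrentEL value (the level is held in bits [3:2])."""
    return ExceptionLevel((current_el >> 2) & 0b11)


def not_lower_than(level: ExceptionLevel, reference: ExceptionLevel) -> bool:
    """True if `level` is at least as privileged as `reference`."""
    return level >= reference


# Assignment used in the embodiment: OS at EL1, ARM trusted firmware at EL3.
OS_LEVEL = ExceptionLevel.EL1
ATF_LEVEL = ExceptionLevel.EL3

if __name__ == "__main__":
    print(not_lower_than(ATF_LEVEL, OS_LEVEL))           # True
    print(not_lower_than(ExceptionLevel.EL0, OS_LEVEL))  # False
    print(decode_current_el(0b1100).name)                # EL3
```

Using an `IntEnum` keeps the comparison a plain integer comparison, which mirrors how the architecture numbers the levels.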

The baseboard management controller 110 is connected to each peripheral device 120 through, for example, an Intelligent Platform Management Bus (IPMB) in order to monitor each peripheral device 120. In an embodiment, the peripheral devices 120 include sensors that monitor fan speed or processor temperature, dual-channel Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), and a PCI-E Ethernet card, but the invention is not limited thereto. How the baseboard management controller 110 monitors the server's peripheral devices 120 is well known to those of ordinary skill in the art and is not described further here.
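The source does not specify what the BMC exchanges over IPMB. As one illustration, assuming standard IPMI framing, the sketch below assembles a "Get Sensor Reading" request (NetFn Sensor/Event 0x04, command 0x2D) as an IPMB frame, where each of the two checksums is the 8-bit two's complement of the bytes it covers. The responder address 0x2C, the sequence number, and sensor number 0x01 are hypothetical; 0x20 is the conventional BMC slave address. The function names are illustrative, not from any library.

```python
def ipmb_checksum(data: bytes) -> int:
    """8-bit two's-complement checksum: (sum(data) + checksum) % 256 == 0."""
    return (-sum(data)) & 0xFF


def build_ipmb_request(rs_sa: int, net_fn: int, rs_lun: int,
                       rq_sa: int, rq_seq: int, rq_lun: int,
                       cmd: int, data: bytes = b"") -> bytes:
    """Assemble an IPMB request frame (bus-level start/stop framing omitted)."""
    # Connection header: responder address, then NetFn (6 bits) and responder LUN (2 bits).
    header = bytes([rs_sa, ((net_fn & 0x3F) << 2) | (rs_lun & 0x3)])
    # Body: requester address, sequence (6 bits) and requester LUN (2 bits), command, data.
    body = bytes([rq_sa, ((rq_seq & 0x3F) << 2) | (rq_lun & 0x3), cmd]) + data
    return header + bytes([ipmb_checksum(header)]) + body + bytes([ipmb_checksum(body)])


if __name__ == "__main__":
    # Hypothetical: BMC (0x20) asks a controller at 0x2C for sensor 0x01.
    frame = build_ipmb_request(0x2C, 0x04, 0, 0x20, 1, 0, 0x2D, bytes([0x01]))
    print(frame.hex(" "))  # 2c 10 c4 20 04 2d 01 ae
```

For the example request, each byte group sums to 0 modulo 256 together with its checksum, which is how a receiver validates the frame.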

The ARM processor 130 is a processor designed on a Reduced Instruction Set Computing (RISC) architecture, for example an ARM Cortex-A, ARM Cortex-M, Cortex-A50 series, or Cortex-A73 processor, but the invention is not limited thereto.

In an embodiment, the ARM processor 130 includes an ARM Trusted Firmware (ATF) 131 that provides ATF services. Notably, the ARM trusted firmware 131 runs at an exception level not lower than that of the operating system. For example, the operating system of the ARM server 100 runs at a first exception level (e.g., EL1), while the ARM trusted firmware 131 runs at a second exception level not lower than the first (e.g., EL3). The ARM trusted firmware 131 can therefore access the ARM processor 130 itself as well as all add-in or onboard peripheral devices 120 on various interfaces (e.g., SATA, PCI-E, LAN, GPIO, SPI, or I2C). The ARM trusted firmware 131 and the ATF services it provides are well known to those of ordinary skill in the art of the ARM architecture and are not described further here.

In particular, when the baseboard management controller 110 detects that the ARM processor 130 itself or a peripheral device 120 is abnormal, it notifies the ARM processor 130 of the abnormal condition. Because the ARM trusted firmware 131 in the ARM processor 130 runs at an exception level not lower than that of the operating system, it can directly handle or repair the abnormal component in the ARM server 100.

FIG. 2 is a flowchart of a management method for an ARM architecture server according to an embodiment of the invention. The management method of FIG. 2 applies to the ARM architecture server 100 of FIG. 1; its detailed steps are described below with reference to the components of the ARM architecture server 100.

Referring to FIG. 2, in step S210 the baseboard management controller 110 monitors and determines whether the at least one peripheral device 120 and the ARM processor 130 are abnormal.

For example, the baseboard management controller 110 may monitor a temperature sensor in the ARM processor 130 to determine whether the ARM processor 130 is overheating, monitor a memory device in the ARM server 100 to determine whether it is operating normally, or monitor whether a PCI-E device on the PCI-E bus of the ARM server 100 is operating normally, but the invention is not limited thereto. Those of ordinary skill in the art can obtain sufficient teaching from known BMC techniques to define the abnormal state of each component and to determine whether each component is abnormal, so this is not described in detail here.
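A minimal sketch of the step S210 checks described above, assuming the BMC has already gathered one snapshot of readings. The `Readings` fields, the 95 °C limit, and the choice of "any uncorrectable ECC error" and "link down" as abnormal conditions are hypothetical stand-ins; on a real platform such thresholds would come from the platform's sensor configuration (e.g., IPMI sensor data records).

```python
from dataclasses import dataclass


@dataclass
class Readings:
    """One polling snapshot as a BMC might collect it (all fields hypothetical)."""
    cpu_temp_c: float
    uncorrectable_ecc_by_channel: dict[int, int]  # memory channel -> uncorrectable error count
    pcie_link_up: dict[str, bool]                 # PCI-E device name -> link state


CPU_TEMP_LIMIT_C = 95.0  # hypothetical overheat threshold


def find_abnormal(r: Readings) -> list[str]:
    """Step S210: list every abnormal component found in the snapshot."""
    abnormal = []
    if r.cpu_temp_c > CPU_TEMP_LIMIT_C:
        abnormal.append("cpu:overheat")
    for ch, errors in sorted(r.uncorrectable_ecc_by_channel.items()):
        if errors > 0:
            abnormal.append(f"memory:channel{ch}")
    for dev, up in sorted(r.pcie_link_up.items()):
        if not up:
            abnormal.append(f"pcie:{dev}")
    return abnormal
```

An empty result means "no abnormality", which corresponds to staying in step S210.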

If the baseboard management controller 110 finds no abnormality, step S210 is repeated. Otherwise, if the baseboard management controller 110 determines that the ARM processor 130 or one of the peripheral devices 120 is abnormal, the method proceeds to step S220, in which the baseboard management controller 110 generates an event message according to the determination result, and then to step S230, in which it transmits the event message to the ARM processor 130.
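The control flow of steps S210 to S230 can be sketched as a polling loop. In this illustrative model, each polling round yields the list of abnormal components found in step S210, and `send_to_arm` stands in for whatever channel actually carries the message to the ARM processor, which the source does not specify. The text message format is a hypothetical placeholder.

```python
from typing import Callable, Iterable


def bmc_monitor(rounds: Iterable[list[str]],
                send_to_arm: Callable[[str], None]) -> int:
    """Run steps S210-S230 over a sequence of polling rounds.

    Each item in `rounds` is the list of abnormal components found in that
    round (empty means everything is normal). Returns the number of rounds.
    """
    count = 0
    for abnormal in rounds:
        count += 1
        if not abnormal:
            continue  # no abnormality: keep monitoring (step S210)
        for component in abnormal:
            message = f"EVENT {component}"  # step S220: generate the event message
            send_to_arm(message)            # step S230: pass it to the ARM processor
    return count
```

Passing the transport in as a callable keeps the loop independent of how the BMC and the ARM processor are actually wired together.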

Specifically, the event message generated by the baseboard management controller 110 according to the determination result corresponds to the abnormal ARM processor 130 or peripheral device 120. For example, when the baseboard management controller 110 determines that the ARM processor 130 is overheating, it generates an event message indicating that the ARM processor 130 is overheating; when it determines that the memory device is not operating normally (e.g., the data contains too many bit errors for the error-correcting code (ECC) mechanism to correct), it generates an event message indicating that the memory device is not operating normally, but the invention is not limited thereto.
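One way to represent the event message of step S220 is a small record naming the target (processor, memory channel, or PCI-E device), an index, and a reason. The layout below is an assumption made for illustration; the source only requires that the message correspond to the abnormal ARM processor or peripheral device.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    PROCESSOR = "processor"
    MEMORY_CHANNEL = "memory_channel"
    PCIE_DEVICE = "pcie_device"


@dataclass(frozen=True)
class EventMessage:
    """Hypothetical event-message layout (the source does not define one)."""
    target: Target
    index: int   # channel or device number; 0 for the processor
    reason: str


def make_events(cpu_overheated: bool,
                failed_channels: list[int],
                failed_pcie_devices: list[int]) -> list[EventMessage]:
    """Step S220: turn the determination result into one message per abnormal component."""
    events = []
    if cpu_overheated:
        events.append(EventMessage(Target.PROCESSOR, 0, "overheat"))
    for ch in failed_channels:
        events.append(EventMessage(Target.MEMORY_CHANNEL, ch, "uncorrectable ECC error"))
    for dev in failed_pcie_devices:
        events.append(EventMessage(Target.PCIE_DEVICE, dev, "device not responding"))
    return events
```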

In step S240, the ARM trusted firmware 131 in the ARM processor 130 receives the event message from the baseboard management controller 110 and performs an event handling operation on the ARM processor 130 or peripheral device 120 corresponding to the event message, so that the ARM architecture server 100 keeps running without interruption.
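Step S240 amounts to dispatching each event to the handling operation for its target. The Python sketch below models that dispatch with a lookup table; the handler names and their string results are hypothetical. In actual ARM Trusted Firmware such logic would be written in C and run at EL3, and the mechanism by which the BMC's message reaches it (for example an interrupt or a shared mailbox) is not described in the source.

```python
from typing import Callable

Handler = Callable[[int], str]


def throttle_cpu(_index: int) -> str:
    return "lowered CPU operating frequency"


def disable_channel(index: int) -> str:
    return f"disabled memory channel {index}"


def reset_pcie(index: int) -> str:
    return f"reset PCI-E device {index}"


# Event target -> handling operation (targets follow the three embodiments).
HANDLERS: dict[str, Handler] = {
    "processor": throttle_cpu,
    "memory_channel": disable_channel,
    "pcie_device": reset_pcie,
}


def handle_event(target: str, index: int) -> str:
    """Step S240: route an event to the handling operation for its target."""
    handler = HANDLERS.get(target)
    if handler is None:
        # Unknown target: report it instead of stopping the server.
        return f"unhandled event target {target!r}"
    return handler(index)
```

A table keeps adding a new kind of peripheral to a one-line change, rather than another branch in a growing conditional.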

In an embodiment, the ARM processor 130 is connected to a peripheral device 120 (e.g., a temperature sensor), and the event message corresponds to the ARM processor 130, for example indicating that the ARM processor 130 is overheating. After receiving this event message, the ARM trusted firmware 131 performs an event handling operation on the ARM processor 130, for example lowering the operating frequency of the ARM processor 130 or adjusting the CPU throttling level of the ARM processor 130 to cool it down. Thus, even though the ARM processor 130 is abnormal (overheating), the ARM trusted firmware can handle it promptly, preventing more serious damage that would otherwise force the ARM architecture server 100 to stop running and interrupt service.
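A sketch of the two cooling actions mentioned above, assuming the processor exposes a fixed table of operating points and a numeric throttle level. The frequencies, the level range 0 to 7, and the function names are hypothetical; a real platform would program these through its own power-management interface.

```python
# Hypothetical operating points (MHz) supported by the processor.
OPERATING_POINTS_MHZ = (2600, 2200, 1800, 1400, 1000)


def next_lower_frequency(current_mhz: int,
                         points: tuple[int, ...] = OPERATING_POINTS_MHZ) -> int:
    """Return the highest supported frequency below current_mhz, or the lowest
    supported point if there is nothing lower."""
    lower = [f for f in points if f < current_mhz]
    return max(lower) if lower else min(points)


def raise_throttle_level(level: int, max_level: int = 7) -> int:
    """Alternative action: increase the throttle level (0 = no throttling),
    saturating at max_level."""
    return min(level + 1, max_level)
```

For example, `next_lower_frequency(2600)` returns 2200, and at 1000 MHz the function stays at 1000, so repeated overheat events step the clock down gradually instead of jumping straight to the minimum.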

In an embodiment, the ARM architecture server 100 includes a peripheral device 120 (e.g., dual-channel DDR SDRAM), and the event message corresponds to one of the memory channels of the dual-channel DDR SDRAM, for example indicating that the memory on that channel is not operating normally. After receiving this event message, the ARM trusted firmware 131 performs an event handling operation on the memory channel corresponding to the event message, for example disabling that memory channel while the memory on the other channel continues to operate normally. In this way, the ARM architecture server 100 can keep running without service interruption.
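Disabling a channel can be modeled as clearing one bit in a channel-enable mask. The sketch below assumes such a mask (bit n set means channel n is enabled) and refuses to disable the last enabled channel, consistent with the goal of keeping the server running on the remaining channel. The function name is hypothetical; how a real memory controller disables a channel, and what happens to data already placed in it, is platform-specific and not described in the source.

```python
def disable_channel(enabled_mask: int, channel: int, num_channels: int = 2) -> int:
    """Return the channel-enable mask with `channel` cleared.

    Raises ValueError for an out-of-range channel and RuntimeError if the
    request would leave no memory channel enabled.
    """
    if not 0 <= channel < num_channels:
        raise ValueError(f"channel {channel} out of range")
    new_mask = enabled_mask & ~(1 << channel)
    if new_mask == 0:
        raise RuntimeError("cannot disable the last enabled memory channel")
    return new_mask
```

With both channels enabled (`0b11`), disabling channel 1 leaves `0b01`, so channel 0 keeps serving memory.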

In an embodiment, the ARM architecture server 100 includes a peripheral device 120 (e.g., a PCI-E device), and the event message corresponds to one of the PCI-E devices or endpoints (e.g., a PCI-E Ethernet card), for example indicating that the PCI-E device is not operating normally. After receiving this event message, the ARM trusted firmware 131 performs an event handling operation on the PCI-E device, for example a PCI-E reset, in an attempt to recover the PCI-E device. In this way, the PCI-E reset can be performed without rebooting the ARM architecture server 100, restoring the PCI-E device to normal operation. Those of ordinary skill in the art can obtain sufficient teaching from known PCI-E reset techniques to implement the PCI-E reset described in this embodiment, so it is not described further here.
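The source says only "PCI-E reset". One common way to reset a PCI-E device without rebooting the host is a Secondary Bus Reset, issued by setting bit 6 of the Bridge Control register (offset 0x3E in a type 1 configuration header) on the bridge or root port above the device, holding it, and then clearing it; Function Level Reset is another option. The sketch below models that sequence on a byte array standing in for the bridge's configuration space. The 2 ms hold and 100 ms settle time are assumptions chosen to respect the commonly cited 1 ms minimum reset pulse and 100 ms recovery wait; check the PCI Express specification for the link speed in use.

```python
import time
from typing import Callable

BRIDGE_CONTROL_OFFSET = 0x3E  # 16-bit Bridge Control register in a type 1 header
SECONDARY_BUS_RESET = 1 << 6  # bit 6: Secondary Bus Reset


def secondary_bus_reset(cfg: bytearray,
                        sleep: Callable[[float], None] = time.sleep,
                        hold_s: float = 0.002,
                        settle_s: float = 0.1) -> None:
    """Pulse Secondary Bus Reset on the bridge above a PCI-E device.

    `cfg` stands in for the bridge's configuration space; real firmware would
    use the platform's config read/write accessors instead.
    """
    span = slice(BRIDGE_CONTROL_OFFSET, BRIDGE_CONTROL_OFFSET + 2)
    ctrl = int.from_bytes(cfg[span], "little")
    cfg[span] = (ctrl | SECONDARY_BUS_RESET).to_bytes(2, "little")   # assert reset
    sleep(hold_s)                                                    # hold it asserted
    cfg[span] = (ctrl & ~SECONDARY_BUS_RESET).to_bytes(2, "little")  # release reset
    sleep(settle_s)  # give the downstream device time to recover before it is accessed
```

Injecting `sleep` lets the sequence be exercised without real delays or hardware.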

In summary, the ARM architecture server and management method of the embodiments use the ATF in the ARM processor to directly handle or repair abnormal components in the ARM architecture server, preventing serious damage to the server and data loss. In addition, because the user does not need to install an additional monitoring program in the operating system, the risk of confidential data leaking through a backdoor hidden in such a program is eliminated, which improves the security of the server's services.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone of ordinary skill in the art may make changes and modifications without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the appended claims.

100‧‧‧ARM architecture server

110‧‧‧baseboard management controller

120‧‧‧at least one peripheral device

130‧‧‧ARM processor

131‧‧‧ARM trusted firmware

S210~S240‧‧‧steps of the management method of the ARM architecture server

FIG. 1 is a schematic block diagram of an ARM architecture server according to an embodiment of the invention.
FIG. 2 is a flowchart of a management method for an ARM architecture server according to an embodiment of the invention.

Claims (10)

1. An ARM architecture server, comprising: at least one peripheral device; a baseboard management controller, coupled to the at least one peripheral device; and an ARM processor, coupled to the at least one peripheral device and the baseboard management controller, wherein the ARM processor comprises an ARM trusted firmware, wherein the baseboard management controller is configured to monitor and determine whether the at least one peripheral device and the ARM processor are abnormal, and to generate an event message according to a determination result, wherein the ARM trusted firmware is configured to receive the event message from the baseboard management controller, the event message corresponding to one of the ARM processor and the at least one peripheral device, and wherein the ARM trusted firmware is further configured to perform an event handling operation on the ARM processor or the peripheral device corresponding to the event message.
2. The ARM architecture server of claim 1, wherein the event message corresponds to the ARM processor, and the event handling operation comprises adjusting an operating frequency of the ARM processor.
3. The ARM architecture server of claim 1, wherein the at least one peripheral device comprises a memory device, the memory device comprises at least two memory channels, the event message corresponds to one of the memory channels, and the event handling operation comprises disabling the memory channel corresponding to the event message.
4. The ARM architecture server of claim 1, wherein the at least one peripheral device comprises a PCI-E device, the event message corresponds to the PCI-E device, and the event handling operation comprises performing a PCI-E reset.
5. The ARM architecture server of claim 1, wherein the ARM architecture server has a plurality of exception levels, an operating system of the ARM architecture server runs at a first exception level, and the ARM trusted firmware runs at a second exception level not lower than the first exception level.
6. A management method for an ARM architecture server, wherein the ARM architecture server comprises at least one peripheral device, a baseboard management controller, and an ARM processor, the management method comprising: monitoring and determining, by the baseboard management controller, whether the at least one peripheral device and the ARM processor are abnormal; generating, by the baseboard management controller, an event message according to a determination result, wherein the event message corresponds to one of the ARM processor and the at least one peripheral device; transmitting, by the baseboard management controller, the event message to the ARM processor; and performing, by an ARM trusted firmware in the ARM processor, an event handling operation on the ARM processor or the peripheral device corresponding to the event message.
7. The management method of claim 6, wherein the event message corresponds to the ARM processor, and the event handling operation comprises: adjusting an operating frequency of the ARM processor.
8. The management method of claim 6, wherein the at least one peripheral device comprises a memory device, the memory device comprises at least two memory channels, the event message corresponds to one of the memory channels, and the event handling operation comprises: disabling the memory channel corresponding to the event message.
9. The management method of claim 6, wherein the at least one peripheral device comprises a PCI-E device, the event message corresponds to the PCI-E device, and the event handling operation comprises: performing a PCI-E reset.
10. The management method of claim 6, wherein the ARM architecture server has a plurality of exception levels, an operating system of the ARM architecture server runs at a first exception level, and the ARM trusted firmware runs at a second exception level not lower than the first exception level.

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW106130901A TWI635401B (en) 2017-09-11 2017-09-11 Arm-based server and management method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW106130901A TWI635401B (en) 2017-09-11 2017-09-11 Arm-based server and management method thereof

Publications (2)

Publication Number Publication Date
TWI635401B (en) 2018-09-11
TW201913407A (en) 2019-04-01

Family

ID=64452768

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106130901A TWI635401B (en) 2017-09-11 2017-09-11 Arm-based server and management method thereof

Country Status (1)

Country Link
TW (1) TWI635401B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI697766B (en) * 2018-12-10 2020-07-01 神雲科技股份有限公司 Electronic device and reset method thereof
CN111414272B (en) * 2019-01-04 2023-08-08 佛山市顺德区顺达电脑厂有限公司 Electronic device and reset method thereof

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
KR101666032B1 (en) * 2012-05-31 2016-10-14 한국전자통신연구원 Method and apparatus for supporting virtualization of loadable module
TW201417536A (en) * 2012-10-24 2014-05-01 Hon Hai Prec Ind Co Ltd Method and system for automatically managing servers
KR20140144520A (en) * 2013-06-11 2014-12-19 삼성전자주식회사 Processor module, server system and method for controlling processor module
TW201523239A (en) * 2013-12-06 2015-06-16 Hon Hai Prec Ind Co Ltd System and method for detecting working status of fans and fan controller
US9721660B2 (en) * 2014-10-24 2017-08-01 Microsoft Technology Licensing, Llc Configurable volatile memory without a dedicated power source for detecting a data save trigger condition
CN105607716A (en) * 2016-01-12 2016-05-25 浪潮(北京)电子信息产业有限公司 Server and server heat dissipation system and monitoring method thereof

Also Published As

Publication number Publication date
TWI635401B (en) 2018-09-11

Similar Documents

Publication Publication Date Title
US9971609B2 (en) Thermal watchdog process in host computer management and monitoring
US7085945B2 (en) Using multiple thermal points to enable component level power and thermal management
JP5469254B2 (en) Modified mechanism for detecting no processor swap condition and fast bus calibration during boot
US11556490B2 (en) Baseboard management controller-based security operations for hot plug capable devices
US20100228960A1 (en) Virtual memory over baseboard management controller
US8230237B2 (en) Pre-boot environment power management
US9021317B2 (en) Reporting and processing computer operation failure alerts
US11132314B2 (en) System and method to reduce host interrupts for non-critical errors
TWI635401B (en) Arm-based server and management method thereof
TW201417536A (en) Method and system for automatically managing servers
US11593487B2 (en) Custom baseboard management controller (BMC) firmware stack monitoring system and method
US8230446B2 (en) Providing a computing system with real-time capabilities
US20140379162A1 (en) Server system and monitoring method
US7017062B2 (en) Method and apparatus for recovering from an overheated microprocessor
JP6800935B2 (en) How to control a fan in an electronic system
JP5689783B2 (en) Computer, computer system, and failure information management method
US20150100817A1 (en) Anticipatory Protection Of Critical Jobs In A Computing System
CN109491813B (en) ARM architecture server and management method thereof
US11797679B2 (en) Trust verification system and method for a baseboard management controller (BMC)
US11714696B2 (en) Custom baseboard management controller (BMC) firmware stack watchdog system and method
US20230009470A1 (en) Workspace-based fixed pass-through monitoring system and method for hardware devices using a baseboard management controller (bmc)
US20070157037A1 (en) Device throttling system and method
US8543755B2 (en) Mitigation of embedded controller starvation in real-time shared SPI flash architecture
JP2016091397A (en) Computer device, and management method thereof
US11593490B2 (en) System and method for maintaining trusted execution in an untrusted computing environment using a secure communication channel