TW202236274A - Smart network interface controller system and method of detecting error - Google Patents


Info

Publication number
TW202236274A
Authority
TW
Taiwan
Prior art keywords
error information
error
module
parity
detection module
Prior art date
Application number
TW110108947A
Other languages
Chinese (zh)
Other versions
TWI738627B (en)
Inventor
劉葉
Original Assignee
英業達股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 英業達股份有限公司
Priority to TW110108947A (granted as TWI738627B)
Application granted
Publication of TWI738627B
Publication of TW202236274A

Abstract

A method of detecting errors is disclosed. In the method, a SoC (System on a Chip) detects a parity error, a system error, and a multi-bit ECC memory error; a CPLD (Complex Programmable Logic Device) then saves the detected error and transmits it to a BMC (Baseboard Management Controller); the smart network interface controller system is then rebooted; finally, the smart network interface controller system resumes operation once the SoC no longer detects any error. A smart network interface controller system is also disclosed.

Description

Smart network interface controller system with error detection function and error detection method

The present invention relates to a system and a method, and in particular to a smart network interface controller (smart NIC) system with an error detection function and an error detection method.

A network card is an essential device for connecting to a network and communicating over it, and it works together with the central processing unit (CPU) on the motherboard to handle each layer of the network protocol. As technology has developed, smart NICs have become increasingly common. Compared with a traditional network card, a smart NIC not only sends and receives data but also offers high-performance, programmable computing capability. However, if the system chip on a smart NIC suffers a parity error (PERR), a system error (SERR), or a multi-bit ECC memory error, the system chip fails and the smart NIC cannot operate normally. The prior art therefore leaves room for improvement.

In the prior art, a parity error, a system error, or a multi-bit ECC memory error can prevent a smart NIC from operating normally, which gives rise to various problems. A main objective of the present invention is to provide a smart NIC system with an error detection function that solves at least one problem of the prior art.

To solve the problems of the prior art, the necessary technical means adopted by the present invention is to provide a smart NIC system with an error detection function, which is externally connected to a host system and includes a processing chip and a complex programmable logic device. The processing chip includes a parity detection module, a system error detection module, an ECC memory detection module, and a first communication module. The parity detection module detects parity error information of the processing chip. The system error detection module detects system error information of the processing chip. The ECC memory detection module detects multi-bit ECC memory error information of the processing chip. The first communication module is electrically connected to the parity detection module, the system error detection module, and the ECC memory detection module and, when at least one of the parity error information, the system error information, and the multi-bit ECC memory error information is detected, transmits the detected error information.

The complex programmable logic device is electrically connected to the processing chip and includes a receiving module, an error storage module, and a second communication module. The receiving module is electrically connected to the first communication module and receives the at least one of the parity error information, the system error information, and the multi-bit ECC memory error information. The error storage module is electrically connected to the receiving module and includes a parity storage unit, a system error storage unit, and an ECC memory storage unit. The parity storage unit stores the parity error information. The system error storage unit stores the system error information. The ECC memory storage unit stores the multi-bit ECC memory error information. The second communication module is electrically connected to the error storage module and transmits the at least one of the parity error information, the system error information, and the multi-bit ECC memory error information to a baseboard management controller of the host system.

After the baseboard management controller receives and stores the at least one of the parity error information, the system error information, and the multi-bit ECC memory error information, it transmits a clear signal. Once the smart NIC system has been operated to restart on its own, the error storage module clears the stored error information, so that the smart NIC system operates normally.

Building on the necessary technical means above, one subsidiary technical means derived from the present invention is that the first communication module of the smart NIC system is electrically connected to the receiving module through a Serial General Purpose Input/Output (SGPIO) interface.

Building on the necessary technical means above, one subsidiary technical means derived from the present invention is that the second communication module of the smart NIC system is an I2C module.

Building on the necessary technical means above, one subsidiary technical means derived from the present invention is that the second communication module of the smart NIC system is electrically connected to the baseboard management controller of the host system through a Power Management Bus (PMBUS).

To solve the problems of the prior art, the present invention further provides an error detection method, which is implemented with the smart NIC system described above and includes the following steps: using the parity detection module, the system error detection module, and the ECC memory detection module to detect at least one of the parity error information, the system error information, and the multi-bit ECC memory error information; using the first communication module to transmit the detected error information; using the receiving module to receive that error information; using the error storage module to store that error information; using the second communication module to transmit that error information to the baseboard management controller; restarting the smart NIC system on its own; and using the error storage module to receive the clear signal and clear the stored error information.

As described above, the smart NIC system and error detection method provided by the present invention use the parity detection module, the system error detection module, and the ECC memory detection module to detect, respectively, the parity error information, the system error information, and the multi-bit ECC memory error information of the processing chip, and use the parity storage unit, the system error storage unit, and the ECC memory storage unit to store each kind of error information. Compared with the prior art, the present invention reliably records the error information that would cause the processing chip to fail, transmits that error information to the baseboard management controller of the host system, and restarts independently of the host system so that the error storage module clears the error information. Once the three detection modules no longer detect any error information, the smart NIC system operates normally.

Specific embodiments of the present invention are described in more detail below with reference to the drawings. The advantages and features of the present invention will become clearer from the following description and the claims. Note that the drawings are greatly simplified and not drawn to scale; they serve only to explain the embodiments conveniently and clearly.

Refer to Figure 1 and Figure 2. Figure 1 is a block diagram of the smart NIC system with an error detection function provided by the preferred embodiment of the present invention, and Figure 2 is a flowchart of the error detection method provided by the preferred embodiment. As shown, the error detection method is implemented with a smart NIC system 1 that has an error detection function, and it includes steps S101 to S108.

The smart NIC system 1 is externally connected to a host system 2 and includes a processing chip 11 and a complex programmable logic device (CPLD) 12.

The processing chip 11 includes a parity detection module 111, a system error detection module 112, an ECC memory detection module 113, and a first communication module 114. The processing chip 11 is a System on a Chip (SoC).

The CPLD 12 is electrically connected to the processing chip 11 and includes a receiving module 121, an error storage module 122, and a second communication module 123. The error storage module 122 includes a parity storage unit 1221, a system error storage unit 1222, and an ECC memory storage unit 1223.

Step S101: Determine whether the parity detection module, the system error detection module, and the ECC memory detection module detect parity error information, system error information, or multi-bit ECC memory error information.

The parity detection module 111 detects parity error (PERR) information of the processing chip 11. The system error detection module 112 detects system error (SERR) information of the processing chip 11. The ECC memory detection module 113 detects multi-bit ECC memory error information of the processing chip 11. The parity error information, the system error information, and the multi-bit ECC memory error information are collectively referred to as error information. If the processing chip 11 produces any one of them, the processing chip 11 fails and cannot operate normally. In practice, the processing chip 11 may also produce two or more kinds of error information at once.

The parity detection module 111, the system error detection module 112, and the ECC memory detection module 113 therefore detect whether the processing chip 11 has produced any of the parity error information, the system error information, and the multi-bit ECC memory error information. When they detect at least one of these, the method proceeds to step S102.

Step S102: Use the first communication module to transmit the error information.

The first communication module 114 is electrically connected to the parity detection module 111, the system error detection module 112, and the ECC memory detection module 113, and it receives and transmits at least one of the parity error information, the system error information, and the multi-bit ECC memory error information.

For example, when only the parity detection module 111 detects parity error information, and the system error detection module 112 and the ECC memory detection module 113 detect nothing, the first communication module 114 transmits only the parity error information. When the parity detection module 111 and the system error detection module 112 detect parity error information and system error information respectively, and the ECC memory detection module 113 detects nothing, the first communication module 114 transmits both the parity error information and the system error information.
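The selection rule in this example can be sketched in Python. This is only an illustration under assumptions: the flag names `PERR`, `SERR`, and `MBE` and the function `errors_to_transmit` are hypothetical and do not appear in the patent, which does not specify how the detection results are encoded.

```python
from enum import Flag, auto


class ErrorInfo(Flag):
    """Hypothetical flags for the three kinds of error information."""
    NONE = 0
    PERR = auto()  # parity error information (module 111)
    SERR = auto()  # system error information (module 112)
    MBE = auto()   # multi-bit ECC memory error information (module 113)


def errors_to_transmit(perr: bool, serr: bool, mbe: bool) -> ErrorInfo:
    """Combine the three detector results into the set that would be sent.

    Only the kinds that were actually detected are included, mirroring
    the example in the description.
    """
    result = ErrorInfo.NONE
    if perr:
        result |= ErrorInfo.PERR
    if serr:
        result |= ErrorInfo.SERR
    if mbe:
        result |= ErrorInfo.MBE
    return result
```

With this sketch, a parity-only detection yields `ErrorInfo.PERR`, and a parity plus system error detection yields `ErrorInfo.PERR | ErrorInfo.SERR`.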

Step S103: Use the receiving module to receive the error information.

The receiving module 121 of the CPLD 12 is electrically connected to the first communication module 114 and receives the error information. As noted above, the receiving module 121 receives whichever kinds of error information the first communication module 114 transmits. In practice, the first communication module 114 may be electrically connected to the receiving module 121 through a Serial General Purpose Input/Output (SGPIO) interface, but is not limited to this.
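One way to picture the transfer between the first communication module and the receiving module is as a short serial bit frame, one bit per kind of error information. The sketch below is an assumption for illustration only: the patent does not define a frame layout, and the bit order in `BIT_ORDER` and the helper names are hypothetical. It does not model SGPIO clocking or signalling.

```python
from typing import Iterable, List, Set

# Assumed bit positions; the source does not specify any frame layout.
BIT_ORDER = ("PERR", "SERR", "MBE")


def encode_frame(errors: Iterable[str]) -> List[int]:
    """Sender side: turn the detected kinds into one bit per kind."""
    detected = set(errors)
    unknown = detected - set(BIT_ORDER)
    if unknown:
        raise ValueError(f"unknown error kinds: {sorted(unknown)}")
    return [1 if name in detected else 0 for name in BIT_ORDER]


def decode_frame(bits: List[int]) -> Set[str]:
    """Receiver side: recover the kinds of error information from the bits."""
    if len(bits) != len(BIT_ORDER):
        raise ValueError("frame length does not match BIT_ORDER")
    return {name for name, bit in zip(BIT_ORDER, bits) if bit}
```

Decoding an encoded frame returns exactly the kinds that were sent, which matches the statement that the receiving module receives whatever the first communication module transmits.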

Step S104: Use the error storage module to store the error information.

The error storage module 122 of the CPLD 12 is electrically connected to the receiving module 121, stores the error information, and includes a parity storage unit 1221, a system error storage unit 1222, and an ECC memory storage unit 1223. As noted above, the error storage module 122 stores whichever kinds of error information the receiving module 121 receives. The parity storage unit 1221 stores the parity error information, the system error storage unit 1222 stores the system error information, and the ECC memory storage unit 1223 stores the multi-bit ECC memory error information.
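The three storage units can be modelled as three independent latches, one per kind of error information. The following Python sketch is a software analogy under assumptions: the class name, field names, and string labels are hypothetical, and a real CPLD would implement this in programmable logic rather than in software.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ErrorStorageModule:
    """Software analogy of error storage module 122 (names are hypothetical)."""
    parity_unit: bool = False   # stands in for parity storage unit 1221
    system_unit: bool = False   # stands in for system error storage unit 1222
    memory_unit: bool = False   # stands in for ECC memory storage unit 1223

    def store(self, received: Iterable[str]) -> None:
        """Latch each received kind of error information into its own unit."""
        for kind in received:
            if kind == "PERR":
                self.parity_unit = True
            elif kind == "SERR":
                self.system_unit = True
            elif kind == "MBE":
                self.memory_unit = True
            else:
                raise ValueError(f"unknown error kind: {kind}")

    def stored(self) -> List[str]:
        """Return the kinds currently held, in a fixed order."""
        units = (("PERR", self.parity_unit),
                 ("SERR", self.system_unit),
                 ("MBE", self.memory_unit))
        return [name for name, held in units if held]

    def clear(self) -> None:
        """Reset every unit, as happens when the clear signal arrives."""
        self.parity_unit = self.system_unit = self.memory_unit = False
```

Keeping one unit per kind means the baseboard management controller can later tell which kind of error led to the abnormal state.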

Step S105: Use the second communication module to transmit the error information to the baseboard management controller.

The second communication module 123 of the CPLD 12 is electrically connected to the error storage module 122 and transmits the error information to a baseboard management controller (BMC) 21 of the host system 2. As noted above, the second communication module 123 transmits whichever kinds of error information the error storage module 122 has stored.

Step S106: Use the baseboard management controller to store the error information and transmit a clear signal.

The baseboard management controller 21 stores the error information and, having learned that the smart NIC system 1 is in an abnormal state, generates and transmits a clear signal. Because the baseboard management controller 21 stores the error information, a user can also check from the host system 2 the state of the smart NIC system 1 and the kind of error information that caused the abnormal state.

Step S107: Restart the smart NIC system on its own.

Because the smart NIC system 1 is externally connected to the host system 2, it can be operated to restart independently of the host system 2. Note that restarting the smart NIC system 1 on its own does not affect the host system 2, so the host system 2 continues to operate normally.

Step S108: Use the error storage module to receive the clear signal and clear the error information.

After the error storage module 122 of the CPLD 12 receives the clear signal, it clears the error information.

After step S108, the flow returns to step S101, where the parity detection module 111, the system error detection module 112, and the ECC memory detection module 113 again detect whether the processing chip 11 has produced at least one of the parity error information, the system error information, and the multi-bit ECC memory error information. If the result is still yes, steps S102 to S108 are performed again.

When none of the parity detection module 111, the system error detection module 112, and the ECC memory detection module 113 detects that the processing chip 11 has produced any of the parity error information, the system error information, and the multi-bit ECC memory error information, the smart NIC system 1 can operate normally, and the error detection method provided by the preferred embodiment ends.
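The loop formed by steps S101 to S108 can be summarised as a small simulation. This is a sketch under stated assumptions, not the patented implementation: `run_error_detection`, the `detect` callback, and the string labels are hypothetical, and the `max_reboots` limit is an addition for safety in the example (the description simply repeats until no error is detected).

```python
from typing import Callable, List, Set


def run_error_detection(detect: Callable[[], Set[str]],
                        max_reboots: int = 10) -> List[Set[str]]:
    """Simulate steps S101 to S108 and return what the BMC logged.

    `detect` stands in for the three detection modules and returns the
    kinds of error information found on each pass. `max_reboots` is an
    assumed safety limit that the source does not mention.
    """
    bmc_log: List[Set[str]] = []
    for _ in range(max_reboots):
        errors = detect()                 # S101: run the three detectors
        if not errors:
            return bmc_log                # nothing detected: normal operation
        sent = set(errors)                # S102: first communication module sends
        received = sent                   # S103: receiving module receives
        cpld_store = set(received)        # S104: error storage module stores
        bmc_log.append(set(cpld_store))   # S105/S106: BMC stores, sends clear signal
        # S107: the smart NIC restarts on its own; the host is unaffected,
        # so nothing on the host side changes in this model.
        cpld_store.clear()                # S108: stored error information cleared
    raise RuntimeError("error information persisted after max_reboots restarts")
```

For instance, if the detectors report a parity error on the first pass, parity and system errors on the second, and nothing on the third, the function returns a two-entry log and ends, which corresponds to the method finishing once no error information is detected.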

In summary, the smart NIC system and error detection method provided by the present invention use the parity detection module, the system error detection module, and the ECC memory detection module to detect, respectively, the parity error information, the system error information, and the multi-bit ECC memory error information of the processing chip, and use the parity storage unit, the system error storage unit, and the ECC memory storage unit to store each kind of error information. Compared with the prior art, the present invention reliably records the error information that would cause the processing chip to fail, transmits that error information to the baseboard management controller of the host system, and restarts independently of the host system so that the error storage module clears the error information. Once the three detection modules no longer detect any error information, the smart NIC system operates normally.

In addition, if the parity detection module, the system error detection module, and the ECC memory detection module continue to detect error information, the smart NIC system and error detection method of the preferred embodiment repeat the detection, transmission, storage, and restart steps until the three detection modules no longer detect any error information, at which point the smart NIC system operates normally.

The detailed description of the preferred embodiments above is intended to describe the features and spirit of the present invention more clearly, not to limit the scope of the present invention to the disclosed embodiments. On the contrary, the intent is to cover various changes and equivalent arrangements within the scope of the claims of the present invention.

1: Smart NIC system with error detection function
11: Processing chip
111: Parity detection module
112: System error detection module
113: ECC memory detection module
114: First communication module
12: Complex programmable logic device
121: Receiving module
122: Error storage module
1221: Parity storage unit
1222: System error storage unit
1223: ECC memory storage unit
123: Second communication module
2: Host system
21: Baseboard management controller

Figure 1 is a block diagram of the smart NIC system with an error detection function provided by the preferred embodiment of the present invention; and
Figure 2 is a flowchart of the error detection method provided by the preferred embodiment of the present invention.


Claims (5)

一種具有錯誤偵測功能的智能網卡系統,係外接於一主機系統,並包含: 一處理晶片,包含: 一奇偶校驗偵測模組,係用以偵測該處理晶片之一奇偶校驗錯誤資訊; 一系統錯誤偵測模組,係用以偵測該處理晶片之一系統錯誤資訊; 一修正記憶體偵測模組,係用以偵測該處理晶片之一修正記憶體錯誤資訊;以及 一第一通訊模組,係電性連接該奇偶校驗偵測模組、該系統錯誤偵測模組與該修正記憶體偵測模組,用以在偵測出該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者時,傳送出上述該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者;以及 一複雜可程式化邏輯裝置,係電性連接該處理晶片,包含: 一接收模組,係電性連接該第一通訊模組,用以接收上述該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者: 一錯誤儲存模組,係電性連接該接收模組,並包含: 一奇偶校驗儲存單元,係用以儲存該奇偶校驗錯誤資訊; 一系統錯誤儲存單元,係用以儲存該系統錯誤資訊;以及 一修正記憶體儲存單元,係用以儲存該修正記憶體錯誤資訊;以及 一第二通訊模組,係電性連接該錯誤儲存模組,用以將上述該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者傳送至該主機系統之一基板管理控制器; 其中,該基板管理控制器接收並儲存上述該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者後,係傳送一清除信號,使該智能網卡系統受操作地單獨重新啟動之後,該錯誤儲存模組係清除上述該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者,藉以使該智能網卡系統正常運作。 An intelligent network card system with error detection function is externally connected to a host system and includes: A processed wafer, comprising: a parity detection module, which is used to detect a parity error information of the processing chip; A system error detection module is used to detect system error information of the processing chip; a correction memory detection module for detecting error information of a correction memory of the processing chip; and A first communication module is electrically connected to the parity detection module, the system error detection module and the correction memory detection module for detecting the parity error information, When at least one of the system error information and the modified memory detection module is used, at least one of the above-mentioned parity error information, the system error information and the modified memory detection module is sent; and A complex programmable logic device electrically connected to the processing chip, comprising: A receiving module, electrically connected to the first communication module, for receiving at least one of the parity error information, the system error information and the corrected memory detection module: An error storage module is electrically connected to the receiving module and 
includes: a parity storage unit for storing the parity error information; a system error storage unit for storing the system error information; and a correction memory storage unit for storing the correction memory error information; and A second communication module, electrically connected to the error storage module, for transmitting at least one of the above-mentioned parity error information, the system error information and the corrected memory detection module to the host One of the systems is a baseboard management controller; Wherein, after the baseboard management controller receives and stores at least one of the above-mentioned parity error information, the system error information, and the corrected memory detection module, it sends a clear signal to enable the smart network card system to be protected. After operatively restarting independently, the error storage module clears at least one of the above-mentioned parity error information, the system error information and the corrected memory detection module, so as to make the smart network card system operate normally. 如請求項1所述之具有錯誤偵測功能的智能網卡系統,其中,該第一通訊模組係利用一串列通用型輸入輸出(Serial General Purpose Input/Output;SGPIO)介面電性連接該接收模組。The intelligent network card system with error detection function as described in claim 1, wherein the first communication module is electrically connected to the receiver by using a Serial General Purpose Input/Output (SGPIO) interface mod. 如請求項1所述之具有錯誤偵測功能的智能網卡系統,其中,該第二通訊模組係一I2C模組。The intelligent network card system with error detection function as described in claim 1, wherein the second communication module is an I2C module. 如請求項1所述之具有錯誤偵測功能的智能網卡系統,其中,該第二通訊模組係利用一電源管理匯流排(Power Management Bus;PMBUS)電性連接該主機系統之該基板管理控制器。The smart network card system with error detection function as described in claim 1, wherein the second communication module is electrically connected to the baseboard management control of the host system by using a Power Management Bus (PMBUS) device. 
5. An error detection method, implemented with the intelligent network card system with an error detection function of claim 1 and comprising the following steps:
  (a) using the parity detection module, the system error detection module and the correction memory detection module to detect at least one of the parity error information, the system error information and the correction memory error information;
  (b) using the first communication module to send out the at least one of the parity error information, the system error information and the correction memory error information;
  (c) using the receiving module to receive the at least one of the parity error information, the system error information and the correction memory error information;
  (d) using the error storage module to store the at least one of the parity error information, the system error information and the correction memory error information;
  (e) using the second communication module to transmit the at least one of the parity error information, the system error information and the correction memory error information to the baseboard management controller;
  (f) restarting the intelligent network card system on its own; and
  (g) using the error storage module to receive the clear signal and to clear the at least one of the parity error information, the system error information and the correction memory error information.
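The sequence of steps (a) to (g) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the claimed implementation: the names `ErrorKind`, `Cpld`, `Bmc` and `run_cycle` are hypothetical, the three error types are modelled as flag bits, and the SGPIO and I2C/PMBUS links are reduced to plain method calls.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class ErrorKind(Flag):
    """The three error categories named in the claims (bit layout is assumed)."""
    NONE = 0
    PARITY = auto()
    SYSTEM = auto()
    CORRECTION_MEMORY = auto()


@dataclass
class Cpld:
    """Hypothetical stand-in for the receiving and error storage modules."""
    stored: ErrorKind = ErrorKind.NONE

    def receive(self, errors: ErrorKind) -> None:
        # Steps (c) and (d): latch each reported error in its storage unit.
        self.stored |= errors

    def report(self) -> ErrorKind:
        # Step (e): the value the second communication module would send.
        return self.stored

    def clear(self) -> None:
        # Step (g): discard the stored errors once the clear signal arrives.
        self.stored = ErrorKind.NONE


@dataclass
class Bmc:
    """Hypothetical stand-in for the host's baseboard management controller."""
    log: list = field(default_factory=list)

    def record(self, errors: ErrorKind) -> bool:
        # Store the report; returning True models sending the clear signal.
        self.log.append(errors)
        return True


def run_cycle(detected: ErrorKind, cpld: Cpld, bmc: Bmc) -> str:
    """Walk one pass of steps (a)-(g) for whatever the chip detected."""
    if detected == ErrorKind.NONE:
        # Step (a) found nothing, so the card keeps running.
        return "running"
    cpld.receive(detected)                     # steps (b) to (d)
    clear_signal = bmc.record(cpld.report())   # step (e)
    # Step (f): only the card restarts; modelled here as a no-op.
    if clear_signal:
        cpld.clear()                           # step (g)
    return "rebooted"


if __name__ == "__main__":
    cpld, bmc = Cpld(), Bmc()
    print(run_cycle(ErrorKind.PARITY | ErrorKind.SYSTEM, cpld, bmc))
    print(bmc.log, cpld.stored)
```

In this sketch the controller's log keeps the error record after the cycle, while the storage in the logic device is empty again, which mirrors the division of roles described in the claims.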
TW110108947A 2021-03-12 2021-03-12 Smart network interface controller system and method of detecting error TWI738627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110108947A TWI738627B (en) 2021-03-12 2021-03-12 Smart network interface controller system and method of detecting error

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW110108947A TWI738627B (en) 2021-03-12 2021-03-12 Smart network interface controller system and method of detecting error

Publications (2)

Publication Number Publication Date
TWI738627B (en) 2021-09-01
TW202236274A (en) 2022-09-16

Family

ID=78777920

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110108947A TWI738627B (en) 2021-03-12 2021-03-12 Smart network interface controller system and method of detecting error

Country Status (1)

Country Link
TW (1) TWI738627B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10678721B1 (en) * 2017-02-02 2020-06-09 Amazon Technologies, Inc. Communication link testing
US11593138B2 (en) * 2019-05-20 2023-02-28 Microsoft Technology Licensing, Llc Server offload card with SoC and FPGA
US11074124B2 (en) * 2019-07-23 2021-07-27 Alibaba Group Holding Limited Method and system for enhancing throughput of big data analysis in a NAND-based read source storage

Also Published As

Publication number Publication date
TWI738627B (en) 2021-09-01
