TWI738627B - Smart network interface controller system and method of detecting error - Google Patents

Smart network interface controller system and method of detecting error

Info

Publication number
TWI738627B
Authority
TW
Taiwan
Prior art keywords
error information
error
module
parity
detection module
Prior art date
Application number
TW110108947A
Other languages
Chinese (zh)
Other versions
TW202236274A (en)
Inventor
劉葉
Original Assignee
英業達股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 英業達股份有限公司
Priority to TW110108947A
Application granted
Publication of TWI738627B
Publication of TW202236274A

Landscapes

  • Detection And Correction Of Errors (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

An error detection method is disclosed in the present invention. In the method, a SoC (System on a Chip) is used to detect a parity error, a system error, and a multi-bit ECC memory error; a CPLD (Complex Programmable Logic Device) then stores the detected error and transmits it to a BMC (Baseboard Management Controller); the smart network interface controller system is then operated to reboot; finally, the smart network interface controller system starts working once the SoC no longer detects any error. The smart network interface controller system is also disclosed in the present invention.

Description

Smart network interface controller system with error detection function and error detection method

The present invention relates to a system and a method, and in particular to a smart network interface controller (smart NIC) system with an error detection function and to an error detection method.

A network interface card (NIC) is an essential device for connecting to a network and communicating over it, and it cooperates with the central processing unit (CPU) on the motherboard to handle every layer of the network protocol. As technology has developed, smart NICs have become increasingly common. Compared with a conventional NIC, a smart NIC not only sends and receives data but also offers high-performance, programmable computing capability. However, if the system chip on a smart NIC suffers a parity error (PERR), a system error (SERR), or a multi-bit ECC memory error, the system chip fails and the smart NIC can no longer operate normally. The prior art therefore leaves room for improvement.

In view of the problems that arise in the prior art when a parity error, a system error, or a multi-bit ECC memory error prevents a smart NIC from operating normally, a main objective of the present invention is to provide a smart NIC system with an error detection function that solves at least one problem of the prior art.

To solve the problems of the prior art, the necessary technical means adopted by the present invention is to provide a smart NIC system with an error detection function, which is externally connected to a host system and comprises a processing chip and a complex programmable logic device. The processing chip comprises a parity detection module, a system error detection module, a multi-bit ECC memory error detection module, and a first communication module. The parity detection module detects parity error information of the processing chip. The system error detection module detects system error information of the processing chip. The multi-bit ECC memory error detection module detects multi-bit ECC memory error information of the processing chip. The first communication module is electrically connected to the parity detection module, the system error detection module, and the multi-bit ECC memory error detection module and, when at least one of the parity error information, the system error information, and the multi-bit ECC memory error information is detected, transmits the detected error information.

The complex programmable logic device is electrically connected to the processing chip and comprises a receiving module, an error storage module, and a second communication module. The receiving module is electrically connected to the first communication module and receives the at least one of the parity error information, the system error information, and the multi-bit ECC memory error information. The error storage module is electrically connected to the receiving module and comprises a parity storage unit, a system error storage unit, and a multi-bit ECC memory error storage unit. The parity storage unit stores the parity error information. The system error storage unit stores the system error information. The multi-bit ECC memory error storage unit stores the multi-bit ECC memory error information. The second communication module is electrically connected to the error storage module and transmits the at least one of the parity error information, the system error information, and the multi-bit ECC memory error information to a baseboard management controller of the host system.

After the baseboard management controller receives and stores the at least one of the parity error information, the system error information, and the multi-bit ECC memory error information, it transmits a clear signal. After the smart NIC system is operated to restart on its own, the error storage module clears the stored error information, so that the smart NIC system operates normally.

On the basis of the above necessary technical means, one subsidiary technical means derived from the present invention is that the first communication module of the smart NIC system with an error detection function is electrically connected to the receiving module through a Serial General Purpose Input/Output (SGPIO) interface.

On the basis of the above necessary technical means, one subsidiary technical means derived from the present invention is that the second communication module of the smart NIC system with an error detection function is an I2C module.

On the basis of the above necessary technical means, one subsidiary technical means derived from the present invention is that the second communication module of the smart NIC system with an error detection function is electrically connected to the baseboard management controller of the host system through a Power Management Bus (PMBUS).

To solve the problems of the prior art, the necessary technical means adopted by the present invention is to further provide an error detection method, which is implemented using the smart NIC system with an error detection function described above and comprises the following steps: using the parity detection module, the system error detection module, and the multi-bit ECC memory error detection module to detect at least one of the parity error information, the system error information, and the multi-bit ECC memory error information; using the first communication module to transmit the detected error information; using the receiving module to receive the error information; using the error storage module to store the error information; using the second communication module to transmit the error information to the baseboard management controller; restarting the smart NIC system on its own; and using the error storage module to receive the clear signal and clear the stored error information.

As described above, in the smart NIC system with an error detection function and the error detection method provided by the present invention, the parity detection module, the system error detection module, and the multi-bit ECC memory error detection module respectively detect the parity error information, the system error information, and the multi-bit ECC memory error information of the processing chip, and the parity storage unit, the system error storage unit, and the multi-bit ECC memory error storage unit respectively store that information. Compared with the prior art, the present invention can reliably record the parity error information, system error information, and multi-bit ECC memory error information that would cause the processing chip to fail, and can transmit that error information to the baseboard management controller of the host system. The smart NIC system can also restart independently of the host system, after which the error storage module clears the error information. Once the parity detection module, the system error detection module, and the multi-bit ECC memory error detection module no longer detect any of the error information, the smart NIC system with an error detection function can operate normally.

Specific embodiments of the present invention are described in more detail below with reference to the schematic drawings. The advantages and features of the present invention will become clearer from the following description and the claims. Note that the drawings are highly simplified and not drawn to precise scale; they serve only to explain the embodiments of the present invention conveniently and clearly.

Please refer to FIG. 1 and FIG. 2. FIG. 1 is a block diagram of the smart NIC system with an error detection function provided by a preferred embodiment of the present invention, and FIG. 2 is a flowchart of the error detection method provided by the preferred embodiment of the present invention. As shown in the figures, an error detection method is implemented using a smart NIC system 1 with an error detection function and comprises steps S101 to S108.

The smart NIC system 1 with an error detection function is externally connected to a host system 2 and comprises a processing chip 11 and a complex programmable logic device (CPLD) 12.

The processing chip 11 comprises a parity detection module 111, a system error detection module 112, a multi-bit ECC memory error detection module 113, and a first communication module 114. The processing chip 11 is a system on a chip (SoC).

The complex programmable logic device 12 is electrically connected to the processing chip 11 and comprises a receiving module 121, an error storage module 122, and a second communication module 123. The error storage module 122 comprises a parity storage unit 1221, a system error storage unit 1222, and a multi-bit ECC memory error storage unit 1223.

Step S101: Determine whether the parity detection module, the system error detection module, and the multi-bit ECC memory error detection module have detected parity error information, system error information, or multi-bit ECC memory error information.

The parity detection module 111 detects parity error (PERR) information of the processing chip 11. The system error detection module 112 detects system error (SERR) information of the processing chip 11. The multi-bit ECC memory error detection module 113 detects multi-bit ECC memory error information of the processing chip 11. The parity error information, the system error information, and the multi-bit ECC memory error information may be collectively referred to as error information. If the processing chip 11 generates any one of them, the processing chip 11 fails and cannot operate normally. In practice, the processing chip 11 may also generate two or more of them at once.

The parity detection module 111, the system error detection module 112, and the multi-bit ECC memory error detection module 113 therefore detect whether the processing chip 11 has generated any of the parity error information, the system error information, and the multi-bit ECC memory error information. When at least one of them is detected, the method proceeds to step S102.

Step S102: Use the first communication module to transmit the error information.

The first communication module 114 is electrically connected to the parity detection module 111, the system error detection module 112, and the multi-bit ECC memory error detection module 113, and receives and transmits at least one of the parity error information, the system error information, and the multi-bit ECC memory error information.

For example, when only the parity detection module 111 detects parity error information, and the system error detection module 112 and the multi-bit ECC memory error detection module 113 detect neither system error information nor multi-bit ECC memory error information, the first communication module 114 transmits only the parity error information. When the parity detection module 111 and the system error detection module 112 detect parity error information and system error information respectively, and the multi-bit ECC memory error detection module 113 detects no multi-bit ECC memory error information, the first communication module 114 transmits both the parity error information and the system error information.
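The two cases in the example above can be sketched as a small bit-flag model. This is only an illustration: the names ErrorFlag and errors_to_transmit and the bit values are assumptions made here, since the source does not specify how the error information is encoded.

```python
from enum import IntFlag


class ErrorFlag(IntFlag):
    """Hypothetical one-bit-per-error encoding (not defined in the source)."""
    NONE = 0
    PERR = 1  # parity error information
    SERR = 2  # system error information
    MBE = 4   # multi-bit ECC memory error information


def errors_to_transmit(perr: bool, serr: bool, mbe: bool) -> ErrorFlag:
    """Return only the error types that the detection modules reported."""
    flags = ErrorFlag.NONE
    if perr:
        flags |= ErrorFlag.PERR
    if serr:
        flags |= ErrorFlag.SERR
    if mbe:
        flags |= ErrorFlag.MBE
    return flags


# Case 1: only the parity detection module reports an error.
only_parity = errors_to_transmit(True, False, False)
# Case 2: parity and system errors are reported, no memory error.
parity_and_system = errors_to_transmit(True, True, False)
```

In this sketch an undetected error type simply contributes no bit, which mirrors the statement that only detected error information is transmitted.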

Step S103: Use the receiving module to receive the error information.

The receiving module 121 of the complex programmable logic device 12 is electrically connected to the first communication module 114 and receives the error information. As described above, the receiving module 121 receives whichever error information the first communication module 114 transmits. In practice, the first communication module 114 may be electrically connected to the receiving module 121 through a Serial General Purpose Input/Output (SGPIO) interface, but is not limited thereto.
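The source names SGPIO only as one possible link and gives no frame layout. As a rough illustration of passing per-type error status over a serial link, the sketch below shifts one bit per error type in a fixed order and rebuilds the status on the receiving side. The bit order, the frame length, and the function names are assumptions, and this is not the actual SGPIO framing.

```python
from typing import Dict, List

# Hypothetical bit order on the serial line; the source does not define one.
BIT_ORDER = ("PERR", "SERR", "MBE")


def shift_out(status: Dict[str, bool]) -> List[int]:
    """First communication module side: emit one bit per error type."""
    return [1 if status.get(name, False) else 0 for name in BIT_ORDER]


def shift_in(bits: List[int]) -> Dict[str, bool]:
    """Receiving module side: rebuild the per-type status from the bits."""
    if len(bits) != len(BIT_ORDER):
        raise ValueError("unexpected frame length")
    return {name: bool(bit) for name, bit in zip(BIT_ORDER, bits)}


frame = shift_out({"PERR": True, "SERR": True})
received = shift_in(frame)
```

Whatever the first side marks as detected is exactly what the second side recovers, which is the property the paragraph above describes.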

Step S104: Use the error storage module to store the error information.

The error storage module 122 of the complex programmable logic device 12 is electrically connected to the receiving module 121, stores the error information, and comprises a parity storage unit 1221, a system error storage unit 1222, and a multi-bit ECC memory error storage unit 1223. As described above, the error storage module 122 stores whichever error information the receiving module 121 receives. The parity storage unit 1221 stores the parity error information, the system error storage unit 1222 stores the system error information, and the multi-bit ECC memory error storage unit 1223 stores the multi-bit ECC memory error information.
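A minimal software model of the error storage module, assuming each storage unit behaves like a one-bit latch that stays set until it is explicitly cleared. The class name, method names, and the string keys are hypothetical; the source describes hardware units in a CPLD and does not say how they are implemented.

```python
from typing import Iterable, List


class ErrorStorageModule:
    """Toy model of error storage module 122 and its three storage units."""

    # Stand-ins for storage units 1221, 1222, and 1223 (names are assumptions).
    UNIT_NAMES = ("PERR", "SERR", "MBE")

    def __init__(self) -> None:
        self._units = {name: False for name in self.UNIT_NAMES}

    def store(self, received: Iterable[str]) -> None:
        """Latch every error type handed over by the receiving module."""
        for name in received:
            if name not in self._units:
                raise KeyError(f"unknown error type: {name}")
            self._units[name] = True

    def stored(self) -> List[str]:
        """List the error types currently latched, in unit order."""
        return [name for name in self.UNIT_NAMES if self._units[name]]

    def clear(self) -> None:
        """Reset all units, as happens when the clear signal arrives."""
        for name in self._units:
            self._units[name] = False


store = ErrorStorageModule()
store.store(["SERR"])
store.store(["PERR"])
latched = store.stored()
store.clear()
after_clear = store.stored()
```

Keeping one unit per error type means the stored state always shows which kinds of error occurred, not merely that something went wrong.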

Step S105: Use the second communication module to transmit the error information to the baseboard management controller.

The second communication module 123 of the complex programmable logic device 12 is electrically connected to the error storage module 122 and transmits the error information to a baseboard management controller (BMC) 21 of the host system 2. As described above, the second communication module 123 transmits whichever error information the error storage module 122 has stored.

Step S106: Use the baseboard management controller to store the error information and transmit a clear signal.

The baseboard management controller 21 stores the error information and thereby learns that the smart NIC system 1 with an error detection function is in an abnormal state, so it generates and transmits a clear signal. Because the baseboard management controller 21 stores the error information, a user can also observe from the host system 2 the state of the smart NIC system 1 and the type of error information that caused the abnormal state.
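The BMC behaviour in step S106 can be sketched as follows. This is an assumption-laden model: BmcModel, handle_report, the log message text, and the boolean return value standing in for the clear signal are all invented for illustration and do not correspond to any real BMC firmware API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BmcModel:
    """Toy model of baseboard management controller 21."""

    event_log: List[str] = field(default_factory=list)

    def handle_report(self, errors: Set[str]) -> bool:
        """Store each reported error type.

        Returns True when a clear signal should be sent, that is, when at
        least one error type was reported.
        """
        if not errors:
            return False
        for name in sorted(errors):
            self.event_log.append(f"smart NIC error: {name}")
        return True


bmc = BmcModel()
send_clear = bmc.handle_report({"SERR", "PERR"})
no_clear = bmc.handle_report(set())
```

The retained event_log is what would let a user on the host side see which error type put the smart NIC system into the abnormal state.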

Step S107: Restart the smart NIC system on its own.

Because the smart NIC system 1 with an error detection function is externally connected to the host system 2, it can be operated to restart on its own, separately from the host system 2. Note that restarting the smart NIC system 1 alone does not affect the host system 2, which continues to operate normally.

Step S108: Use the error storage module to receive the clear signal and clear the error information.

After the error storage module 122 of the complex programmable logic device 12 receives the clear signal, it clears the error information.

After step S108, the method returns to step S101. The parity detection module 111, the system error detection module 112, and the multi-bit ECC memory error detection module 113 again detect whether the processing chip 11 has generated at least one of the parity error information, the system error information, and the multi-bit ECC memory error information. If so, steps S102 to S108 are performed again.

When none of the parity detection module 111, the system error detection module 112, and the multi-bit ECC memory error detection module 113 detects that the processing chip 11 has generated any of the parity error information, the system error information, and the multi-bit ECC memory error information, the smart NIC system 1 with an error detection function can operate normally, and the error detection method provided by the preferred embodiment of the present invention ends.
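The loop formed by steps S101 to S108 can be sketched as a short simulation. The function name, the detect callback, and the max_reboots limit are assumptions added here; the source describes repeating the steps until no error is detected and does not mention any retry limit.

```python
from typing import Callable, List, Set, Tuple


def run_error_detection(
    detect: Callable[[int], Set[str]],
    max_reboots: int = 5,  # assumption: the source sets no retry limit
) -> Tuple[int, List[List[str]]]:
    """Simulate steps S101 to S108.

    detect(n) returns the error types seen after the n-th restart.
    Returns the number of restarts and the BMC-side log of reported errors.
    """
    bmc_log: List[List[str]] = []
    reboots = 0
    while True:
        errors = detect(reboots)          # S101: detection modules check the chip
        if not errors:
            return reboots, bmc_log       # no error: normal operation, method ends
        if reboots >= max_reboots:
            raise RuntimeError("errors persist after the allowed restarts")
        stored = set(errors)              # S102 to S104: transmit, receive, store
        bmc_log.append(sorted(stored))    # S105, S106: report to BMC, BMC stores it
        reboots += 1                      # S107: smart NIC restarts on its own
        stored.clear()                    # S108: clear signal empties the storage


# Hypothetical fault that disappears after two restarts.
result = run_error_detection(lambda n: {"PERR"} if n < 2 else set())
```

The guard on max_reboots is a design choice of this sketch so that a persistent fault cannot loop forever; a real implementation might instead leave that decision to the BMC or an operator.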

In summary, in the smart NIC system with an error detection function and the error detection method provided by the present invention, the parity detection module, the system error detection module, and the multi-bit ECC memory error detection module respectively detect the parity error information, the system error information, and the multi-bit ECC memory error information of the processing chip, and the parity storage unit, the system error storage unit, and the multi-bit ECC memory error storage unit respectively store that information. Compared with the prior art, the present invention can reliably record the parity error information, system error information, and multi-bit ECC memory error information that would cause the processing chip to fail, and can transmit that error information to the baseboard management controller of the host system. The smart NIC system can also restart independently of the host system, after which the error storage module clears the error information. Once the parity detection module, the system error detection module, and the multi-bit ECC memory error detection module no longer detect any of the error information, the smart NIC system with an error detection function can operate normally.

In addition, if the parity detection module, the system error detection module, and the multi-bit ECC memory error detection module continue to detect the error information, the smart NIC system with an error detection function and the error detection method provided by the preferred embodiment of the present invention repeat the detection, transmission, storage, and restart steps until those modules no longer detect the error information, at which point the smart NIC system with an error detection function can operate normally.

The above detailed description of the preferred embodiments is intended to describe the features and spirit of the present invention more clearly, not to limit the scope of the present invention to the preferred embodiments disclosed above. On the contrary, the intention is to cover various changes and equivalent arrangements within the scope of the claims of the present invention.

1: smart NIC system with error detection function
11: processing chip
111: parity detection module
112: system error detection module
113: multi-bit ECC memory error detection module
114: first communication module
12: complex programmable logic device
121: receiving module
122: error storage module
1221: parity storage unit
1222: system error storage unit
1223: multi-bit ECC memory error storage unit
123: second communication module
2: host system
21: baseboard management controller

FIG. 1 is a block diagram of the smart NIC system with an error detection function provided by the preferred embodiment of the present invention; and FIG. 2 is a flowchart of the error detection method provided by the preferred embodiment of the present invention.

Claims (5)

一種具有錯誤偵測功能的智能網卡系統,係外接於一主機系統,並包含: 一處理晶片,包含: 一奇偶校驗偵測模組,係用以偵測該處理晶片之一奇偶校驗錯誤資訊; 一系統錯誤偵測模組,係用以偵測該處理晶片之一系統錯誤資訊; 一修正記憶體偵測模組,係用以偵測該處理晶片之一修正記憶體錯誤資訊;以及 一第一通訊模組,係電性連接該奇偶校驗偵測模組、該系統錯誤偵測模組與該修正記憶體偵測模組,用以在偵測出該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者時,傳送出上述該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者;以及 一複雜可程式化邏輯裝置,係電性連接該處理晶片,包含: 一接收模組,係電性連接該第一通訊模組,用以接收上述該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者: 一錯誤儲存模組,係電性連接該接收模組,並包含: 一奇偶校驗儲存單元,係用以儲存該奇偶校驗錯誤資訊; 一系統錯誤儲存單元,係用以儲存該系統錯誤資訊;以及 一修正記憶體儲存單元,係用以儲存該修正記憶體錯誤資訊;以及 一第二通訊模組,係電性連接該錯誤儲存模組,用以將上述該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者傳送至該主機系統之一基板管理控制器; 其中,該基板管理控制器接收並儲存上述該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者後,係傳送一清除信號,使該智能網卡系統受操作地單獨重新啟動之後,該錯誤儲存模組係清除上述該奇偶校驗錯誤資訊、該系統錯誤資訊與該修正記憶體偵測模組中之至少一者,藉以使該智能網卡系統正常運作。 An intelligent network card system with error detection function is externally connected to a host system and includes: A processing wafer, including: A parity detection module for detecting parity error information of one of the processing chips; A system error detection module for detecting system error information of one of the processing chips; A modified memory detection module for detecting one of the processing chips to correct memory error information; and A first communication module is electrically connected to the parity detection module, the system error detection module, and the correction memory detection module for detecting the parity error information, When at least one of the system error information and the modified memory detection module is sent, at least one of the above-mentioned parity error information, the system error information and the modified memory detection module is sent; and A complex programmable logic device, which is electrically connected to the processing chip, includes: A receiving module is electrically connected to the first communication module for receiving at least one of the parity error information, the system error information, and the corrected memory detection module: An error storage module, which is electrically connected to the 
receiving module, and includes: A parity storage unit for storing the parity error information; A system error storage unit for storing the system error information; and A modified memory storage unit for storing the modified memory error information; and A second communication module is electrically connected to the error storage module for transmitting at least one of the parity error information, the system error information, and the corrected memory detection module to the host One of the system baseboard management controllers; Wherein, after the baseboard management controller receives and stores at least one of the parity error information, the system error information, and the corrected memory detection module, it transmits a clear signal to make the smart network card system receive After the operation is separately restarted, the error storage module clears at least one of the parity error information, the system error information, and the corrected memory detection module, so that the smart network card system operates normally. 如請求項1所述之具有錯誤偵測功能的智能網卡系統,其中,該第一通訊模組係利用一串列通用型輸入輸出(Serial General Purpose Input/Output;SGPIO)介面電性連接該接收模組。The smart network card system with error detection function according to claim 1, wherein the first communication module uses a serial general purpose input/output (Serial General Purpose Input/Output; SGPIO) interface to electrically connect the receiver Module. 如請求項1所述之具有錯誤偵測功能的智能網卡系統,其中,該第二通訊模組係一I2C模組。The smart network card system with error detection function according to claim 1, wherein the second communication module is an I2C module. 如請求項1所述之具有錯誤偵測功能的智能網卡系統,其中,該第二通訊模組係利用一電源管理匯流排(Power Management Bus;PMBUS)電性連接該主機系統之該基板管理控制器。The intelligent network card system with error detection function according to claim 1, wherein the second communication module uses a power management bus (Power Management Bus; PMBUS) to electrically connect the baseboard management control of the host system Device. 
5. An error detection method, implemented with the smart network interface controller system with an error detection function according to claim 1, comprising the following steps:
   (a) detecting at least one of the parity error information, the system error information, and the ECC memory error information with the parity detection module, the system error detection module, and the ECC memory detection module;
   (b) transmitting the at least one of the parity error information, the system error information, and the ECC memory error information with the first communication module;
   (c) receiving the at least one of the parity error information, the system error information, and the ECC memory error information with the receiving module;
   (d) storing the at least one of the parity error information, the system error information, and the ECC memory error information with the error storage module;
   (e) transmitting the at least one of the parity error information, the system error information, and the ECC memory error information to the baseboard management controller with the second communication module;
   (f) restarting the smart network interface controller system on its own; and
   (g) receiving the clear signal with the error storage module, and clearing the at least one of the parity error information, the system error information, and the ECC memory error information.
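
Steps (a) through (g) of the method claim can be walked through in a short simulation. The Python sketch below is a rough model under stated assumptions, not the patented implementation: the class names `Cpld` and `Bmc`, the function `run_detection_cycle`, and the string labels for the three error kinds are all hypothetical, and the restart in step (f) is represented only by a comment.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three kinds of error information in the claims.
ERROR_KINDS = ("parity", "system", "ecc_memory")


@dataclass
class Cpld:
    stored: set = field(default_factory=set)  # error storage module

    def receive_and_store(self, errors):      # steps (c) and (d)
        self.stored |= set(errors)

    def report(self):                         # step (e): data sent to the BMC
        return set(self.stored)

    def on_clear_signal(self):                # step (g)
        self.stored.clear()


@dataclass
class Bmc:
    log: list = field(default_factory=list)

    def record(self, errors):
        # The BMC receives and stores the errors, then issues a clear signal,
        # modeled here as the return value True.
        self.log.append(sorted(errors))
        return True


def run_detection_cycle(detected, cpld, bmc):
    """Run steps (a)-(g) once. Returns True when no error was detected."""
    errors = [e for e in detected if e in ERROR_KINDS]  # step (a)
    if not errors:
        return True                                     # card can operate normally
    cpld.receive_and_store(errors)                      # steps (b)-(d)
    clear_signal = bmc.record(cpld.report())            # step (e)
    # Step (f): only the card is restarted, not the host (not simulated here).
    if clear_signal:
        cpld.on_clear_signal()                          # step (g)
    return False


cpld, bmc = Cpld(), Bmc()
first_pass = run_detection_cycle(["parity", "ecc_memory"], cpld, bmc)
second_pass = run_detection_cycle([], cpld, bmc)
```

In this run, `first_pass` is `False` because errors were detected and logged by the BMC, while `second_pass` is `True`, which mirrors the abstract's statement that the system starts working once the SoC no longer detects any error.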
TW110108947A 2021-03-12 2021-03-12 Smart network interface controller system and method of detecting error TWI738627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110108947A TWI738627B (en) 2021-03-12 2021-03-12 Smart network interface controller system and method of detecting error

Publications (2)

Publication Number Publication Date
TWI738627B (en) 2021-09-01
TW202236274A (en) 2022-09-16

Family

ID=78777920

Family Applications (1)

Application Number Priority Date Filing Date Title
TW110108947A TWI738627B (en) 2021-03-12 2021-03-12 Smart network interface controller system and method of detecting error

Country Status (1)

Country Link
TW (1) TWI738627B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10678721B1 (en) * 2017-02-02 2020-06-09 Amazon Technologies, Inc. Communication link testing
US20200371828A1 (en) * 2019-05-20 2020-11-26 Microsoft Technology Licensing, Llc Server Offload Card With SOC And FPGA
US20210026731A1 (en) * 2019-07-23 2021-01-28 Alibaba Group Holding Limited Method and system for enhancing throughput of big data analysis in a nand-based read source storage

Similar Documents

Publication Publication Date Title
JP5142138B2 (en) Method and memory system for identifying faulty memory elements in a memory system
US8145868B2 (en) Method and system for providing frame start indication in a memory system having indeterminate read data latency
WO2019136595A1 (en) Method for handling i2c bus deadlock, electronic device, and communication system
US8201069B2 (en) Cyclical redundancy code for use in a high-speed serial link
CN104699576B (en) Serial communication testing device, system comprising same and method thereof
KR102399843B1 (en) Error correction hardware with fault detection
US7984357B2 (en) Implementing minimized latency and maximized reliability when data traverses multiple buses
US20080046802A1 (en) Memory controller and method of controlling memory
EP0930718A2 (en) Tandem operation of input/output data compression modules
US20210089396A1 (en) System and method for using a directory to recover a coherent system from an uncorrectable error
TW201306042A (en) Semiconductor memory apparatus and semiconductor system having the same
TW202105182A (en) Memory systems and writing methods of the memory systems
KR20110003726A (en) Crc mamagement method for sata interface and data storage device thereof
US8484546B2 (en) Information processing apparatus, information transmitting method, and information receiving method
TWI738627B (en) Smart network interface controller system and method of detecting error
US7328368B2 (en) Dynamic interconnect width reduction to improve interconnect availability
CN112927748A (en) Memory system, integrated circuit system, and method of operating memory system
US8291270B2 (en) Request processing device, request processing system, and access testing method
CN113037507B (en) Intelligent network card system with error detection function and error detection method
US10740179B2 (en) Memory and method for operating the memory
TWI764342B (en) Startup status detection system and method thereof
TWI757606B (en) Server device and communication method between baseboard management controller and programmable logic unit thereof
CN114518972B (en) Memory error processing method and device, memory controller and processor
TWI767378B (en) Error type determination system and method thereof
US11831337B2 (en) Semiconductor device and error detection methods