TW201928981A - System for testing whole memory and method thereof - Google Patents

System for testing whole memory and method thereof

Info

Publication number
TW201928981A
Authority
TW
Taiwan
Prior art keywords
memory
module
address range
test
computing device
Prior art date
Application number
TW106143817A
Other languages
Chinese (zh)
Other versions
TWI733964B (en)
Inventor
李岩
Original Assignee
英業達股份有限公司 (Inventec Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 英業達股份有限公司 (Inventec Corporation)
Priority to TW106143817A
Publication of TW201928981A
Application granted
Publication of TWI733964B


Landscapes

  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

A system for testing whole memory and a method thereof are provided. The system and method convert the physical address range of each memory module into a virtual address range of the memory, execute a read/write test over that virtual address range, and detect error messages of the memory during the test. When an error message is detected, the virtual address it contains is converted into the physical address of the corresponding memory module. This improves the test coverage and effectiveness of memory testing, identifies the faulty memory module accurately, and prevents the operating system from terminating the memory test process during the read/write test.

Description

System for Testing Whole Memory and Method Thereof

The present invention relates to a test system and method, and more particularly to a system and method for testing memory as a whole.

As semiconductor process technology continues to advance, integrated circuit (IC) designs keep growing in scale, and highly complex ICs such as memory, processors, and north/south bridges are subject to an increasing variety of potential defects. At the same time, as IC complexity rises, IC products, memory in particular, are used in large quantities in computing devices such as servers, desktop and notebook computers, mobile devices, and consumer and commercial electronics.

If a memory error occurs while a computing device is running, it may not only produce erroneous results but also cause applications to close or even crash the computing device, and stability is especially important for computing devices such as servers. Because memory stability has a large impact on the stability of the computing device as a whole, memory testing is an essential part of improving server stability.

Memory testing can be performed in at least three ways: testing memory modules on their own, testing the memory modules installed in a computing device through the BIOS, and testing all of the memory while the computing device's operating system is running. For testing while the operating system is running, most vendors currently use a RAM Stress Test (RST) program. Although current RAM stress test programs can verify memory stability, they are quite time-consuming, and testing from within the operating system cannot cover all of the memory. In addition, when a current RAM stress test program finds an error, it can only report the faulty memory address; it cannot identify the memory module in which the error occurred.

In summary, the prior art has long suffered from RAM stress test programs that have low test coverage and poor effectiveness and cannot reliably identify the faulty memory module. An improved technical means is therefore needed to solve this problem.

In view of the prior-art problem that a RAM stress test program may be terminated by the operating system, the present invention discloses a system and method for testing memory as a whole, as follows:

The disclosed system for testing memory as a whole is applied in an operating system executed by a computing device, and the computing device includes memory modules that serve as its memory. The system includes at least: an address mapping module, configured to establish mapping information containing the correspondence between the logical address range of the memory and the physical address ranges of the memory modules; a data access module, configured to convert the physical address range of a memory module into a logical address range of the memory according to the mapping information and to perform a read/write test on the memory over that logical address range; an error detection module, configured to detect error messages corresponding to the memory that are generated during the read/write test; and an error reporting module, configured to convert the logical address in an error message into the physical address of the corresponding memory module according to the mapping information, to generate module information for the memory module corresponding to that logical address, and to output the physical address and the module information.

The disclosed method for testing memory as a whole is applied in an operating system executed by a computing device, and the computing device includes memory modules that serve as its memory. The method includes at least the following steps: establishing mapping information containing the correspondence between the physical address ranges of the memory modules and the logical address range of the memory; converting the physical address range of a memory module into a logical address range of the memory according to the mapping information, and performing a read/write test on the memory over that logical address range; detecting error messages corresponding to the memory that are generated during the read/write test; converting the logical address in an error message into the physical address of the corresponding memory module according to the mapping information, and generating module information for the memory module corresponding to that logical address; and outputting the physical address and the module information.

The disclosed system and method differ from the prior art in that the invention converts the physical address range of a memory module into a logical address range of the memory, performs a read/write test on the memory over that logical address range, and, when an error message corresponding to the memory is detected during the test, converts the logical address in the error message into the physical address of the corresponding memory module. This solves the problems of the prior art and achieves the technical effect of preventing the test from being terminated by the operating system.

The features and embodiments of the present invention are described in detail below with reference to the drawings and embodiments, in sufficient detail to enable any person skilled in the art to readily understand and implement the technical means by which the invention solves the technical problem, and thereby realize the effects the invention can achieve.

The invention can run within the operating system. By specifying physical address ranges of the memory modules, it divides the memory into multiple small units for read/write testing and detects error messages corresponding to the memory during the test, so that when an error message is detected, the memory module in which the read/write error occurred can be identified from that message.

The operation of the system is first described with reference to Fig. 1, a system architecture diagram of the memory testing system of the invention. As shown in Fig. 1, the system includes an address mapping module 111, a data access module 113, an error detection module 115, and an error reporting module 117.

The address mapping module 111 is responsible for establishing the mapping information, which contains the correspondence between the logical address range of the memory 101 and the physical address ranges of the memory modules installed in the computing device 100. For example, if the computing device 100 has two memory modules installed, one of 4 GB and one of 8 GB, the logical address range of the memory 101 is 0x0 to 0x2FFFFFFFF. The physical address range 0x0 to 0xFFFFFFFF of the first module corresponds to the logical address range 0x0 to 0xFFFFFFFF of the memory 101, and the physical address range 0x0 to 0x1FFFFFFFF of the second module corresponds to the logical address range 0x100000000 to 0x2FFFFFFFF of the memory 101.

In general, the address mapping module 111 can obtain the capacity of each memory module installed in the computing device 100 from the data recorded in the Serial Presence Detect (SPD) chip on the module, from system settings recorded by the computing device 100 or the operating system 110, or from user settings. The address mapping module 111 can also obtain the order in which the computing device 100 initializes the memory modules (usually given by the slot number of the memory slot in which each module is installed, although the invention is not limited to this), and it determines the physical address range of each memory module from the capacities and order so obtained, thereby establishing the mapping information.
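As a minimal sketch of how such mapping information could be built from module capacities and initialization order, the Python below lays modules out back to back in slot order and converts a physical range inside one module into a logical range. It assumes byte addressing, 1 GB = 2^30 bytes, and a simple contiguous (non-interleaved) layout like the example above; real platforms often interleave modules across channels. `ModuleMapping`, `build_mapping`, and `to_logical` are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass

GIB = 1 << 30  # assumption: 1 GB = 2**30 bytes

@dataclass(frozen=True)
class ModuleMapping:
    slot: int          # slot number, used here as the initialization order
    size: int          # capacity in bytes (e.g. as read from SPD)
    logical_base: int  # first logical address backed by this module

    def to_logical(self, phys_start: int, phys_end: int) -> tuple[int, int]:
        """Convert an inclusive physical range inside this module to a logical range."""
        if not (0 <= phys_start <= phys_end < self.size):
            raise ValueError("physical range lies outside this module")
        return self.logical_base + phys_start, self.logical_base + phys_end

def build_mapping(capacities: dict[int, int]) -> list[ModuleMapping]:
    """Place modules contiguously in slot order, starting at logical address 0."""
    mapping, base = [], 0
    for slot in sorted(capacities):
        mapping.append(ModuleMapping(slot, capacities[slot], base))
        base += capacities[slot]
    return mapping
```

For the 4 GB + 8 GB example, `build_mapping({0: 4 * GIB, 1: 8 * GIB})[1].to_logical(0, 8 * GIB - 1)` yields the logical range 0x100000000 to 0x2FFFFFFFF.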

The data access module 113 is responsible for performing read/write tests on the memory 101. It can first obtain the physical address range of the memory module to be tested, convert that range into a logical address range of the memory 101 according to the mapping information established by the address mapping module 111, and then perform a read/write test on the memory 101 over the converted logical address range. In this invention, the logical address range of the memory 101 on which the data access module 113 performs the read/write test is called the "test address range".

To improve the efficiency of the read/write test, the data access module 113 can also divide the physical address range of each memory module into multiple small units with smaller address ranges and convert the address range of each unit into the corresponding test address range of the memory 101 according to the mapping information established by the address mapping module 111. The data access module 113 can also create as many test threads as the computing device 100 has processing cores and distribute all of the test address ranges among those threads, so that each thread is assigned one or more small units and performs the read/write test on the corresponding test address ranges. Each test thread can test the address range of a unit with a March test pattern, for example: ↑(W0), ↑(R0,W1), ↓(R1,W0), ↓(R0,W1), ↑(R1,W0), ↓(R0,W1), ↓(R1,W0), ↑(R0,W1), ↑(R1). Here "W" means writing data, "R" means reading the data back and comparing it, "↑" means ascending address order, "↓" means descending address order, "0" denotes the 0 vector, and "1" denotes the 1 vector. Thus "↑(W0)" means writing the 0 vector in ascending order, and "↑(R0,W1)" means, in ascending order, reading the data, checking that it equals the 0 vector, and then writing the 1 vector, and so on.

There are five pairs of 0 and 1 vectors, and in each pair the 1 vector is the bitwise complement of the 0 vector: the first pair is "0000000000000000" and "FFFFFFFFFFFFFFFF", the second is "0F0F0F0F0F0F0F0F" and "F0F0F0F0F0F0F0F0", the third is "5555555555555555" and "AAAAAAAAAAAAAAAA", the fourth is "3333333333333333" and "CCCCCCCCCCCCCCCC", and the fifth is "7777777777777777" and "8888888888888888". A test thread can apply the five pairs in turn to the address range of the same unit.

In general, the data access module 113 can create user programs and kernel programs. A user program, such as the test threads described above, can access specific logical addresses of the memory 101. A kernel program can allocate the physical address range of the memory module to be tested and convert it into the corresponding logical address range (that is, the test address range) according to the mapping information established by the address mapping module 111, so that the user program can perform the read/write test over that logical address range. After the user program completes the read/write test, the kernel program can also free the logical address range the user program tested, together with the corresponding physical address range.

In some embodiments, the data access module 113 can also disable the memory cache mechanism of the computing device 100, so that the read/write test is actually performed on the memory 101 rather than in the cache.

The error detection module 115 is responsible for detecting, while the data access module 113 performs the read/write test, whether any error message corresponding to the memory 101 is generated.

In general, the error detection module 115 can monitor the Error Checking and Correcting (ECC) counter of the memory 101, monitor the System Event Log (SEL) of the Baseboard Management Controller (BMC) of the computing device 100, check the boot messages generated by the operating system 110, and/or monitor the hardware diagnostic logs generated by the operating system 110, in order to detect whether any error message corresponding to the memory 101 has been generated.
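One concrete source for the ECC counter on Linux is the EDAC subsystem, which exposes per-memory-controller corrected and uncorrected error counts as `ce_count` and `ue_count` under `/sys/devices/system/edac/mc/mc*/`. The sketch below, assuming a Linux host with an EDAC driver loaded, snapshots those counters and reports which controllers gained new errors between snapshots. The BMC SEL (for example via `ipmitool sel list`), `dmesg`, and mcelog sources are not covered, and `read_ecc_counts` and `new_errors` are hypothetical names.

```python
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")  # Linux EDAC sysfs directory

def read_ecc_counts(root: Path = EDAC_ROOT) -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} from each mc*/ce_count and ue_count."""
    counts = {}
    for mc in sorted(root.glob("mc*")):
        ce = int((mc / "ce_count").read_text())
        ue = int((mc / "ue_count").read_text())
        counts[mc.name] = (ce, ue)
    return counts

def new_errors(before, after):
    """Return {controller: (new_corrected, new_uncorrected)} for counters that grew."""
    grown = {}
    for mc, (ce0, ue0) in before.items():
        ce1, ue1 = after.get(mc, (ce0, ue0))
        if ce1 > ce0 or ue1 > ue0:
            grown[mc] = (ce1 - ce0, ue1 - ue0)
    return grown
```

A test loop could take one snapshot before testing a unit and another afterwards; a non-empty `new_errors` result would then be treated as an error message corresponding to the memory.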

The error reporting module 117 is responsible for converting the logical address in an error message detected by the error detection module 115 into the physical address of the corresponding memory module, according to the mapping information established by the address mapping module 111.

The error reporting module 117 is also responsible for generating module information for the memory module corresponding to the logical address in the error message detected by the error detection module 115. The module information is data that makes the memory module easy to identify: for example, it can include the vendor, model, serial number, and other data that can be read from the module's Serial Presence Detect chip, as well as the slot number and/or slot location of the memory slot in which the module is installed.

The error reporting module 117 is also responsible for outputting the converted physical address and the generated module information.
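The reverse conversion performed by the error reporting module can be sketched as a lookup over the same kind of mapping: find the module whose logical range contains the faulty address, subtract that module's logical base to obtain the physical address, and print it together with the module information. This assumes the contiguous layout used in the earlier example; the `Module` fields, slot labels, vendor names, and serial numbers are hypothetical placeholders for data that would come from SPD and the slot layout.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class Module:
    slot: str          # slot label (hypothetical, e.g. "DIMM_A1")
    vendor: str        # from SPD
    serial: str        # from SPD
    logical_base: int  # first logical address backed by this module
    size: int          # capacity in bytes

def locate(modules, logical_addr):
    """Map a logical address to (module, physical address within that module).
    `modules` must be sorted by logical_base."""
    bases = [m.logical_base for m in modules]
    idx = bisect_right(bases, logical_addr) - 1
    if idx < 0 or logical_addr >= modules[idx].logical_base + modules[idx].size:
        raise LookupError(f"address {logical_addr:#x} is not backed by any module")
    m = modules[idx]
    return m, logical_addr - m.logical_base

def report(modules, logical_addr):
    """Format the physical address and module information for an error address."""
    m, phys = locate(modules, logical_addr)
    return (f"logical {logical_addr:#x} -> slot {m.slot}, physical {phys:#x}, "
            f"vendor {m.vendor}, serial {m.serial}")
```

With a 4 GB module followed by an 8 GB module, an error at logical address 0x180000000 resolves to the second module at physical address 0x80000000.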

Next, the operation of the system and method is explained with an embodiment; refer to Fig. 2, a flowchart of the memory testing method of the invention. In this embodiment, assume that the computing device 100 is a server with four 16 GB memory modules installed.

After the operating system 110 of the computing device 100 has booted, the user can run an application implementing the invention in the operating system 110. The address mapping module 111 then establishes the correspondence between the physical address range of each memory module and the logical address range of the memory 101 (step 210).

The data access module 113 then converts the physical address range of each memory module into a logical address range of the memory 101 according to the mapping information established by the address mapping module 111, and performs a read/write test on the memory 101 over the converted logical address ranges (step 220). In this embodiment, assume the data access module 113 calls a kernel program. When executed, the kernel program divides the physical address range of each memory module into multiple small units (that is, it divides the memory 101 into small units) and converts the address range of each unit into the corresponding logical address range (the test address range) according to the mapping information established by the address mapping module 111. The data access module 113 then calls a user program. When executed, the user program creates as many test threads as the computing device 100 has processing cores and distributes the units evenly among the threads, so that each thread performs the read/write test on the test address ranges of the units assigned to it. Each time a test thread finishes testing a unit, the kernel program first frees the test address range (logical address range) corresponding to that unit and then frees the unit's address range (physical address range).
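The unit splitting and per-core scheduling of step 220 can be sketched as below. The 256 MiB unit size is an assumed value, and `test_range` and `release` are hypothetical callbacks standing in for the per-unit read/write test and for the kernel program freeing the unit's logical and physical ranges. Python threads here only illustrate the scheduling structure; a real tester would use native threads to get true per-core parallelism.

```python
import os
from concurrent.futures import ThreadPoolExecutor

UNIT = 256 * 1024 * 1024  # assumed unit size: 256 MiB

def split_units(logical_base, size, unit=UNIT):
    """Split one module's logical range into inclusive (start, end) test address ranges."""
    end = logical_base + size
    return [(s, min(s + unit, end) - 1) for s in range(logical_base, end, unit)]

def assign_round_robin(units, n_threads):
    """Spread units evenly: thread k gets units k, k + n, k + 2n, ..."""
    return [units[k::n_threads] for k in range(n_threads)]

def run_all(units, test_range, release, n_threads=None):
    """Run one worker per processing core; each tests its units and releases each one.
    Returns the (start, end) ranges whose test failed."""
    n_threads = n_threads or os.cpu_count() or 1

    def worker(my_units):
        failures = []
        for start, end in my_units:
            if not test_range(start, end):
                failures.append((start, end))
            release(start, end)  # free the unit's logical (and physical) range
        return failures

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        per_thread = list(pool.map(worker, assign_round_robin(units, n_threads)))
    return [f for failures in per_thread for f in failures]
```

Round-robin assignment keeps the number of units per thread within one of each other, which matches the "distributes the units evenly" step.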

While the data access module 113 performs the read/write test on the memory 101, the error detection module 115 continuously detects whether an error message corresponding to the memory 101 is generated (step 230), until the data access module 113 finishes the read/write test. In this embodiment, assume the error detection module 115 monitors the ECC counter of the memory 101, monitors the event log of the baseboard management controller of the computing device 100, uses the dmesg command to check the boot messages generated by the operating system 110, and/or monitors the hardware diagnostic log files (such as mcelog) generated by the operating system 110.

If the error detection module 115 does not detect any error message corresponding to the memory 101 during the read/write test, the error reporting module 117 need not run. If the error detection module 115 does detect such an error message during the test, the error reporting module 117 converts the logical address in the detected error message into the physical address of one of the memory modules according to the mapping information established by the address mapping module 111, generates module information for the memory module corresponding to that logical address (step 250), and outputs the generated module information and the converted physical address (step 260).

Thus, with the invention, the read/write test can be performed module by module over the physical address ranges of the memory modules, so that all of the memory 101 except the portion occupied by the operating system 110 can be fully tested. This avoids the problem in which an existing memory test program occupies so much memory during testing that the operating system 110 judges memory to be insufficient and kills the test program.

More specifically, in the embodiment above, when the data access module 113 performs the read/write test on the memory 101 (step 220), the kernel program can disable the memory cache mechanism of the computing device 100 before a test thread tests its assigned test address range, and can re-enable the memory cache mechanism after the test thread finishes the read/write test.
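Pairing the cache disable with a guaranteed re-enable is the kind of step where a try/finally structure matters: if a test aborts midway, caching should still be restored. The sketch below shows that pattern as a Python context manager. The `disable` and `enable` callbacks are hypothetical placeholders; actually changing cache behavior is a privileged, platform-specific kernel-side operation that this sketch does not attempt.

```python
from contextlib import contextmanager

@contextmanager
def cache_disabled(disable, enable):
    """Run the body with caching turned off and always turn it back on,
    even if the read/write test raises an exception."""
    disable()
    try:
        yield
    finally:
        enable()
```

A test thread would wrap each unit's read/write test in `with cache_disabled(disable_hook, enable_hook):`, where both hooks are supplied by the (hypothetical) kernel-side component.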

In summary, the invention differs from the prior art in the technical means of converting the physical address range of a memory module into a logical address range of the memory, performing the read/write test on the memory over that logical address range, and, when an error message corresponding to the memory is detected during the test, converting the logical address in the error message into the physical address of the corresponding memory module. This technical means solves the prior-art problems of RAM stress test programs, namely low test coverage, poor effectiveness, and the inability to reliably identify the faulty memory module, and achieves the technical effect of preventing the test from being terminated by the operating system.

Furthermore, the memory testing method of the invention can be implemented in hardware, software, or a combination of hardware and software, either centrally in one computer system or in a distributed manner, with different components spread across several interconnected computer systems.

Although the embodiments of the invention are disclosed above, they are not intended to directly limit the scope of patent protection of the invention. Any minor changes or refinements in form and detail made to the implementation of the invention by a person having ordinary skill in the art to which the invention pertains, without departing from the spirit and scope disclosed herein, fall within the scope of patent protection of the invention. The scope of patent protection of the invention shall be defined by the appended claims.

100‧‧‧Computing device

101‧‧‧Memory

110‧‧‧Operating system

111‧‧‧Address mapping module

113‧‧‧Data access module

115‧‧‧Error detection module

117‧‧‧Error reporting module

Step 210‧‧‧Establish mapping information containing the correspondence between the physical address ranges of the memory modules and the logical address range of the memory

Step 220‧‧‧Convert the physical address range of a memory module into a logical address range of the memory according to the mapping information, and perform a read/write test on the memory over that logical address range

Step 230‧‧‧Detect whether an error message corresponding to the memory is generated

Step 250‧‧‧Convert the logical address in the error message into the physical address of a memory module according to the mapping information, and generate module information for the memory module corresponding to the logical address

Step 260‧‧‧Output the physical address and module information

Fig. 1 is a system architecture diagram of the memory testing system of the invention.
Fig. 2 is a flowchart of the memory testing method of the invention.

Claims (10)

1. A system for testing memory as a whole, applied in an operating system executed by a computing device, the computing device comprising at least one memory module, the memory modules serving as a memory of the computing device, the system comprising at least: an address mapping module, configured to establish mapping information, the mapping information comprising the correspondence between a logical address range of the memory and the physical address range of each memory module; a data access module, configured to divide the physical address range of each memory module into a plurality of small units, to convert the address range of each small unit into a corresponding test address range according to the mapping information, and to create a plurality of test threads equal in number to the processing cores of the computing device, each test thread being assigned to perform a read/write test on the test address range corresponding to at least one small unit; an error detection module, configured to detect an error message corresponding to the memory generated during the read/write test; and an error reporting module, configured to convert a logical address in the error message into a physical address of a corresponding one of the memory modules according to the mapping information, to generate module information of the memory module corresponding to the logical address, and to output the physical address and the module information.
2. The system for testing whole memory according to claim 1, wherein the error detection module detects the error message by monitoring an Error Checking & Correcting counter (ECC counter) of the memory, monitoring the System Event Log (SEL) of a Baseboard Management Controller (BMC) of the computing device, checking boot messages of the operating system, and/or monitoring hardware diagnostic records of the operating system.
3. The system for testing whole memory according to claim 1, wherein the data access module is further configured to disable the memory cache mechanism of the computing device.
4. The system for testing whole memory according to claim 1, wherein the address mapping module obtains the physical address range of each memory module from data recorded in a Serial Presence Detect (SPD) chip on each memory module, from settings recorded by the computing device or the operating system, or from user settings.
5. The system for testing whole memory according to claim 1, wherein the module information comprises vendor information, model number, serial number, and/or slot location.
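Claims 2 and 8 list the error sources but not how to read them. As one example of the ECC-counter source, a Linux host with an EDAC driver loaded exposes per-memory-controller corrected and uncorrected error counts in `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc/mcN/`. The sketch below snapshots those counters before and after the read/write test and reports any increase. That the patented system uses EDAC is an assumption, and the function names are hypothetical. The BMC SEL source would come from a tool such as `ipmitool sel list` instead, which is not shown here. Counters alone only show that an error occurred; the faulting address would have to come from another source, such as the operating system's diagnostic messages.

```python
from pathlib import Path

# Linux EDAC sysfs location (present only when an EDAC driver is loaded).
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ecc_counters(root=EDAC_MC_ROOT):
    """Snapshot {controller: (corrected, uncorrected)} from every mcN directory."""
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text().strip())
        ue = int((mc / "ue_count").read_text().strip())
        counts[mc.name] = (ce, ue)
    return counts


def ecc_increase(before, after):
    """Return the controllers whose counters grew between two snapshots."""
    grown = {}
    for name, (ce, ue) in after.items():
        ce0, ue0 = before.get(name, (0, 0))
        if ce > ce0 or ue > ue0:
            grown[name] = (ce - ce0, ue - ue0)
    return grown


if __name__ == "__main__":
    before = read_ecc_counters()
    # ... run the read/write test here ...
    after = read_ecc_counters()
    for name, (d_ce, d_ue) in ecc_increase(before, after).items():
        print(f"{name}: +{d_ce} corrected, +{d_ue} uncorrected ECC errors")
```

Taking a snapshot and comparing deltas avoids attributing errors that were already logged before the test started.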
6. A method for testing whole memory, applied in an operating system executed by a computing device, the computing device comprising at least one memory module, the memory modules serving as a memory of the computing device, the method comprising at least the following steps:
establishing mapping information comprising the correspondence between the physical address range of each memory module and the logical address range of the memory;
converting the physical address range of each memory module into the logical address range of the memory according to the mapping information, and performing a read/write test on the memory according to the logical address range of the memory;
detecting an error message, corresponding to the memory, that is generated while the read/write test is performed;
converting a logical address in the error message into a physical address of the corresponding memory module according to the mapping information, and generating module information of the memory module corresponding to the logical address; and
outputting the physical address and the module information.
7. The method for testing whole memory according to claim 6, wherein the step of converting the physical address range of each memory module into the logical address range of the memory according to the mapping information and performing the read/write test on the memory according to that logical address range comprises: dividing the physical address range of each memory module into a plurality of small units; converting the address range of each small unit into a corresponding test address range according to the mapping information; generating a number of test threads equal to the number of processing cores of the computing device; and assigning each test thread to perform a read/write test on the test address range corresponding to at least one of the small units.
8. The method for testing whole memory according to claim 6, wherein the step of detecting the error message corresponding to the memory generated while the read/write test is performed comprises monitoring the ECC counter of the memory, monitoring the event log of the baseboard management controller of the computing device, checking boot messages of the operating system, and/or monitoring hardware diagnostic records of the operating system.
9. The method for testing whole memory according to claim 6, wherein the step of performing the read/write test on the memory further comprises a step of disabling the memory cache mechanism of the computing device.
10. The method for testing whole memory according to claim 6, wherein the step of establishing the mapping information further comprises a step of obtaining the physical address range of each memory module from data recorded in the serial presence detect chip on each memory module, from settings recorded by the computing device or the operating system, or from user settings.
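Claim 7 divides each module's range into small units, maps each unit to a test address range, and spreads the units over one test thread per processing core. The sketch below shows only that partitioning and scheduling logic. It runs against a simulated byte buffer with an injected stuck-at-0 bit because user-space Python can neither target physical addresses nor disable caches as claim 9 requires. The module layout, unit size, round-robin assignment, the 0x55/0xAA test patterns, and all names (`SimulatedMemory`, `split_range`, `check_unit`, `run_whole_memory_test`) are assumptions for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


class SimulatedMemory:
    """Stand-in for the logical address space; `stuck` maps address -> bit stuck at 0."""

    def __init__(self, size, stuck=None):
        self._buf = bytearray(size)
        self._stuck = stuck or {}

    def write(self, addr, data):
        self._buf[addr:addr + len(data)] = data
        for a, bit in self._stuck.items():          # emulate the hardware fault
            if addr <= a < addr + len(data):
                self._buf[a] &= ~(1 << bit) & 0xFF

    def read(self, addr, length):
        return bytes(self._buf[addr:addr + length])


def split_range(size, unit):
    """Divide a module's physical range [0, size) into (offset, length) small units."""
    return [(off, min(unit, size - off)) for off in range(0, size, unit)]


def check_unit(mem, start, length, patterns=(0x55, 0xAA)):
    """Write each pattern over one test address range, read it back, collect mismatches."""
    bad = set()
    for p in patterns:
        mem.write(start, bytes([p]) * length)
        for i, b in enumerate(mem.read(start, length)):
            if b != p:
                bad.add(start + i)
    return sorted(bad)


def run_whole_memory_test(mem, modules, unit_size, workers=None):
    """`modules` is a list of (logical_base, size) pairs taken from the mapping information."""
    workers = workers or os.cpu_count() or 1          # one test thread per processing core
    units = [(base + off, length)                     # small unit -> test address range
             for base, size in modules
             for off, length in split_range(size, unit_size)]
    shares = [units[i::workers] for i in range(workers)]   # round-robin assignment

    def worker(share):
        return [addr for start, length in share for addr in check_unit(mem, start, length)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(addr for bad in pool.map(worker, shares) for addr in bad)


if __name__ == "__main__":
    mem = SimulatedMemory(0x4000, stuck={0x1234: 0})
    modules = [(0x0000, 0x2000), (0x2000, 0x2000)]   # two hypothetical modules
    failing = run_whole_memory_test(mem, modules, unit_size=0x400)
    print([hex(a) for a in failing])                 # ['0x1234']
```

The 0x55 pattern exposes the stuck-at-0 bit 0 at `0x1234`, while 0xAA (bit 0 clear) does not; alternating complementary patterns is a common way to make sure each bit is driven both high and low. The failing logical address would then be handed to the error reporting step for translation back to a module.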
TW106143817A 2017-12-13 2017-12-13 System for testing whole memory and method thereof TWI733964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW106143817A TWI733964B (en) 2017-12-13 2017-12-13 System for testing whole memory and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW106143817A TWI733964B (en) 2017-12-13 2017-12-13 System for testing whole memory and method thereof

Publications (2)

Publication Number Publication Date
TW201928981A true TW201928981A (en) 2019-07-16
TWI733964B TWI733964B (en) 2021-07-21

Family

ID=68048680

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106143817A TWI733964B (en) 2017-12-13 2017-12-13 System for testing whole memory and method thereof

Country Status (1)

Country Link
TW (1) TWI733964B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI760911B (en) * 2020-11-02 2022-04-11 英業達股份有限公司 Method for choosing and loading serial presence detecet

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5446876A (en) * 1994-04-15 1995-08-29 International Business Machines Corporation Hardware mechanism for instruction/data address tracing
JP4249285B2 (en) * 1998-03-25 2009-04-02 株式会社アドバンテスト Physical conversion definition editing device
US9645896B2 (en) * 2013-12-26 2017-05-09 Silicon Motion, Inc. Data storage device and flash memory control method
US9653184B2 (en) * 2014-06-16 2017-05-16 Sandisk Technologies Llc Non-volatile memory module with physical-to-physical address remapping
KR102625637B1 (en) * 2016-02-01 2024-01-17 에스케이하이닉스 주식회사 Data storage device and operating method thereof

Also Published As

Publication number Publication date
TWI733964B (en) 2021-07-21

Similar Documents

Publication Publication Date Title
US10614905B2 (en) System for testing memory and method thereof
US10204698B2 (en) Method to dynamically inject errors in a repairable memory on silicon and a method to validate built-in-self-repair logic
US8862953B2 (en) Memory testing with selective use of an error correction code decoder
US20090125788A1 (en) Hardware based memory scrubbing
US9984766B1 (en) Memory protection circuitry testing and memory scrubbing using memory built-in self-test
CN113366576A (en) Retention self-test for power loss operations on memory systems
WO2019184612A1 (en) Terminal and electronic device
US9009548B2 (en) Memory testing of three dimensional (3D) stacked memory
JP2005135407A (en) System and method for testing component of computer system by using voltage margining
TWI515445B (en) Cutter in diagnosis (cid)-a method to improve the throughput of the yield ramp up process
WO2021238276A1 (en) Electric leakage detection method and apparatus for cpld
TWI733964B (en) System for testing whole memory and method thereof
CN117170948A (en) Positioning method and device for main board memory fault, electronic equipment and storage medium
JP2005149501A (en) System and method for testing memory with expansion card using dma
Querbach et al. A reusable BIST with software assisted repair technology for improved memory and IO debug, validation and test time
JP2013037631A (en) Diagnosis device, diagnosis method and diagnostic program diagnosis method
JP2012008620A (en) Error correction test method
JP2005149503A (en) System and method for testing memory using dma
JP2019160116A (en) Information processing device, test control method, and test control program
US20240219462A1 (en) Techniques for debug, survivability, and infield testing of a system-on-a-chip or a system-on-a-package
US20230084463A1 (en) Runtime non-destructive memory built-in self-test (bist)
WO2016138814A1 (en) Method and device for testing synchronous dynamic random access memory (sdram)
KR102483739B1 (en) Dram-based post-silicon debugging method and apparatus reusing bira cam structure
JP4749812B2 (en) Test equipment
EP3913634A1 (en) Memory testing by reading and verifying again memory locations after read access