TW201115474A - Memory detection method under the non uniform memory access environment


Info

Publication number
TW201115474A
TW201115474A TW98136071A TW98136071A TW201115474A TW 201115474 A TW201115474 A TW 201115474A TW 98136071 A TW98136071 A TW 98136071A TW 98136071 A TW98136071 A TW 98136071A TW 201115474 A TW201115474 A TW 201115474A
Authority
TW
Taiwan
Prior art keywords
memory
execution
test
cpu
environment
Application number
TW98136071A
Other languages
Chinese (zh)
Inventor
Yan Li
Tom Chen
Original Assignee
Inventec Corp
Priority date
2009-10-23
Filing date
2009-10-23
Publication date
2011-05-01
Application filed by Inventec Corp
Priority to TW98136071A
Publication of TW201115474A

Landscapes

  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

A memory detection method for a non-uniform memory access (NUMA) environment, comprising the following steps: obtaining the number of nodes in the system in the NUMA environment; replicating the execution thread of a memory test program into multiple copies according to the number of nodes; binding each execution thread to a different CPU and running it; and using the execution threads to test, in parallel, the memory dedicated to each CPU. The method can significantly reduce test time while increasing test efficiency and test stress.

Description

VI. Description of the Invention:

[Technical Field of the Invention]

The present invention relates to a memory detection method, and in particular to a memory detection method in a non-uniform memory access (NUMA) environment.

[Prior Art]

Today, non-uniform memory access (hereinafter NUMA) technology allows many servers to operate as a single system while retaining the ease of programming and management of a small system. On a computer with many central processing units (hereinafter CPUs), NUMA hardware can pair dedicated memory with CPUs, which greatly improves performance.

The current hardware trend is toward providing multiple system buses, each of which serves a small group of processors. Each group of processors has its own memory and, possibly, its own input/output (I/O) channels. However, each CPU can also access the memory associated with the other groups, which are designed in the same way. Each such group is called a NUMA node. The number of CPUs in a NUMA node depends on the hardware vendor. Accessing local memory is faster than accessing memory associated with other NUMA nodes.

On NUMA hardware, some memory regions are physically on different buses from other regions. Because NUMA uses both local and external memory, accessing some memory regions takes longer than accessing others. "Local memory" and "external memory" are usually defined relative to the currently running execution thread. Local memory is memory on the same node as the CPU that is running the thread. Memory that does not belong to the node of the currently running thread is external memory. External memory is also called "remote memory."

The main advantage of NUMA is scalability. The NUMA architecture was designed to surpass the scalability limits of the symmetric multiprocessing (SMP) architecture. With SMP, all memory accesses are posted to the same shared memory bus. This works well with only a few CPUs, but not when dozens or even hundreds of CPUs compete for the shared memory bus. NUMA relieves these bottlenecks by limiting the number of CPUs on any one memory bus and connecting the nodes through a high-speed interconnect.

Conventional memory detection methods in a NUMA environment simply allocate physical memory through a driver and test it. They do not exploit the hardware characteristics inherent in a NUMA system to perform a more optimized memory test, so optimizing the existing test methods leaves room to raise both test stress and test efficiency.

Please refer to FIG. 1, a block diagram of the system on which a prior-art memory detection method in a NUMA environment runs. As shown, in a NUMA hardware architecture with multiple CPUs and their dedicated memories, the prior-art method uses only a single memory test execution thread to traverse and test all physical memory. This approach has several defects or limitations, in particular:

1. It cannot use the multi-CPU hardware environment to work in parallel as far as possible and thereby raise the test stress on the system.

2. The memory test execution thread is always bound to one fixed CPU, so when it tests remote memory (memory dedicated to other CPUs), access speed drops sharply, which seriously affects test efficiency and test stress.

[Summary of the Invention]

To solve the above problems and defects of the prior art, an object of the present invention is to provide a memory detection method in a NUMA environment that reduces test time and greatly increases test stress.

The memory detection method in a NUMA environment provided by the present invention comprises the following steps:

obtaining the number of nodes of the system in the NUMA environment;

replicating the execution thread of the memory test program into multiple copies according to the number of nodes;

binding each execution thread to a different CPU for execution; and

using the execution threads to test, in parallel, the memory dedicated to each CPU.

In this method, using the execution threads to test the memory dedicated to each CPU in parallel may further comprise the following steps:

after each execution thread starts running, allocating physical memory through a driver in its own node;

mapping the physical memory into the user process space; and

applying algorithms designed for various purposes to perform read/write verification on each memory; when all read/write verification results are consistent, the test passes; otherwise an error is reported and the test exits.

In addition, in the method of the present invention, each execution thread accesses and tests memory in the single direction for which the mutual access latency between CPUs, as given in the specification, is shorter; or, when two CPUs are idle, they test each other's memory in a single direction.

In summary, because the method replicates the execution thread of the memory test program into multiple copies and binds each copy to a different CPU so as to test the memory dedicated to each CPU in parallel, it has the following advantages over the prior art:

1. Exploiting the hardware characteristics of the NUMA architecture greatly increases the parallelism of the memory test, and the higher parallelism raises the test stress on the system.

2. Because no copy of the memory test execution thread accesses remote memory, test efficiency and test stress improve.

The invention is described in detail below with reference to the drawings and preferred embodiments.

[Embodiments]

Please refer to FIG. 2, a flowchart of the overall steps of the memory detection method in a NUMA environment according to an embodiment of the present invention. As shown, the method comprises the following steps:

obtaining the number of nodes of the system in the NUMA environment (step 101);

replicating the execution thread of the memory test program into multiple copies according to the number of nodes (step 102);

binding each execution thread to a different CPU for execution (step 103); and

using the execution threads to test, in parallel, the memory dedicated to each CPU (step 104).

As shown in FIG. 3, step 104 may further comprise the following steps:

after each execution thread starts running, allocating physical memory through a driver in its own node (step 1041);

mapping the physical memory into the user process space (step 1042);

applying algorithms designed for various purposes to perform read/write verification on each memory (step 1043);

determining whether the read/write verification results are all consistent (step 1044); and

when all read/write verification results are consistent, passing the test (step 1045); otherwise reporting an error and exiting the test (step 1046).

In addition, in the method of this embodiment, each execution thread accesses and tests memory in the single direction for which the mutual access latency between CPUs, as given in the specification, is shorter; or, when two CPUs are idle, they test each other's memory in a single direction.

Now refer to FIG. 4, a block diagram of the system on which the memory detection method in a NUMA environment according to an embodiment of the present invention runs. As shown, when the method is applied, a test begins by obtaining the number of nodes in the system (a CPU together with its local memory is called a node). The memory test execution thread 10 is then replicated according to the number of nodes, and each execution thread 10 is bound to its own CPU for execution. Once the memory test execution threads 10 are running, each first allocates physical memory through the driver in its own node and maps it into the user process space, and then applies algorithms designed for various purposes to perform read/write verification on the memory. When all read/write verification results are consistent, the test passes. If an inconsistent read/write verification occurs during the test, the system memory has a corresponding quality defect, and the problem is thereby detected. The method described above therefore resolves the problems of the prior-art memory detection method in a NUMA environment.

Although the present invention is disclosed above by way of the foregoing preferred embodiments, they are not intended to limit the invention. Changes and refinements made by those skilled in the art without departing from the spirit and scope of the invention as disclosed in the appended claims fall within the scope of patent protection of the invention. For the scope of protection defined by the invention, refer to the appended claims.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the system on which a prior-art memory detection method in a NUMA environment runs;

FIG. 2 is a flowchart of the overall steps of the memory detection method in a NUMA environment according to an embodiment of the present invention;

FIG. 3 is a flowchart of the sub-steps of step 104 in FIG. 2; and

FIG. 4 is a block diagram of the system on which the memory detection method in a NUMA environment according to an embodiment of the present invention runs.

[Description of Main Reference Numerals]

10 Memory test execution thread
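The read/write verification of steps 1043 to 1046 (write a pattern, read it back, compare, then pass or report an error) can be sketched as follows. The patent does not name the algorithms it applies, so the fill patterns and the address-in-address test below are assumptions chosen for illustration, and the names `verify_region`, `first_mismatch`, and `StuckBitBuffer` are hypothetical. An ordinary user-space buffer stands in for the driver-allocated physical memory that the method maps into the process.

```python
from typing import Optional, Tuple

# Hypothetical pattern set: the patent only says "algorithms for various purposes".
FILL_PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def first_mismatch(buf, expected: bytes) -> Optional[int]:
    """Return the offset of the first byte that differs from expected, or None."""
    for offset, (got, want) in enumerate(zip(buf, expected)):
        if got != want:
            return offset
    return None


def verify_region(buf) -> Tuple[bool, Optional[Tuple[str, int]]]:
    """Run every pattern test; return (True, None) or (False, (test_name, offset))."""
    size = len(buf)
    tests = [(f"fill 0x{p:02X}", bytes([p]) * size) for p in FILL_PATTERNS]
    # Address-in-address test: each byte holds the low 8 bits of its own offset.
    tests.append(("address", bytes(i & 0xFF for i in range(size))))
    for name, expected in tests:
        buf[:] = expected                    # write the pattern
        bad = first_mismatch(buf, expected)  # read back and compare
        if bad is not None:
            return False, (name, bad)        # inconsistent: report and stop
    return True, None                        # all results consistent: pass


class StuckBitBuffer(bytearray):
    """Test double: after every write, bit 0 of the byte at offset 5 is forced to 1."""

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        bytearray.__setitem__(self, 5, self[5] | 0x01)
```

With a healthy buffer, `verify_region(bytearray(64))` reports a pass; with the simulated defect, `verify_region(StuckBitBuffer(64))` stops at the first failing test and names the offset, which mirrors the "report an error and exit" branch.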

Claims (1)

VII. Claims:

1. A memory detection method in a non-uniform memory access (NUMA) environment, comprising the following steps:

obtaining a node count of a system in the non-uniform memory access environment;

replicating an execution thread of a memory test program into multiple copies according to the node count;

binding each of the execution threads to a different CPU for execution; and

using the execution threads to test, in parallel, a memory dedicated to each of the CPUs.

2. The memory detection method of claim 1, wherein using the execution threads to test, in parallel, the memory dedicated to each of the CPUs further comprises the following steps:

after each of the execution threads starts running, allocating a physical memory through a driver in its own node;

mapping the physical memory into the user process space; and

applying algorithms designed for various purposes to perform read/write verification on each of the memories; when all of the read/write verification results are consistent, the test passes; otherwise an error is reported and the test exits.

3. The memory detection method of claim 2, wherein each of the execution threads accesses and tests the memory in the single direction for which the mutual access latency between the CPUs, as given in the specification, is shorter.

4. The memory detection method of claim 2, wherein, when two of the CPUs are idle, the execution threads test the memory between those two CPUs in a single direction.
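Claims 3 and 4 leave open how the lower-latency direction between CPUs is determined. One possible reading, sketched here purely as an assumption, is to consult a node distance matrix and have each node pick the other node with the smallest reported distance. The matrix values and the name `nearest_remote_nodes` are hypothetical; on Linux, comparable relative distances can be read from `/sys/devices/system/node/nodeN/distance`.

```python
def nearest_remote_nodes(distance):
    """For each node, pick the other node with the smallest reported distance.

    distance[src][dst] is a relative access cost from node src to node dst.
    Ties are resolved in favour of the lowest node number.
    """
    plan = {}
    for src, row in enumerate(distance):
        others = [dst for dst in range(len(row)) if dst != src]
        if others:  # a single-node system has no remote partner
            plan[src] = min(others, key=lambda dst: row[dst])
    return plan


# Hypothetical three-node system: node 1 sits between nodes 0 and 2.
EXAMPLE_DISTANCE = [
    [10, 16, 22],
    [16, 10, 16],
    [22, 16, 10],
]
```

For the example matrix, nodes 0 and 2 both select node 1 as their partner, while node 1 selects node 0 because of the tie-breaking rule.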
TW98136071A 2009-10-23 2009-10-23 Memory detection method under the non uniform memory access environment TW201115474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW98136071A TW201115474A (en) 2009-10-23 2009-10-23 Memory detection method under the non uniform memory access environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW98136071A TW201115474A (en) 2009-10-23 2009-10-23 Memory detection method under the non uniform memory access environment

Publications (1)

Publication Number Publication Date
TW201115474A true TW201115474A (en) 2011-05-01

Family

ID=44934478

Family Applications (1)

Application Number Title Priority Date Filing Date
TW98136071A TW201115474A (en) 2009-10-23 2009-10-23 Memory detection method under the non uniform memory access environment

Country Status (1)

Country Link
TW (1) TW201115474A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104681101A (en) * 2013-11-28 2015-06-03 英业达科技有限公司 Storage detection system based on non-uniform storage access framework and method thereof
US20160154720A1 (en) * 2014-11-28 2016-06-02 Inventec (Pudong) Technology Corporation Pressure testing method and pressure testing device for a quick path interconnect bus
