WO2023240719A1 - Memory testing method and device, storage medium and electronic device

Publication number
WO2023240719A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
processor
local memory
test
execution thread
Application number
PCT/CN2022/104440
Inventor
连军委
黄涛
Original Assignee
长鑫存储技术有限公司
Application filed by 长鑫存储技术有限公司
Publication of WO2023240719A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2221 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test input/output devices or peripheral units
    • G06F11/2273 Test methods
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell constructional details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing

Definitions

  • the present disclosure relates to the field of integrated circuit technology, and specifically to a memory testing method, a memory testing device, a computer-readable storage medium and an electronic device.
  • NUMA Non-Uniform Memory Access
  • A NUMA system is a memory design for multiprocessors. Because each processor has a large storage capacity, it takes a long time for the processor to test the memory. Therefore, reducing the memory test time of processors in NUMA systems is an urgent problem that needs to be solved.
  • a memory testing method includes: determining the local memory of each processor; evenly allocating the local memory to each execution thread of the processor; and using each execution thread to test the allocated local memory in parallel.
  • determining the local memory of each processor includes: determining the physical address demarcation point of each processor; and determining the local memory of each processor according to the physical address demarcation point.
  • determining the physical address demarcation point of each processor includes: using an address decoder to decode an address signal to obtain the physical address demarcation point corresponding to the memory physical address.
  • evenly allocating the local memory to each execution thread of the processor includes: determining the total storage amount of the local memory corresponding to the processor; dividing the total storage amount by the number of execution threads of the processor to determine the average storage amount of local memory allocated to each execution thread; and determining, based on the average storage amount, the local memory allocated to each execution thread.
  • the number of execution threads is equal to the number of cores of the processor.
  • using each of the execution threads to test the allocated local memory in parallel includes: using each of the execution threads to perform read/write verification on the allocated local memory; when all read/write verification results are consistent, the test passes; otherwise, an error is reported and, when the test completes, a test-failure prompt is given.
  • the execution threads are used to test the memory of a non-uniform memory access (NUMA) system.
  • a memory testing device includes: a local memory determination module configured to determine the local memory of each processor; a memory allocation module configured to evenly allocate the local memory to each execution thread of the processor; and a testing module configured to use each execution thread to test the allocated local memory in parallel.
  • the local memory determination module is used to determine the physical address demarcation point of each processor, and to determine the local memory of each processor according to the physical address demarcation point.
  • the local memory determination module is configured to use an address decoder to decode the address signal to obtain the physical address demarcation point corresponding to the memory physical address.
  • the memory allocation module is configured to determine the total storage amount of the local memory corresponding to the processor; divide the total storage amount by the number of execution threads of the processor to determine the average storage amount of local memory allocated to each execution thread; and determine, based on the average storage amount, the local memory allocated to each execution thread.
  • the number of execution threads is equal to the number of cores of the processor.
  • the test module is configured to use each execution thread to perform read/write verification on the allocated local memory; when all read/write verification results are consistent, the test passes; otherwise, an error is reported and, when the test completes, a test-failure prompt is given.
  • the execution threads are used to test the memory of a non-uniform memory access (NUMA) system.
  • a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above memory testing method when executed by a processor.
  • an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the above memory testing method by executing the executable instructions.
  • Figure 1 schematically shows a structural comparison diagram of uniform memory access and non-uniform memory access according to an exemplary embodiment of the present disclosure
  • Figure 2 schematically shows a node diagram of non-uniform memory access according to an exemplary embodiment of the present disclosure
  • Figure 3 schematically illustrates a speed comparison diagram of accessing different memories in a non-uniform memory access according to an exemplary embodiment of the present disclosure
  • Figure 4 schematically shows a step flow chart of a memory testing method according to an exemplary embodiment of the present disclosure
  • Figure 5 schematically shows a flow chart of steps for evenly allocating local memory in a memory testing method according to an exemplary embodiment of the present disclosure
  • FIG. 6 schematically illustrates a block diagram of a memory testing device according to an exemplary embodiment of the present disclosure
  • FIG. 7 schematically shows a module diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • Example embodiments will now be described more fully with reference to the accompanying drawings.
  • Example embodiments may, however, be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments.
  • the described features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
  • numerous specific details are provided to provide a thorough understanding of embodiments of the disclosure.
  • those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced without one or more of the specific details described, or other methods, components, devices, steps, etc. may be adopted.
  • well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the disclosure.
  • UMA Uniform Memory Access
  • Compared with UMA, NUMA is characterized in that each processor has its own local memory, as shown in the NUMA portion of Figure 1, and each processor can also access the local memory of other processors (which is remote memory from its point of view).
  • processors and corresponding local memory can also be divided into different groups, and a group of processors can access their own local memory together (equivalent to a memory group).
  • processors and corresponding memory banks form a NUMA node, as shown in Figure 2.
  • SMP Symmetric Multiprocessing
  • With NUMA, a processor can access remote memory when its local memory is insufficient. A processor accesses local memory faster than remote memory. As shown in Figure 3, processor CPU0 accesses its local memory (memory 0) faster than it accesses the local memory of processor CPU1 (memory 1), of processor CPU2 (memory 2), or of processor CPU3 (memory 3); memory 1, memory 2 and memory 3 are remote memories for processor CPU0. Because access to remote memory is slow, a delay occurs whenever the processor accesses remote memory, and access efficiency is significantly reduced.
  • a memory test thread is used to traverse and test all memories.
  • the memory test thread is always bound to one processor.
  • when that thread tests memory that is remote to its processor, the access speed is greatly reduced, seriously affecting test efficiency.
  • exemplary embodiments of the present disclosure provide a NUMA-based memory testing method for testing the memory of a non-uniform memory access (NUMA) system.
  • referring to FIG. 4, a flow chart of the steps of a memory testing method according to an embodiment of the present disclosure is shown.
  • the above memory testing method may include:
  • Step S410: Determine the local memory of each processor;
  • Step S420: Evenly allocate local memory to each execution thread of the processor;
  • Step S430: Use each execution thread to test the allocated local memory in parallel.
  • on the one hand, the memory testing method provided by the disclosed embodiments determines the local memory of each processor and uses each execution thread of the processor to test the allocated local memory, which prevents the processor from accessing remote memory during the test, thereby improving memory test efficiency; on the other hand, by evenly allocating local memory to each execution thread of the processor and using each execution thread to test the evenly allocated local memory in parallel, the test time of each execution thread can be made the same, which shortens the total test time and further improves the efficiency of memory testing.
  • in step S410, the local memory of each processor is determined.
  • the non-uniform memory access NUMA architecture refers to a multi-processor system where the memory access time depends on the relative position between the processor and the memory. In this architecture, there is memory relatively close to the processor, often called local memory; and there is memory relatively far from the processor, often called remote memory.
  • the so-called local memory is the memory that the CPU can access through the iMC (Integrated Memory Controller) in the Uncore (non-computing core) component.
  • Those non-local, remote memories need to be accessed through the link of the QPI (QuickPath Interconnect) controller to the iMC of the local CPU where the memory is located.
  • QPI QuickPath Interconnect
  • in the process of determining the local memory of each processor, the physical address demarcation point of each processor is determined first.
  • a physical address demarcation point is a pair of adjacent physical addresses in the memory physical address space that belong to different processors.
  • the memory physical address can be divided into memory physical address blocks through the physical address demarcation point, and different memory physical address blocks belong to different processors.
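  • As a minimal sketch of this idea (illustrative only; the record layout and the function name `find_demarcation_points` are assumptions, not part of the disclosure), demarcation points can be located by scanning decoded addresses in ascending order and noting where the owning processor changes:

```python
from typing import List, Tuple

# Each record is (physical_address, owning_processor); this layout is hypothetical.
Record = Tuple[int, str]

def find_demarcation_points(records: List[Record]) -> List[Tuple[int, int]]:
    """Return (last address of one processor, first address of the next) pairs."""
    ordered = sorted(records)  # ascending by physical address
    points = []
    for (addr_a, cpu_a), (addr_b, cpu_b) in zip(ordered, ordered[1:]):
        if cpu_a != cpu_b:  # ownership changes between two neighbouring records
            points.append((addr_a, addr_b))
    return points

# Sample records built around the example addresses used in this description.
sample = [
    (0x0000008FFFFFFFFE, "CPU00"),
    (0x0000008FFFFFFFFF, "CPU00"),
    (0x0000009000000000, "CPU01"),
    (0x0000009000000001, "CPU01"),
]
points = find_demarcation_points(sample)
```

  • The sketch only compares neighbouring records, so it reports a boundary wherever the owner changes between two captured addresses; how densely addresses are captured is left open.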
  • the physical address of the memory is determined by its location on the address bus. After the machine is assembled, the physical address is fixed and unchanging, and it is not allocated by the CPU.
  • the physical address refers to the address loaded into the memory address register and is the real address of the memory unit.
  • the memory addresses transmitted on the front-side bus are memory physical addresses, numbered starting from 0 and going up to the highest end of available physical memory. These numbers are mapped onto the actual memory sticks by the Northbridge chip.
  • an address decoder needs to be used to decode the address signal to obtain the physical address demarcation point corresponding to the memory physical address.
  • the address signal can be captured by a logic analyzer, which is an instrument that analyzes the logical relationships of digital systems.
  • a logic analyzer is a bus analyzer among the data domain test instruments. It is an instrument that is based on the concept of bus (multi-line) and simultaneously observes and tests the data flow on multiple data lines.
  • the address signal captured by the logic analyzer can be decoded by the address decoder, and then the data is parsed out to obtain the memory physical address, as shown in Table 1.
  • SK is Socket
  • one Socket corresponds to one physical CPU.
  • the physical memory address 0x0000008FFFFFFFFF can be determined as a physical address demarcation point.
  • the memory corresponding to the physical address demarcation point and the address before it belongs to the local memory of the processor CPU00, and the memory corresponding to the address after the physical address demarcation point belongs to the local memory of the processor CPU01.
  • the physical memory address 0x0000009000000000 can be determined as another physical address demarcation point.
  • the memory corresponding to the address before the physical address demarcation point belongs to the local memory of the processor CPU00.
  • the physical memory addresses 0x0000008FFFFFFFFF and 0x0000009000000000 belong to adjacent physical addresses.
  • the physical address corresponding to each processor can be determined based on the physical address demarcation point, so that the corresponding local memory belonging to each processor can be determined.
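  • Once the demarcation points are known, looking up the owner of an address reduces to a range search. The following is a hedged sketch: the names `block_starts`, `block_owners` and `owner_of` are hypothetical, and the table simply mirrors the example addresses given for CPU00 and CPU01 in this description.

```python
from bisect import bisect_right

# First address of each memory block (ascending) and the processor that owns it.
# The two entries mirror the example address ranges quoted in this description.
block_starts = [0x0000007000000000, 0x0000009000000000]
block_owners = ["CPU00", "CPU01"]

def owner_of(address: int) -> str:
    """Return the processor whose local memory block contains the address."""
    index = bisect_right(block_starts, address) - 1
    if index < 0:
        raise ValueError("address lies below the first known block")
    return block_owners[index]
```

  • With this table, `owner_of(0x0000008FFFFFFFFF)` resolves to CPU00 and `owner_of(0x0000009000000000)` resolves to CPU01, matching the pair of adjacent addresses discussed above. The sketch does not check an upper bound for the last block.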
  • the physical address demarcation point is a kind of memory address segmentation point. It is equivalent to dividing the memory into different memory blocks and then determining which processor each memory block belongs to. Therefore, it is independent of the memory nodes, and a memory block may even span memory nodes.
  • as shown in Table 1, the above physical memory addresses all belong to the same memory module DIMM00, but the addresses are divided into the local memories of different processors. DIMM stands for Dual In-line Memory Module.
  • the memory block belongs to a continuous memory address block relative to the processor, and the memory node is a group of memory bars on the computer memory topology.
  • when the computer performs memory mapping, it does not distinguish between memory nodes but addresses purely by memory address, so the addressing process will inevitably cross memory nodes.
  • the memory testing method provided by the exemplary embodiment of the present disclosure divides the memory address into different memory blocks by determining the physical address demarcation point, so that there is no need to pay attention to additional operations of the memory node and will not be restricted by the memory node.
  • in addition to capturing the address signal with a logic analyzer, the signal can also be recorded and captured with instruments such as a DDR (Double Data Rate, double-rate synchronous dynamic random access memory) memory protocol analyzer.
  • DDR Double Data Rate, double-rate synchronous dynamic random access memory
  • the exemplary embodiments of the present disclosure do not specifically limit the capture device.
  • in step S420, local memory is evenly allocated to each execution thread of the processor.
  • the task of performing memory testing is called an execution thread, and each execution thread is completed by a core in the processor.
  • a processor usually contains multiple cores, for example, 36 cores, and each core is used to perform the task of memory testing.
  • the core is the core of the processor.
  • a processor can have multiple cores (that is, a multi-core processor), and a core can only belong to one processor.
  • the CPU core is the core chip in the middle of the CPU. It is made of monocrystalline silicon and is used to complete all calculations, accept/store commands, process data, etc. It is the core of digital processing.
  • the core (Die), also known as the kernel, is the most important component of the CPU.
  • the bulging chip in the center of the CPU is the core, which is made of monocrystalline silicon using a certain production process. All calculations, acceptance/storage commands, and data processing of the CPU are performed by the core.
  • the number of execution threads provided by the embodiments of the present disclosure is equal to the number of cores of the processor.
  • One processor core corresponds to one execution thread. Each execution thread is started according to the processor core and executed by the core.
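  • A rough sketch of the one-thread-per-core arrangement follows. It is an assumption-laden illustration rather than the disclosed implementation: the helper names are hypothetical, pinning relies on `os.sched_setaffinity`, which Python exposes only on some platforms (notably Linux), and the sketch falls back to unpinned threads elsewhere.

```python
import os
import threading

def usable_cores():
    """Cores this process may run on (falls back to a plain count if unsupported)."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))

def run_one_thread_per_core(task):
    """Start one thread per core, pin it where supported, and collect the results."""
    results = {}

    def worker(core):
        if hasattr(os, "sched_setaffinity"):
            try:
                os.sched_setaffinity(0, {core})  # pid 0 refers to the calling thread
            except OSError:
                pass  # pinning is best effort in this sketch
        results[core] = task(core)

    threads = [threading.Thread(target=worker, args=(c,)) for c in usable_cores()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = run_one_thread_per_core(lambda core: f"tested by core {core}")
```

  • A production memory tester would more likely be written in a systems language and run before or below the operating system; the Python version only illustrates the thread-to-core correspondence.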
  • processor CPU00 corresponds to multiple segments of local memory
  • processor CPU01 corresponds to multiple segments of local memory. Therefore, after determining the local memory of each processor, local memory can be allocated to the core of the processor as needed, that is, local memory is allocated to each execution thread.
  • local memory can be evenly allocated to each execution thread of the processor, so that the size of the local memory tested by each execution thread is the same, thereby ensuring that each execution thread takes the same time to perform the memory test; that is, the local memory size to be tested is the same for each core of the processor.
  • This process is equivalent to spreading the memory test time evenly across the cores of each processor, thereby shortening the total memory test time and improving the efficiency of memory testing.
  • each processor tests the local memory rather than the remote memory, thus avoiding the delay problem when testing remote memory, further improving the efficiency of memory testing.
  • the step of evenly allocating local memory to each execution thread of the processor includes:
  • Step S510: Determine the total storage amount of local memory corresponding to the processor;
  • Step S520: Divide the total storage amount by the number of execution threads of the processor to determine the average storage amount of local memory allocated to each execution thread;
  • Step S530: Determine the local memory allocated to each execution thread based on the average storage amount.
  • local memory is evenly allocated to each execution thread of the processor. In the process of evenly allocating local memory, the total storage amount of local memory corresponding to the processor is determined and combined with the number of execution threads of the processor to determine the average storage amount allocated to each execution thread; local memory is then allocated to each execution thread based on the average storage amount, thereby evenly allocating the local memory.
  • in step S510, the total storage amount of local memory corresponding to the processor is determined.
  • to determine the total storage amount, the storage amount of each local memory segment corresponding to the processor can be determined first (the storage amount of each segment may be different), and then the storage amounts of all local memory segments corresponding to the processor are added to obtain the total storage amount of local memory corresponding to the processor.
  • the local memory corresponding to CPU00 includes the memory corresponding to physical memory addresses 0x0000007000000000 to 0x0000008FFFFFFFFF; and the local memory corresponding to CPU01 includes the memory corresponding to physical memory addresses 0x0000009000000000 to 0x00000b0000000000.
  • for CPU00, each segment of its local memory may have a different storage amount. The storage amounts of all segments can be added to obtain the total storage amount of local memory corresponding to CPU00, for example 18GB.
  • for CPU01, each segment of its local memory may likewise have a different storage amount. The storage amounts of all segments can be added to obtain the total storage amount of local memory corresponding to CPU01, for example 10GB.
  • the total storage amount of local memory corresponding to the processor can be obtained.
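  • Step S510 amounts to summing segment sizes. In the sketch below the three segments are hypothetical values chosen only so that the total matches the 18GB figure used as an example for CPU00; the function name `total_storage` is likewise an assumption.

```python
GIB = 1 << 30  # bytes in one GiB

# Hypothetical local-memory segments of one processor, as inclusive (start, end) addresses.
segments = [
    (0x0000000000, 0x00FFFFFFFF),  # 4 GiB
    (0x0200000000, 0x037FFFFFFF),  # 6 GiB
    (0x0400000000, 0x05FFFFFFFF),  # 8 GiB
]

def total_storage(segs):
    """Sum the sizes of inclusive (start, end) address ranges, in bytes."""
    return sum(end - start + 1 for start, end in segs)

total_bytes = total_storage(segments)
```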
  • in step S520, the total storage amount is divided by the number of execution threads of the processor to determine the average storage amount of local memory allocated to each execution thread.
  • local memory can be allocated to each execution thread based on the total storage amount. As mentioned before, evenly allocating local memory to each execution thread of the processor can shorten the time of memory testing and improve the efficiency of memory testing.
  • the local memory may be evenly allocated according to the number of execution threads, so that the size of the local memory allocated to each execution thread is the same. That is, the local memory can be evenly distributed by dividing the total storage amount of the local memory corresponding to the processor by the number of execution threads of the processor. It is equivalent to determining the average storage amount of local memory that can be allocated to each execution thread, and specific local memory can be allocated based on the average storage amount.
  • the number of execution threads of the processor is the number of cores of the processor. Dividing the total storage amount of local memory by the number of execution threads is therefore equivalent to dividing it by the number of processor cores, which determines the average storage amount per core so that local memory is allocated equally. Typically, different processors may have different numbers of cores.
  • the following continues to take the above-mentioned processors CPU00 and CPU01 as an example.
  • the above average storage amount is for each execution thread, that is, the size of the local memory that needs to be tested for each core of the processor, and does not refer to a specific local memory.
  • in step S530, the local memory allocated to each execution thread is determined based on the average storage amount.
  • the local memory after determining the average storage amount of local memory allocated to each execution thread in each processor, can be evenly allocated to each execution thread based on the average storage amount.
  • for processor CPU00, each execution thread can be allocated 0.5GB of local memory; for processor CPU01, each execution thread can be allocated 0.556GB of local memory.
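  • The per-thread figures follow from dividing each total by the thread count. In the sketch below, the 36 threads for CPU00 reuse the 36-core example mentioned earlier in this description, while the 18 threads for CPU01 are an assumption inferred from the 10GB total and the 0.556GB result; the text available here does not state that count.

```python
def average_storage_gb(total_gb: float, execution_threads: int) -> float:
    """Step S520: average local memory per execution thread, in GB."""
    if execution_threads <= 0:
        raise ValueError("a processor needs at least one execution thread")
    return total_gb / execution_threads

cpu00_avg = average_storage_gb(18, 36)  # 36 threads: example core count from the text
cpu01_avg = average_storage_gb(10, 18)  # 18 threads: assumed, not stated in the text
```

  • Rounded to three decimal places, the second value is 0.556, which agrees with the figure quoted for CPU01.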
  • the size of each local memory segment may differ, so segments of different sizes can be combined for each execution thread until the average storage amount is reached.
  • an execution thread of processor CPU00 can be allocated two 0.2GB local memories corresponding to the processor and one 0.1GB local memory corresponding to the processor.
  • an execution thread of processor CPU01 can be allocated two local memories of 0.25GB corresponding to the processor, and then allocate a smaller local memory of 0.056GB.
  • CPU00 and CPU01 are only illustrative, and the exemplary embodiments of the present disclosure do not place special limitations on the number of execution threads of the processor and the corresponding storage amount of the local memory.
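  • One possible way to carry out step S530 is to walk through the segments in order and cut off a share each time the average amount has been collected, so that a share may consist of several pieces. This is a sketch under stated assumptions: `split_evenly` is a hypothetical helper, ranges are inclusive, and the total is assumed to divide evenly by the thread count.

```python
def split_evenly(segments, thread_count):
    """Split inclusive (start, end) segments into thread_count equal-sized shares.

    Each share is a list of (start, end) pieces, so one share may span segments.
    """
    total = sum(end - start + 1 for start, end in segments)
    if total % thread_count:
        raise ValueError("total size must divide evenly in this sketch")
    share = total // thread_count
    shares, current, room = [], [], share
    for start, end in segments:
        while start <= end:
            take = min(room, end - start + 1)  # as much as fits in the current share
            current.append((start, start + take - 1))
            start += take
            room -= take
            if room == 0:  # share is full: close it and open the next one
                shares.append(current)
                current, room = [], share
    return shares

# Two 6-unit segments split among three threads: the middle share spans both segments.
shares = split_evenly([(0, 5), (100, 105)], 3)
```

  • Here the second thread receives one piece from the end of the first segment and one from the start of the second, which parallels the example of a thread being given several smaller local memories that add up to the average.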
  • in step S430, each execution thread is used to test the allocated local memory in parallel.
  • each execution thread can be used to test the allocated local memory. Specifically, during the test process, each execution thread is tested in parallel, which can shorten the overall test time and improve the test efficiency.
  • each execution thread can be used to perform read/write verification on the allocated local memory. That is to say, the execution thread writes test data into the corresponding local memory and then, after a preset time, reads the written test data back out.
  • the specific preset time can be determined according to actual conditions, and the exemplary embodiments of the present disclosure do not specifically limit this.
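  • The write, wait, read-back sequence can be sketched as follows. A `bytearray` stands in for physical memory, the end index is exclusive, and the pattern value, hold time and function name are all assumptions made for illustration.

```python
import time

def verify_block(memory: bytearray, start: int, end: int,
                 pattern: int = 0xA5, hold_seconds: float = 0.0) -> bool:
    """Write a pattern to memory[start:end], wait, read it back, and compare."""
    expected = bytes([pattern]) * (end - start)
    memory[start:end] = expected                  # write phase
    time.sleep(hold_seconds)                      # preset time before reading back
    return bytes(memory[start:end]) == expected   # read-and-compare phase

simulated_memory = bytearray(1024)
ok = verify_block(simulated_memory, 0, 512)
```

  • Only the first half of the simulated memory is touched in this usage, so bytes outside the tested block keep their original value.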
  • by having all execution threads share one set of test code, the memory testing method can reduce the memory space occupied by the test code, thereby increasing the size of testable memory.
  • the memory testing method provided by exemplary embodiments of the present disclosure is equivalent to dividing the local memory of the processor into different memory blocks for each execution thread. After local memory is allocated to each execution thread based on the average storage amount, each execution thread only needs to pass the start and end addresses of its memory block to the test program; once the test program obtains the start and end addresses, it starts testing the corresponding memory.
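  • The shared-routine arrangement can be sketched with a thread pool in which every worker runs the same function and differs only in the start and end addresses it receives. The names `check_block` and `run_parallel_test` are hypothetical, a `bytearray` stands in for physical memory, and end indices are exclusive.

```python
from concurrent.futures import ThreadPoolExecutor

def check_block(memory: bytearray, start: int, end: int) -> bool:
    """Shared test routine: write a pattern to memory[start:end] and read it back."""
    expected = bytes([0x5A]) * (end - start)
    memory[start:end] = expected
    return bytes(memory[start:end]) == expected

def run_parallel_test(memory: bytearray, blocks) -> bool:
    """Run check_block on every (start, end) block concurrently; pass only if all pass."""
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        outcomes = list(pool.map(lambda b: check_block(memory, *b), blocks))
    return all(outcomes)

simulated_memory = bytearray(4096)
blocks = [(i * 1024, (i + 1) * 1024) for i in range(4)]  # four equal, disjoint blocks
passed = run_parallel_test(simulated_memory, blocks)
```

  • Because of CPython's global interpreter lock, this sketch shows the structure of the approach (one routine, per-thread address ranges, a single pass/fail result) rather than real parallel speed-up.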
  • each execution thread of the above processor is mainly used to test the memory of a non-uniform memory access (NUMA) system.
  • the memory testing method divides the local memory of each processor by determining the physical address demarcation point, and divides the memory address into different memory blocks without distinguishing the memory nodes, so there is no need to pay attention to additional operations on the memory nodes and the method is not restricted by the memory nodes.
  • on the one hand, by determining the local memory of each processor and using each execution thread of the processor to test the allocated local memory, the processor can be prevented from accessing remote memory during the test, thereby improving the efficiency of memory testing; on the other hand, by evenly allocating local memory to each execution thread of the processor and using each execution thread to test the evenly allocated local memory in parallel, the memory test time is spread evenly so that the test time of each execution thread is the same, which shortens the total test time and further improves the efficiency of memory testing.
  • the memory testing device 600 may include: a local memory determination module 610, a memory allocation module 620 and a testing module 630, wherein:
  • the local memory determination module 610 can be used to determine the local memory of each processor
  • the memory allocation module 620 can be used to evenly allocate local memory to each execution thread of the processor
  • the testing module 630 may be used to test the allocated local memory in parallel using each execution thread.
  • the local memory determination module 610 may be used to determine the physical address demarcation point of each processor; determine the local memory of each processor based on the physical address demarcation point.
  • the local memory determination module 610 may be configured to use an address decoder to decode the address signal to obtain the physical address demarcation point corresponding to the memory physical address.
  • the memory allocation module 620 can be used to determine the total storage amount of the local memory corresponding to the processor; divide the total storage amount by the number of execution threads of the processor to determine the average storage amount of local memory allocated to each execution thread; and determine, based on the average storage amount, the local memory allocated to each execution thread.
  • the number of execution threads is equal to the number of cores of the processor.
  • the test module 630 can be used to use each execution thread to perform read/write verification on the allocated local memory; when all read/write verification results are consistent, the test passes; otherwise, an error is reported and, when the test completes, a test-failure prompt is given.
  • the execution threads are used to test the memory of a non-uniform memory access (NUMA) system.
  • although several modules or units of the memory test device are mentioned in the detailed description above, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of one module or unit described above may be further divided into being embodied by multiple modules or units.
  • an electronic device capable of implementing the above method is also provided.
  • an electronic device 700 according to this embodiment of the invention is described below with reference to FIG. 7.
  • the electronic device 700 shown in FIG. 7 is only an example and should not impose any limitations on the functions and usage scope of the embodiments of the present invention.
  • electronic device 700 is embodied in the form of a general computing device.
  • the components of the electronic device 700 may include, but are not limited to: the above-mentioned at least one processing unit 710, the above-mentioned at least one storage unit 720, a bus 730 connecting different system components (including the storage unit 720 and the processing unit 710), and the display unit 740.
  • the storage unit 720 stores program code, and the program code can be executed by the processing unit 710, so that the processing unit 710 performs the steps of the various exemplary implementations according to the present invention described in the "Exemplary Method" section of this specification.
  • For example, the processing unit 710 can perform the steps shown in Figure 4: step S410, determine the local memory of each processor; step S420, evenly allocate local memory to each execution thread of the processor; step S430, use each execution thread to test the allocated local memory in parallel.
  • the storage unit 720 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 7201 and/or a cache storage unit 7202, and may further include a read-only storage unit (ROM) 7203.
  • RAM random access storage unit
  • ROM read-only storage unit
  • The storage unit 720 may also include a program/utility 7204 having a set of (at least one) program modules 7205. Such program modules 7205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • The bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
  • The electronic device 700 may also communicate with one or more external devices 770 (e.g., a keyboard, a pointing device, a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (e.g., a router, a modem) that enables the electronic device 700 to communicate with one or more other computing devices. This communication may occur through the input/output (I/O) interface 750.
  • The electronic device 700 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 760. As shown, the network adapter 760 communicates with other modules of the electronic device 700 via the bus 730.
  • The example embodiments described here can be implemented in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the present disclosure can therefore be embodied as a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiments of the present disclosure.
  • A computer-readable storage medium is also provided, on which a program product capable of implementing the method described above in this specification is stored.
  • Various aspects of the present invention can also be implemented in the form of a program product that includes program code.
  • When the program product is run on a terminal device, the program code causes the terminal device to perform the steps according to various exemplary embodiments of the present invention described in the "Exemplary Method" section above in this specification.
  • The program product for implementing the above method according to an embodiment of the present invention can take the form of a portable compact disc read-only memory (CD-ROM) that includes program code, and can be run on a terminal device such as a personal computer.
  • The program product of the present invention is not limited thereto.
  • A readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, apparatus, or device.
  • The program product may take the form of any combination of one or more readable media.
  • The readable medium may be a readable signal medium or a readable storage medium.
  • The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical cable, RF, or any suitable combination of the foregoing.
  • Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar languages.
  • The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • The remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

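A memory test method, a memory test apparatus, a computer-readable storage medium, and an electronic device, relating to the technical field of integrated circuits. The memory test method includes: determining the local memory of each processor; evenly allocating the local memory among the execution threads of the processor; and testing the allocated local memory in parallel using the execution threads. A method is provided for reducing the time processors take to test memory in a NUMA system.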

Description

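Memory test method and apparatus, storage medium, and electronic device
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202210681561.7, filed on June 15, 2022 and entitled "Memory test method and apparatus, storage medium, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of integrated circuits, and in particular to a memory test method, a memory test apparatus, a computer-readable storage medium, and an electronic device.
Background
Non-Uniform Memory Access (NUMA) technology allows many servers to operate as a single system while retaining the ease of programming and management of a small system.
Because a NUMA system is a memory design for multiple processors, and each processor has a large storage capacity, processors take a long time to test the memory. Reducing the time processors take to test memory in a NUMA system is therefore a problem in urgent need of a solution.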
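Summary
According to a first aspect of the present disclosure, a memory test method is provided, the method including: determining the local memory of each processor; evenly allocating the local memory among the execution threads of the processor; and testing the allocated local memory in parallel using the execution threads.
In an exemplary embodiment of the present disclosure, determining the local memory of each processor includes: determining the physical address boundary points of each processor; and determining the local memory of each processor according to the physical address boundary points.
In an exemplary embodiment of the present disclosure, determining the physical address boundary points of each processor includes: decoding an address signal with an address decoder to obtain the physical address boundary points corresponding to the physical memory addresses.
In an exemplary embodiment of the present disclosure, evenly allocating the local memory among the execution threads of the processor includes: determining the total storage capacity of the local memory corresponding to the processor; dividing the total storage capacity by the number of execution threads of the processor to determine the average storage capacity of local memory allocated to each execution thread; and determining, according to the average storage capacity, the local memory allocated to each execution thread.
In an exemplary embodiment of the present disclosure, the number of execution threads equals the number of cores of the processor.
In an exemplary embodiment of the present disclosure, testing the allocated local memory in parallel using the execution threads includes: performing read-write verification on the allocated local memory using each execution thread; when the results of all the read-write verifications are consistent, the test passes; otherwise an error is reported, and a test failure is indicated when the test completes.
In an exemplary embodiment of the present disclosure, the execution threads are used to test the memory of a non-uniform memory access (NUMA) system.
According to a second aspect of the present disclosure, a memory test apparatus is provided, the apparatus including: a local memory determination module, configured to determine the local memory of each processor; a memory allocation module, configured to evenly allocate the local memory among the execution threads of the processor; and a test module, configured to test the allocated local memory in parallel using the execution threads.
In an exemplary embodiment of the present disclosure, the local memory determination module is configured to determine the physical address boundary points of each processor, and to determine the local memory of each processor according to the physical address boundary points.
In an exemplary embodiment of the present disclosure, the local memory determination module is configured to decode an address signal with an address decoder to obtain the physical address boundary points corresponding to the physical memory addresses.
In an exemplary embodiment of the present disclosure, the memory allocation module is configured to determine the total storage capacity of the local memory corresponding to the processor; to divide the total storage capacity by the number of execution threads of the processor to determine the average storage capacity of local memory allocated to each execution thread; and to determine, according to the average storage capacity, the local memory allocated to each execution thread.
In an exemplary embodiment of the present disclosure, the number of execution threads equals the number of cores of the processor.
In an exemplary embodiment of the present disclosure, the test module is configured to perform read-write verification on the allocated local memory using each execution thread; when the results of all the read-write verifications are consistent, the test passes; otherwise an error is reported, and a test failure is indicated when the test completes.
In an exemplary embodiment of the present disclosure, the execution threads are used to test the memory of a non-uniform memory access (NUMA) system.
According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the memory test method described above.
According to a fourth aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the memory test method described above by executing the executable instructions.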
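Brief Description of the Drawings
FIG. 1 schematically shows a structural comparison of uniform memory access and non-uniform memory access according to an exemplary embodiment of the present disclosure;
FIG. 2 schematically shows the nodes of a non-uniform memory access system according to an exemplary embodiment of the present disclosure;
FIG. 3 schematically shows a comparison of access speeds to different memories under non-uniform memory access according to an exemplary embodiment of the present disclosure;
FIG. 4 schematically shows a flowchart of the steps of a memory test method according to an exemplary embodiment of the present disclosure;
FIG. 5 schematically shows a flowchart of the steps of evenly allocating local memory in a memory test method according to an exemplary embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a memory test apparatus according to an exemplary embodiment of the present disclosure;
FIG. 7 schematically shows a module diagram of an electronic device according to an exemplary embodiment of the present disclosure.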
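Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a thorough understanding of the embodiments of the present disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the present disclosure may be practiced with one or more of the specific details omitted, or with other methods, components, apparatuses, steps, and so on. In other cases, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.
In addition, the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated descriptions of them are omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are merely illustrative and need not include all steps. For example, some steps may be decomposed and others may be combined or partially combined, so the actual order of execution may change according to the actual situation. In addition, the terms "first", "second", and "third" below are used only for distinction and should not be taken as limiting the present disclosure.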
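Uniform Memory Access (UMA) is characterized by multiple processors accessing all available memory in the system through the same bus, as shown by the UMA part of FIG. 1. Every processor takes the same time to access memory, hence the name uniform memory access.
The problem with UMA is that multiple processors access memory through a single bus, which increases the load on the shared bus. The processors contend for the memory controller, causing memory access conflicts. In addition, limited bus bandwidth causes access delays.
Under the UMA architecture, the front-side bus between the CPUs and the memory controller becomes the system performance bottleneck as the number of CPUs keeps growing. For this reason, the non-uniform memory access (NUMA) architecture was implemented when the 64-bit x86 architecture was introduced.
In contrast to UMA, NUMA is characterized by each processor having its own local memory, as shown by the NUMA part of FIG. 1. Each processor can also access the local memory of other processors (which, from its point of view, is remote memory).
For NUMA, processors and their corresponding local memory can also be divided into groups, where a group of processors jointly accesses its own local memory (a memory group). When there are multiple groups of processors and memory groups, each group of processors together with its memory group forms a NUMA node, as shown in FIG. 2.
Note that both UMA and NUMA are symmetric multiprocessing (SMP) architectures. SMP is currently the most common multiprocessor computer architecture; its processors are homogeneous, using CPUs of the same architecture.
Typically, under NUMA a processor can access remote memory when local memory is insufficient, but it accesses local memory faster than remote memory. As shown in FIG. 3, processor CPU0 accesses its local memory (memory 0) faster than it accesses the local memory of processor CPU1 (memory 1), of processor CPU2 (memory 2), or of processor CPU3 (memory 3), where memory 1, memory 2, and memory 3 are remote memory for CPU0. Because remote memory access is slower, it introduces latency and markedly reduces access efficiency.
Typically, memory testing on a NUMA system uses a single memory test thread to traverse and test all memory, so the memory test thread stays bound to one processor. When it tests remote memory, access speed drops sharply, which seriously degrades test efficiency.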
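On this basis, exemplary embodiments of the present disclosure provide a NUMA-based memory test method for testing the memory of a NUMA system. FIG. 4 shows a flowchart of the steps of a memory test method according to an embodiment of the present disclosure. In one possible implementation, the memory test method may include:
Step S410: determine the local memory of each processor;
Step S420: evenly allocate the local memory among the execution threads of the processor;
Step S430: test the allocated local memory in parallel using the execution threads.
With the memory test method provided by the embodiments of the present disclosure, on the one hand, determining the local memory of each processor and having the processor's execution threads test the allocated local memory prevents the processor from accessing remote memory during testing, which improves test efficiency. On the other hand, evenly allocating the local memory among the processor's execution threads and having the threads test their equal shares in parallel gives every execution thread the same test time, which shortens the total test time and further improves test efficiency.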
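The two effects described above can be illustrated with a rough, idealized timing model. This is only a sketch under assumptions that do not appear in the original text: $M_p$ denotes the local memory of processor $p$, $n_p$ its number of execution threads, $M_{\text{local}}$ and $M_{\text{remote}}$ the local and remote memory seen by a single test thread, and $B_{\text{local}} > B_{\text{remote}}$ hypothetical per-thread test throughputs. Contention between threads is ignored.

```latex
T_{\text{single}} = \frac{M_{\text{local}}}{B_{\text{local}}} + \frac{M_{\text{remote}}}{B_{\text{remote}}},
\qquad
T_{\text{parallel}} = \max_{p} \frac{M_p}{n_p \, B_{\text{local}}}
```

Under these assumptions, removing the remote term and dividing each processor's share by $n_p$ both reduce the total time. Real speedups would likely be smaller where threads compete for memory bandwidth.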
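The memory test method is described in detail below with reference to specific embodiments:
In step S410, the local memory of each processor is determined.
A non-uniform memory access (NUMA) architecture is one in which, in a multiprocessor system, memory access time depends on the relative position of the processor and the memory. In such an architecture there is memory relatively close to a processor, usually called local memory, and memory relatively far from it, usually called remote memory.
On the Intel x86 platform, local memory is the memory that a CPU can access through the iMC (Integrated Memory Controller) in its Uncore component. Non-local, remote memory must be accessed over a QPI (QuickPath Interconnect) link to the iMC of the CPU to which that memory is local. Memory access performance tests previously run on an Intel Ivy Bridge NUMA platform showed that remote memory access latency is twice that of local memory. Determining the local memory of each processor therefore clearly helps to speed up memory testing.
In exemplary embodiments of the present disclosure, the local memory of each processor is determined on the basis of each processor's physical address boundary points. Physical address boundary points are two adjacent physical memory addresses that belong to different processors. The boundary points divide the physical memory addresses into physical address blocks, and different blocks belong to different processors.
Note that a physical memory address, that is, the physical address of a memory cell, is determined by the cell's position on the address bus. Once the machine is assembled, the physical address is fixed and unchanging; it is not assigned by the CPU. A physical address is the address loaded into the memory address register and is the true address of the memory cell. The memory addresses transmitted on the front-side bus are all physical memory addresses, numbered from 0 up to the top of the available physical memory. The northbridge chip maps these numbers onto the actual memory modules.
In practice, the physical address boundary points can be determined in several ways. In exemplary embodiments of the present disclosure, determining the boundary points of each processor involves decoding the address signal with an address decoder to obtain the physical address boundary points corresponding to the physical memory addresses.
In practice, the address signal can be captured with a logic analyzer, an instrument that analyzes the logical relationships of a digital system. A logic analyzer is a kind of bus analyzer among data-domain test instruments; built around the concept of a bus (multiple lines), it observes and tests the data streams on multiple data lines simultaneously. Typically, the address signal captured by the logic analyzer is decoded by an address decoder, which parses out the data to obtain the physical memory addresses, as shown in Table 1.
Table 1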
Figure PCTCN2022104440-appb-000001
Figure PCTCN2022104440-appb-000002
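Here, SK stands for Socket; one socket corresponds to one physical CPU.
As Table 1 shows, the physical memory address 0x0000008FFFFFFFFF and all addresses before it belong to CPU00, and all addresses after it belong to CPU01. The address 0x0000008FFFFFFFFF can therefore be taken as one physical address boundary point: memory at this boundary point and at the addresses before it is local memory of processor CPU00, and memory at the addresses after it is local memory of processor CPU01.
Table 1 also shows that all addresses before 0x0000009000000000 belong to CPU00, and that 0x0000009000000000 and all addresses after it belong to CPU01. The address 0x0000009000000000 can therefore be taken as another physical address boundary point: memory at the addresses before it is local memory of CPU00, and memory at this boundary point and the addresses after it is local memory of CPU01. Table 1 also shows that 0x0000008FFFFFFFFF and 0x0000009000000000 are adjacent physical addresses.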
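The boundary-point idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the decoder described in the disclosure: the `samples` list is a hypothetical stand-in for decoded address-decoder output, and only the two boundary addresses are taken from the text.

```python
def find_boundaries(samples):
    """Return pairs of adjacent addresses whose owning sockets differ.

    `samples` is a list of (physical_address, socket) tuples sorted by
    address, standing in for decoded address-decoder output.
    """
    boundaries = []
    for (addr_a, sock_a), (addr_b, sock_b) in zip(samples, samples[1:]):
        if sock_a != sock_b:
            boundaries.append((addr_a, addr_b))
    return boundaries


# Hypothetical decoded samples; only the two boundary addresses come from the text.
samples = [
    (0x0000008FFFFFFFFE, "CPU00"),
    (0x0000008FFFFFFFFF, "CPU00"),
    (0x0000009000000000, "CPU01"),
    (0x0000009000000001, "CPU01"),
]
boundaries = find_boundaries(samples)
```

Each returned pair holds the last address of one processor's block and the first address of the next, matching the two adjacent boundary points discussed above.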
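In exemplary embodiments of the present disclosure, once the physical address boundary points of each processor are determined, the physical addresses corresponding to each processor can be derived from them, and hence the local memory belonging to each processor. A boundary point is a split point in the memory address space: the memory is divided into memory blocks, and each block is then attributed to a processor. The division is therefore independent of memory nodes and may even cross them. As Table 1 shows, the physical memory addresses above all belong to the same memory module, DIMM00, yet they are divided among the local memory of different processors. DIMM stands for Dual In-line Memory Module.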
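Once the blocks are known, attributing an address to a processor is a simple range lookup. The sketch below assumes a hypothetical `BLOCKS` table of inclusive address ranges; the range limits follow the example addresses given in the description, and the function name `owner_of` is an illustrative choice.

```python
from bisect import bisect_right

# (first_address, last_address, processor), inclusive ranges.
# Hypothetical table; the limits follow the example addresses in the description.
BLOCKS = [
    (0x0000007000000000, 0x0000008FFFFFFFFF, "CPU00"),
    (0x0000009000000000, 0x00000B0000000000, "CPU01"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def owner_of(address):
    """Return the processor whose local memory contains `address`, or None."""
    index = bisect_right(_STARTS, address) - 1
    if index < 0:
        return None
    _, last, cpu = BLOCKS[index]
    return cpu if address <= last else None
```

The lookup depends only on addresses, not on which memory module or memory node holds them, which mirrors the point made above.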
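Note that, from the processor's point of view, a memory block is a contiguous block of memory addresses, whereas a memory node is a group of memory modules in the computer's memory topology. When the computer maps memory, it does not distinguish memory nodes; it simply assigns addresses in order, so the addressing necessarily crosses memory nodes. By determining physical address boundary points and dividing the memory addresses into memory blocks, the memory test method of the exemplary embodiments needs no extra memory-node handling and is not constrained by memory nodes.
In practice, besides a logic analyzer, the address signal can also be recorded and captured with instruments such as a DDR (Double Data Rate synchronous dynamic random-access memory) protocol analyzer. Exemplary embodiments of the present disclosure place no particular limitation on the capture device.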
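In step S420, the local memory is evenly allocated among the execution threads of the processor.
In exemplary embodiments of the present disclosure, a task that performs memory testing is called an execution thread, and each execution thread is carried out by a core of the processor. A processor usually contains multiple cores, for example 36, each of which executes a memory test task. One processor can have multiple cores (a multi-core processor), whereas one core can belong to only one processor. The core, also called the die, is the most important component of the CPU: it is the raised chip at the center of the CPU, manufactured from monocrystalline silicon by a given production process, and it performs all of the CPU's computation, accepts and stores commands, and processes data.
The number of execution threads provided by the embodiments of the present disclosure equals the number of cores of the processor. One processor core corresponds to one execution thread; each execution thread is started per processor core and executed to completion by that core.
Usually one processor has multiple segments of local memory. As shown in Table 1, processor CPU00 has multiple segments of local memory, and so does processor CPU01. After the local memory of each processor is determined, local memory can therefore be allocated to the processor's cores as needed, that is, to each execution thread.
In exemplary embodiments of the present disclosure, local memory can be evenly allocated to each execution thread of the processor, so that every execution thread tests the same amount of local memory and therefore takes the same time; in other words, every core of the processor tests the same amount of local memory. This spreads the memory test time evenly over the cores of each processor, which shortens the total test time and improves test efficiency. Moreover, because each processor tests only local memory rather than remote memory, the latency of testing remote memory is avoided, which further improves test efficiency.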
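In practice, local memory can be evenly allocated among a processor's execution threads in several ways. In exemplary embodiments of the present disclosure, referring to FIG. 5, the steps of evenly allocating local memory among the execution threads of the processor include:
Step S510: determine the total storage capacity of the local memory corresponding to the processor;
Step S520: divide the total storage capacity by the number of execution threads of the processor to determine the average storage capacity of local memory allocated to each execution thread;
Step S530: determine, according to the average storage capacity, the local memory allocated to each execution thread.
In the memory test method provided by the exemplary embodiments, local memory is allocated evenly among the processor's execution threads: the total storage capacity of the processor's local memory is determined and combined with the number of execution threads to obtain the average storage capacity per thread, and local memory is then allocated to each thread on that basis.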
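The steps of evenly allocating local memory are described in detail below by way of example:
Specifically, in step S510, the total storage capacity of the local memory corresponding to the processor is determined.
In exemplary embodiments of the present disclosure, for each processor, the storage capacity of each of its local memory segments is determined first (the segments may differ in capacity), and the capacities of all its local memory segments are then added to obtain the total storage capacity of the processor's local memory.
Take processors CPU00 and CPU01 listed in Table 1 as an example. The local memory of CPU00 includes the memory at physical addresses 0x0000007000000000 through 0x0000008FFFFFFFFF, and the local memory of CPU01 includes the memory at physical addresses 0x0000009000000000 through 0x00000b0000000000.
For processor CPU00, the segments of local memory it contains may each have a different capacity. Adding the capacities of all segments gives the total storage capacity of CPU00's local memory, for example 18 GB.
For processor CPU01, the segments of local memory it contains may likewise each have a different capacity. Adding the capacities of all segments gives the total storage capacity of CPU01's local memory, for example 10 GB.
In this way the total storage capacity of the local memory corresponding to a processor is obtained.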
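Step S510 amounts to summing segment sizes. The following is a minimal sketch, assuming each segment is given as an inclusive (first, last) address pair; the three segments are hypothetical and were chosen only so that they add up to the 18 GB figure used in the example (treated here as binary gigabytes).

```python
GIB = 1 << 30  # one binary gigabyte, an assumption about the units in the text


def total_local_memory(segments):
    """Sum the sizes of inclusive (first_address, last_address) segments."""
    return sum(last - first + 1 for first, last in segments)


# Hypothetical segments of unequal size belonging to one processor.
cpu00_segments = [
    (0x000000000, 0x1FFFFFFFF),  # 8 GiB
    (0x400000000, 0x57FFFFFFF),  # 6 GiB
    (0x800000000, 0x8FFFFFFFF),  # 4 GiB
]
cpu00_total = total_local_memory(cpu00_segments)
```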
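Next, in step S520, the total storage capacity is divided by the number of execution threads of the processor to determine the average storage capacity of local memory allocated to each execution thread.
Once the total storage capacity of the processor's local memory is obtained, local memory can be allocated to each execution thread on that basis. As noted above, evenly allocating local memory among the processor's execution threads shortens the memory test time and improves test efficiency.
Accordingly, in exemplary embodiments of the present disclosure, local memory can be evenly allocated according to the number of execution threads, so that every execution thread receives the same amount of local memory. That is, the total storage capacity of the processor's local memory is divided by the number of the processor's execution threads. This yields the average storage capacity of local memory that each execution thread can receive, and specific local memory is then allocated on the basis of that average.
As described above, the number of execution threads of a processor equals its number of cores, so dividing the total storage capacity of the local memory by the number of execution threads is equivalent to dividing it by the number of cores; in effect, an average storage capacity is determined for each core of the processor and local memory is allocated evenly. In general, different processors may have different numbers of cores.
Continuing with processors CPU00 and CPU01 above: suppose CPU00 has 36 cores, so the number of execution threads for CPU00 is 36. Dividing the total local memory capacity of CPU00, 18 GB, by the 36 execution threads gives the average storage capacity of local memory to allocate to each execution thread of CPU00: 18 GB / 36 = 0.5 GB.
Suppose CPU01 has 18 cores, so the number of execution threads for CPU01 is 18. Dividing the total local memory capacity of CPU01, 10 GB, by the 18 execution threads gives the average storage capacity of local memory to allocate to each execution thread of CPU01: 10 GB / 18 ≈ 0.556 GB.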
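The arithmetic of step S520 can be reproduced directly. This is a small sketch with a hypothetical helper name, `average_share`, using the totals and core counts assumed in the example above.

```python
GIB = 1 << 30  # binary gigabyte, assumed unit


def average_share(total_bytes, thread_count):
    """Return the average number of bytes each execution thread should test."""
    if thread_count <= 0:
        raise ValueError("thread_count must be positive")
    return total_bytes / thread_count


# Figures from the worked example: 18 GB over 36 threads, 10 GB over 18 threads.
cpu00_share_gib = average_share(18 * GIB, 36) / GIB
cpu01_share_gib = average_share(10 * GIB, 18) / GIB
```

The second quotient is not a whole number of bytes, so a practical allocator would probably need to round and hand the few leftover bytes to some of the threads.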
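Note that the average storage capacity above is the amount of local memory that each execution thread, that is, each core of the processor, needs to test; it does not designate any particular local memory.
In step S530, the local memory allocated to each execution thread is determined according to the average storage capacity.
In exemplary embodiments of the present disclosure, once the average storage capacity of local memory per execution thread has been determined for each processor, local memory can be evenly allocated to the execution threads on that basis.
For example, each execution thread of processor CPU00 can be allocated 0.5 GB of local memory, and each execution thread of processor CPU01 can be allocated 0.556 GB of local memory.
In practice, local memory segments may differ in size, and segments of different sizes can be allocated to each execution thread to make up the average storage capacity. For example, an execution thread of processor CPU00 may be allocated two 0.2 GB segments and one 0.1 GB segment of that processor's local memory. Similarly, an execution thread of processor CPU01 may be allocated two 0.25 GB segments of that processor's local memory plus one smaller 0.056 GB segment.
During actual allocation, if no local memory segment of a suitable size exists, for example no 0.056 GB segment, a segment can be split: the 0.056 GB portion is allocated and the remainder is given to other execution threads. Exemplary embodiments of the present disclosure place no particular limitation on the specific allocation scheme.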
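One possible allocation scheme, offered only as a sketch since the disclosure leaves the scheme open, walks the segments in address order and splits a segment whenever a thread's quota is reached. Segments are assumed to be (start, size) pairs in arbitrary units, and the function name `allocate` is hypothetical.

```python
def allocate(segments, thread_count):
    """Split (start, size) segments into equal per-thread quotas.

    Returns one list of (start, size) pieces per thread. A segment is
    split when it is larger than what the current thread still needs.
    """
    total = sum(size for _, size in segments)
    base, extra = divmod(total, thread_count)
    # The first `extra` threads take one extra unit so nothing is left over.
    quotas = [base + (1 if i < extra else 0) for i in range(thread_count)]

    plan = [[] for _ in range(thread_count)]
    seg_index, offset = 0, 0
    for thread, need in enumerate(quotas):
        while need > 0:
            start, size = segments[seg_index]
            take = min(need, size - offset)
            if take > 0:
                plan[thread].append((start + offset, take))
            offset += take
            need -= take
            if offset == size:  # segment used up, move to the next one
                seg_index += 1
                offset = 0
    return plan


# Hypothetical segments in MB: a 200 + 200 + 100 group and one 500 block.
plan = allocate([(0, 200), (1000, 200), (2000, 100), (3000, 500)], 2)
split = allocate([(0, 300)], 2)
```

In the first call one thread receives two 200-unit pieces and one 100-unit piece, similar to the CPU00 illustration above; in the second call the single segment is split in half.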
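Note that CPU00 and CPU01 above are merely illustrative; exemplary embodiments of the present disclosure place no particular limitation on the number of execution threads of a processor, the capacity of the corresponding local memory, and so on.
In step S430, the allocated local memory is tested in parallel using the execution threads.
In exemplary embodiments of the present disclosure, after local memory has been allocated to the execution threads of the processor, each execution thread can test its allocated local memory. The execution threads run their tests in parallel, which shortens the overall test time and improves test efficiency.
In practice, memory can be tested in many ways, for example with modified memory parameters, with modified system parameters, leakage tests, high-temperature tests, low-temperature tests, room-temperature tests, high-voltage tests, low-voltage tests, and combined tests using different test algorithms. In exemplary embodiments of the present disclosure, the test process of the execution threads is described using a read-write test as an example.
In exemplary embodiments of the present disclosure, each execution thread can perform read-write verification on its allocated local memory; that is, the execution thread writes test data into the corresponding local memory and reads the written test data back after a preset time. The preset time can be chosen according to the actual situation and is not specifically limited in exemplary embodiments of the present disclosure.
After the test data is read back, it is compared with the test data that was written to the local memory. Only when the data read back matches the data written is the read-write verification result consistent and the read-write test passed. Across multiple execution threads, the test passes only when the results of all read-write verifications are consistent; if even one read-write verification result is inconsistent, the test has not passed, an error can be reported, and a test failure is indicated when the test completes.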
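The pass/fail rule can be sketched with Python threads. This is an illustration under strong assumptions: a `bytearray` stands in for physical memory, the pattern `0xA5` and the `delay` parameter are hypothetical, and Python threads only approximate the per-core parallelism described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def verify_block(memory, start, end, pattern=0xA5, delay=0.0):
    """Write `pattern` to memory[start:end], wait, read back, and compare."""
    expected = bytes([pattern]) * (end - start)
    memory[start:end] = expected
    time.sleep(delay)  # stands in for the preset time between write and read
    return bytes(memory[start:end]) == expected


def run_parallel_test(memory, blocks):
    """Run one verification per block; pass only if every block matches."""
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        results = list(pool.map(lambda b: verify_block(memory, *b), blocks))
    return all(results), results


memory = bytearray(4096)                                   # simulated memory
blocks = [(i * 1024, (i + 1) * 1024) for i in range(4)]    # one block per thread
passed, results = run_parallel_test(memory, blocks)
```

A single `False` in `results` makes `passed` false, which corresponds to reporting an error and indicating a test failure on completion.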
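Exemplary embodiments of the present disclosure do not describe other types of memory tests in detail; existing test approaches may be consulted, and they are not repeated here.
Furthermore, in exemplary embodiments of the present disclosure, all execution threads of a processor share one set of test code during testing. That is, the code region is shared by multiple execution threads, and each execution thread has its own stack area in memory to record its position in the code as it runs. Compared with giving every execution thread its own copy of the test code, each of which would occupy part of the memory, sharing one set of test code among all execution threads reduces the memory occupied by test code and so increases the amount of memory that can be tested.
In effect, the memory test method provided by the exemplary embodiments divides the processor's local memory into separate memory blocks, one set per execution thread. After local memory has been allocated to each execution thread according to the average storage capacity, each execution thread only needs to pass the start and end addresses of its memory blocks to the test program; once it has the start and end addresses, the test program begins testing the corresponding memory.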
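A test routine driven only by a start and an end address might look like the sketch below. It is an assumption-laden illustration: memory is simulated with indexable Python objects, the alternating patterns `0x55` and `0xAA` are a common choice rather than something the disclosure specifies, and `FaultyMemory` is a hypothetical class used to show a failing case.

```python
def check_range(memory, start, end, patterns=(0x55, 0xAA)):
    """Return the first failing address in [start, end), or None if all match."""
    for pattern in patterns:
        for address in range(start, end):
            memory[address] = pattern
        for address in range(start, end):
            if memory[address] != pattern:
                return address
    return None


class FaultyMemory:
    """Simulated memory whose cell at `bad_address` always reads 0x00."""

    def __init__(self, size, bad_address):
        self._cells = bytearray(size)
        self._bad = bad_address

    def __setitem__(self, address, value):
        self._cells[address] = value

    def __getitem__(self, address):
        return 0 if address == self._bad else self._cells[address]


good_result = check_range(bytearray(256), 0, 256)
bad_result = check_range(FaultyMemory(256, bad_address=100), 64, 128)
```

Because the routine takes only the range limits, the same code can be shared by every execution thread, each calling it with its own start and end addresses.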
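Note that the execution threads of the processor described above are mainly used to test the memory of a non-uniform memory access (NUMA) system.
In summary, with the memory test method provided by the embodiments of the present disclosure, first, the local memory of each processor is delimited by determining physical address boundary points, which divides the memory addresses into memory blocks without distinguishing memory nodes; no extra memory-node handling is needed and the method is not constrained by memory nodes. Second, determining the local memory of each processor and having the processor's execution threads test the allocated local memory prevents the processor from accessing remote memory during testing, which improves test efficiency. Third, evenly allocating the local memory among the processor's execution threads and having the threads test their equal shares in parallel is equivalent to evenly allocating local memory among the processor's cores and having the cores test their shares in parallel; this evens out the test time so that every execution thread takes the same time, which shortens the total test time and further improves test efficiency.
Note that although the steps of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that order, or that all of the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.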
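In addition, this example embodiment also provides a memory test apparatus. Referring to FIG. 6, the memory test apparatus 600 may include a local memory determination module 610, a memory allocation module 620, and a test module 630, where:
the local memory determination module 610 may be configured to determine the local memory of each processor;
the memory allocation module 620 may be configured to evenly allocate the local memory among the execution threads of the processor;
the test module 630 may be configured to test the allocated local memory in parallel using the execution threads.
In an exemplary embodiment of the present disclosure, the local memory determination module 610 may be configured to determine the physical address boundary points of each processor, and to determine the local memory of each processor according to the physical address boundary points.
In an exemplary embodiment of the present disclosure, the local memory determination module 610 may be configured to decode an address signal with an address decoder to obtain the physical address boundary points corresponding to the physical memory addresses.
In an exemplary embodiment of the present disclosure, the memory allocation module 620 may be configured to determine the total storage capacity of the local memory corresponding to the processor; to divide the total storage capacity by the number of execution threads of the processor to determine the average storage capacity of local memory allocated to each execution thread; and to determine, according to the average storage capacity, the local memory allocated to each execution thread.
In an exemplary embodiment of the present disclosure, the number of execution threads equals the number of cores of the processor.
In an exemplary embodiment of the present disclosure, the test module 630 may be configured to perform read-write verification on the allocated local memory using each execution thread; when the results of all the read-write verifications are consistent, the test passes; otherwise an error is reported, and a test failure is indicated when the test completes.
In an exemplary embodiment of the present disclosure, the execution threads are used to test the memory of a non-uniform memory access (NUMA) system.
The specific details of the virtual modules of the memory test apparatus above have already been described in detail in the corresponding memory test method and are therefore not repeated here.
It should be noted that although several modules or units of the memory test apparatus are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.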
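In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will understand that aspects of the present invention may be implemented as a system, method, or program product. Accordingly, aspects of the present invention may take the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may be referred to collectively here as a "circuit", "module", or "system".
An electronic device 700 according to this embodiment of the present invention is described below with reference to FIG. 7. The electronic device 700 shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 7, the electronic device 700 takes the form of a general-purpose computing device. The components of the electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one storage unit 720, a bus 730 connecting different system components (including the storage unit 720 and the processing unit 710), and a display unit 740.
The storage unit 720 stores program code that can be executed by the processing unit 710, so that the processing unit 710 performs the steps according to various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification. For example, the processing unit 710 can perform step S410 as shown in FIG. 4, determining the local memory of each processor; step S420, evenly allocating the local memory among the execution threads of the processor; and step S430, testing the allocated local memory in parallel using the execution threads.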
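The storage unit 720 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 7201 and/or a cache storage unit 7202, and may further include a read-only storage unit (ROM) 7203.
The storage unit 720 may also include a program/utility 7204 having a set of (at least one) program modules 7205. Such program modules 7205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 700 may also communicate with one or more external devices 770 (e.g., a keyboard, a pointing device, a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (e.g., a router, a modem) that enables the electronic device 700 to communicate with one or more other computing devices. This communication may occur through the input/output (I/O) interface 750. The electronic device 700 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 760. As shown, the network adapter 760 communicates with other modules of the electronic device 700 via the bus 730. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the embodiments above, those skilled in the art will readily understand that the example embodiments described here can be implemented in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the present disclosure can therefore be embodied as a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal apparatus, or a network device) to execute the method according to the embodiments of the present disclosure.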
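In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the method described above in this specification is stored. In some possible implementations, aspects of the present invention may also be implemented in the form of a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to various exemplary embodiments of the present invention described in the "Exemplary Method" section above in this specification.
The program product for implementing the above method according to an embodiment of the present invention can take the form of a portable compact disc read-only memory (CD-ROM) that includes program code, and can be run on a terminal device such as a personal computer. The program product of the present invention is not limited thereto, however. In this document, a readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device.
The program product may take the form of any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).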
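In addition, the drawings above are only schematic illustrations of the processing included in the method according to exemplary embodiments of the present invention, and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be performed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.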

Claims (16)

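  1. A memory test method, the method comprising:
    determining the local memory of each processor;
    evenly allocating the local memory among the execution threads of the processor;
    testing the allocated local memory in parallel using the execution threads.
  2. The method according to claim 1, wherein determining the local memory of each processor comprises:
    determining the physical address boundary points of each processor;
    determining the local memory of each processor according to the physical address boundary points.
  3. The method according to claim 2, wherein determining the physical address boundary points of each processor comprises:
    decoding an address signal with an address decoder to obtain the physical address boundary points corresponding to the physical memory addresses.
  4. The method according to any one of claims 1-3, wherein evenly allocating the local memory among the execution threads of the processor comprises:
    determining the total storage capacity of the local memory corresponding to the processor;
    dividing the total storage capacity by the number of the execution threads of the processor to determine the average storage capacity of the local memory allocated to each execution thread;
    determining, according to the average storage capacity, the local memory allocated to each execution thread.
  5. The method according to claim 4, wherein the number of the execution threads equals the number of cores of the processor.
  6. The method according to claim 1, wherein testing the allocated local memory in parallel using the execution threads comprises:
    performing read-write verification on the allocated local memory using each execution thread;
    when the results of all the read-write verifications are consistent, the test passes;
    otherwise an error is reported, and a test failure is indicated when the test completes.
  7. The method according to claim 1, wherein the execution threads are used to test the memory of a non-uniform memory access (NUMA) system.
  8. A memory test apparatus, the apparatus comprising:
    a local memory determination module, configured to determine the local memory of each processor;
    a memory allocation module, configured to evenly allocate the local memory among the execution threads of the processor;
    a test module, configured to test the allocated local memory in parallel using the execution threads.
  9. The apparatus according to claim 8, wherein the local memory determination module is configured to determine the physical address boundary points of each processor, and to determine the local memory of each processor according to the physical address boundary points.
  10. The apparatus according to claim 9, wherein the local memory determination module is configured to decode an address signal with an address decoder to obtain the physical address boundary points corresponding to the physical memory addresses.
  11. The apparatus according to any one of claims 8-10, wherein the memory allocation module is configured to determine the total storage capacity of the local memory corresponding to the processor; to divide the total storage capacity by the number of the execution threads of the processor to determine the average storage capacity of the local memory allocated to each execution thread; and to determine, according to the average storage capacity, the local memory allocated to each execution thread.
  12. The apparatus according to claim 11, wherein the number of the execution threads equals the number of cores of the processor.
  13. The apparatus according to claim 8, wherein the test module is configured to perform read-write verification on the allocated local memory using each execution thread; when the results of all the read-write verifications are consistent, the test passes; otherwise an error is reported, and a test failure is indicated when the test completes.
  14. The apparatus according to claim 8, wherein the execution threads are used to test the memory of a non-uniform memory access (NUMA) system.
  15. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the memory test method according to any one of claims 1-7.
  16. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to perform the memory test method according to any one of claims 1-7 by executing the executable instructions.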
PCT/CN2022/104440 2022-06-15 2022-07-07 Memory test method and apparatus, storage medium, and electronic device WO2023240719A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210681561.7A CN117271230A (zh) 2022-06-15 2022-06-15 Memory test method and apparatus, storage medium, and electronic device
CN202210681561.7 2022-06-15

Publications (1)

Publication Number Publication Date
WO2023240719A1 true WO2023240719A1 (zh) 2023-12-21

Family

ID=89193034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104440 WO2023240719A1 (zh) 2022-06-15 2022-07-07 Memory test method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN117271230A (zh)
WO (1) WO2023240719A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040068680A1 (en) * 2002-10-08 2004-04-08 Dell Products L.P. Method and apparatus for testing physical memory in an information handling system under conventional operating systems
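CN104182334A (zh) * 2013-05-24 2014-12-03 鸿富锦精密工业(深圳)有限公司 Memory test method and system for NUMA system
CN109901957A (zh) * 2017-12-09 2019-06-18 英业达科技有限公司 Computing device and method for memory testing using an extensible firmware interface
CN109901956A (zh) * 2017-12-08 2019-06-18 英业达科技有限公司 System and method for overall memory testing
CN114528075A (zh) * 2021-12-28 2022-05-24 飞腾信息技术有限公司 Performance tuning method and apparatus for NUMA system, and computer device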

Also Published As

Publication number Publication date
CN117271230A (zh) 2023-12-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22946394

Country of ref document: EP

Kind code of ref document: A1